Configure LinearRegression "copy_X" Parameter

The copy_X parameter in scikit-learn's LinearRegression controls whether the input data is copied before fitting or may be overwritten in place.

By default, copy_X is set to True, which means that the original input data is preserved and a copy is made for internal use by the model.

Setting copy_X to False can save memory, especially when working with large datasets, because it allows the input data to be overwritten during the fitting process. However, this means that the original data may be modified (for example, centered in place), so it should not be relied on after fitting.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with copy_X=True (default)
lr_copy = LinearRegression(copy_X=True)
lr_copy.fit(X_train, y_train)

# Train with copy_X=False
lr_no_copy = LinearRegression(copy_X=False)
lr_no_copy.fit(X_train, y_train)

# Modify input data: zero out the training features in place
X_train[:] = 0

# Retrain both models on the zeroed data
lr_copy.fit(X_train, y_train)
lr_no_copy.fit(X_train, y_train)

print(f"Coefficient with copy_X=True: {lr_copy.coef_[0]:.3f}")
print(f"Coefficient with copy_X=False: {lr_no_copy.coef_[0]:.3f}")

Running the example gives an output like:

Coefficient with copy_X=True: 0.000
Coefficient with copy_X=False: 0.000

Both coefficients are zero because each model was refit on the zeroed training data. The most recent call to fit alone determines coef_, regardless of the copy_X setting.

The key steps in this example are:

Generate a synthetic regression dataset
Split the data into train and test sets
Train LinearRegression models with copy_X=True and copy_X=False
Modify the input data and retrain the models
Compare the coefficients of the retrained models
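
To observe the effect of copy_X directly, one option is to snapshot the input array and compare it after fitting. The following is a minimal sketch rather than part of the example above; the names X_keep, X_work, and X_before are illustrative, and whether the array actually changes can depend on the scikit-learn version, the array dtype, and the fit_intercept setting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic float64 data; each model gets its own array
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)
X_keep = X.copy()    # passed to the copy_X=True model
X_work = X.copy()    # passed to the copy_X=False model
X_before = X.copy()  # untouched snapshot used for comparison

lr_copy = LinearRegression(copy_X=True).fit(X_keep, y)
lr_no_copy = LinearRegression(copy_X=False).fit(X_work, y)

# copy_X=True leaves the caller's array untouched
print("X unchanged with copy_X=True: ", np.array_equal(X_keep, X_before))
# copy_X=False permits in-place changes, so this may print False
print("X unchanged with copy_X=False:", np.array_equal(X_work, X_before))
# The fitted coefficients should agree either way
print("Coefficients match:", np.allclose(lr_copy.coef_, lr_no_copy.coef_))
```

The setting affects only what happens to the caller's array, not the fitted model, so the coefficient comparison is expected to succeed in both cases.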

Tips for setting copy_X:

Use copy_X=False when working with large datasets to save memory, provided you no longer need the original values of the input data
Use copy_X=True (default) if you need to preserve the original input data
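
As a rough way to judge whether the copy matters, the extra memory is about the size of the input array itself. A small sketch, assuming a dense NumPy array (the shape used here is arbitrary and purely illustrative):

```python
import numpy as np

# Hypothetical dense feature matrix: 10,000 samples x 50 float64 features
X = np.zeros((10_000, 50), dtype=np.float64)

# A full copy of X needs roughly this many additional bytes
extra_bytes = X.nbytes
print(f"Approximate cost of copying X: {extra_bytes / 1e6:.1f} MB")
```

If this figure is small relative to available memory, the default copy_X=True is usually the safer choice.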

Potential issues:

Setting copy_X=False can lead to unexpected behavior if the input data is reused after fitting, since it may no longer hold its original values
When copy_X=False, the original input data may be overwritten during fitting
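
One way these issues can surface is when a model is evaluated on the same array it was fitted on. The sketch below assumes that fitting may adjust the array in place; it keeps a separate reference copy (X_ref, an illustrative name) so the two evaluations can be compared. No particular output is claimed, since the behavior can vary across scikit-learn versions and input types.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)
X_ref = X.copy()  # reference copy that is never passed to fit

model = LinearRegression(copy_X=False).fit(X, y)

# R^2 on the preserved data reflects the fitted model
score_ref = model.score(X_ref, y)
# R^2 on the array passed to fit may differ if X was changed in place
score_same = model.score(X, y)

print(f"R^2 on reference copy: {score_ref:.3f}")
print(f"R^2 on fitted array:   {score_same:.3f}")
```

If the two scores disagree, the array passed to fit no longer holds the original data and should be reloaded or regenerated before further use.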

See Also