
Configure LinearRegression "copy_X" Parameter

The copy_X parameter in scikit-learn’s LinearRegression controls whether the input data X is copied before fitting or may be overwritten in place.

By default, copy_X is set to True, which means that the original input data is preserved and a copy is made for internal use by the model.
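The default can be confirmed by inspecting the estimator's parameters. A minimal sketch using get_params(), which scikit-learn estimators provide; the variable name default_copy_X is only illustrative:

```python
from sklearn.linear_model import LinearRegression

# Inspect the constructor parameters of a freshly created estimator
params = LinearRegression().get_params()
default_copy_X = params["copy_X"]
print(f"Default copy_X: {default_copy_X}")
```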

Setting copy_X to False can save memory, especially when working with large datasets, because no copy of the input data is made and the data may be overwritten during fitting. The trade-off is that the original array may no longer hold its original values after fit returns.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with copy_X=True (default)
lr_copy = LinearRegression(copy_X=True)
lr_copy.fit(X_train, y_train)

# Train with copy_X=False
lr_no_copy = LinearRegression(copy_X=False)
lr_no_copy.fit(X_train, y_train)

# Modify input data
X_train[:] = 0

# Retrain the models
lr_copy.fit(X_train, y_train)
lr_no_copy.fit(X_train, y_train)

print(f"Coefficient with copy_X=True: {lr_copy.coef_[0]:.3f}")
print(f"Coefficient with copy_X=False: {lr_no_copy.coef_[0]:.3f}")

Running the example gives an output like:

Coefficient with copy_X=True: 0.000
Coefficient with copy_X=False: 0.000
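
Both models are refit on the same zeroed array, so the coefficients match and the output alone does not reveal what copy_X did. One way to observe the effect is to compare the input array before and after fitting. The following is a minimal sketch, not part of the original example; the helper name input_unchanged_after_fit is hypothetical. With copy_X=True the array is left intact. With copy_X=False it may or may not be overwritten, depending on the scikit-learn version and the data, so that result is printed rather than assumed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression


def input_unchanged_after_fit(copy_X):
    """Fit on fresh data and report whether the input array kept its values."""
    X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)
    X_before = X.copy()  # reference snapshot taken before fitting
    LinearRegression(copy_X=copy_X).fit(X, y)
    return np.array_equal(X, X_before)


unchanged_with_copy = input_unchanged_after_fit(copy_X=True)
unchanged_without_copy = input_unchanged_after_fit(copy_X=False)

print(f"X unchanged with copy_X=True: {unchanged_with_copy}")
print(f"X unchanged with copy_X=False: {unchanged_without_copy}")
```

If the second line reports False, the array passed to fit was altered in place, which is the behavior copy_X=True protects against.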

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train LinearRegression models with copy_X=True and copy_X=False
  4. Modify the input data and retrain the models
  5. Compare the coefficients of the retrained models
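
copy_X affects only how the input array is handled in memory; it does not change the fitted model. A sketch that checks this by fitting each model on its own copy of the unmodified data (the explicit copies are an assumption added here so that neither fit can affect the other):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Give each model its own copy so an in-place overwrite cannot leak across fits
lr_copy = LinearRegression(copy_X=True).fit(X.copy(), y)
lr_no_copy = LinearRegression(copy_X=False).fit(X.copy(), y)

# Compare the fitted parameters of the two models
same_coef = np.allclose(lr_copy.coef_, lr_no_copy.coef_)
same_intercept = np.isclose(lr_copy.intercept_, lr_no_copy.intercept_)
print(f"Same coefficients: {same_coef}")
print(f"Same intercept: {same_intercept}")
```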

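Tips for setting copy_X:

  1. Keep the default copy_X=True whenever the input data is needed unchanged after fitting.
  2. Consider copy_X=False for large datasets where memory is tight and the input data can be discarded or regenerated after fitting.

Potential issues:

  1. With copy_X=False, the array passed to fit may be overwritten, so later code that reuses it can see altered values.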
