
Configure Ridge "copy_X" Parameter

The copy_X parameter in scikit-learn’s Ridge class controls whether the input data is copied before model fitting or may be overwritten in place.

Ridge regression is a linear model that adds L2 regularization to ordinary least squares. The regularization helps to prevent overfitting and can improve the model’s generalization performance.
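To make the L2 penalty concrete, the following minimal sketch compares a fitted Ridge model against the textbook closed-form ridge solution. It assumes fit_intercept=False so that no centering is involved, and the variable names (closed_form, coefs_match) are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)
alpha = 1.0

# fit_intercept=False keeps the comparison with the plain closed form simple
model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# Closed-form ridge solution: solve (X^T X + alpha * I) w = X^T y
closed_form = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Compare the library coefficients with the closed-form coefficients
coefs_match = np.allclose(model.coef_, closed_form)
print("Matches closed form:", coefs_match)
```

The alpha * I term is what the regularization adds to ordinary least squares; larger alpha values shrink the coefficients more strongly.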

By default, copy_X is set to True, which means that the Ridge class will make a copy of the input data before fitting the model. This ensures that the original data is not modified during the fitting process.
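One way to confirm the default behavior is to keep a snapshot of the input and compare it after fitting. This is a small sketch; X_snapshot and unchanged are illustrative names, not part of the scikit-learn API.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# Independent copy kept only so we can compare before and after
X_snapshot = X.copy()

# copy_X=True is the default; it is spelled out here for clarity
Ridge(alpha=1.0, copy_X=True).fit(X, y)

# With copy_X=True the caller's array should be left untouched
unchanged = np.array_equal(X, X_snapshot)
print("X unchanged after fit with copy_X=True:", unchanged)
```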

Setting copy_X to False can save memory, as it avoids creating a copy of the input data. However, it allows the input data to be overwritten during fitting, so the original array may change unexpectedly. Use it only when the data is not needed in its original form afterwards.
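Whether the array is actually changed with copy_X=False can depend on the data type, the solver, and the scikit-learn version, so it is safer to check than to assume. The sketch below detects an in-place change and restores the data; the snapshot is for diagnosis only, since keeping it defeats the memory saving. The names X_snapshot and was_modified are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# Diagnostic snapshot (in real memory-constrained code you would skip this)
X_snapshot = X.copy()

# copy_X=False permits Ridge to work directly on the caller's array
model = Ridge(alpha=1.0, copy_X=False).fit(X, y)

# Detect whether the caller's array was changed in place
was_modified = not np.array_equal(X, X_snapshot)
print("X modified in place:", was_modified)

# Restore the original values before reusing X elsewhere
if was_modified:
    X[...] = X_snapshot
```

If the original data must be preserved and memory allows, leaving copy_X at its default is the simpler choice.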

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Generate a small synthetic regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# Create two Ridge models that differ only in their copy_X setting
ridge_copy = Ridge(alpha=1.0, copy_X=True)
ridge_no_copy = Ridge(alpha=1.0, copy_X=False)

# Fit both models on the same training data
ridge_copy.fit(X, y)
ridge_no_copy.fit(X, y)

# The model coefficients are the same regardless of copy_X
print("Coefficients with copy_X=True:", ridge_copy.coef_)
print("Coefficients with copy_X=False:", ridge_no_copy.coef_)

# With copy_X=False the original input array may have been modified in place
print("Input data after fitting with copy_X=False:")
print(X[:5])

Running the example gives an output like:

Coefficients with copy_X=True: [60.00694188 97.51793602 63.45136759 56.40717681 35.40203643]
Coefficients with copy_X=False: [60.00694188 97.51793602 63.45136759 56.40717681 35.40203643]
Input data after fitting with copy_X=False:
[[ 1.00732176 -0.80522018  0.03250041 -0.9742086   0.16967807]
 [ 0.11407617 -0.61342202  0.80371641 -0.84977945 -0.1429451 ]
 [-1.38010167 -1.03608254 -0.51754034 -1.08978535  0.40812084]
 [-0.61291772  0.23357756  1.40098721 -0.14896435  1.09740641]
 [-0.59049749  0.1529334  -1.90734061 -0.22873933  0.68219072]]

The key steps in this example are:

  1. Generate a small synthetic regression dataset using make_regression
  2. Create two Ridge models with different copy_X settings
  3. Fit both models on the same training data
  4. Verify that the model coefficients are the same regardless of copy_X
  5. Print the input array after fitting with copy_X=False to inspect whether it was modified in place
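The comparison in step 4 can also be done programmatically rather than by reading printed output. In this sketch each model gets its own copy of the data, so any in-place change made by one fit cannot affect the other; X_a, X_b, same_coefs, and same_preds are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# Separate copies isolate the two fits from each other
X_a, X_b = X.copy(), X.copy()

ridge_copy = Ridge(alpha=1.0, copy_X=True).fit(X_a, y)
ridge_no_copy = Ridge(alpha=1.0, copy_X=False).fit(X_b, y)

# copy_X affects memory handling only, not the fitted model
same_coefs = np.allclose(ridge_copy.coef_, ridge_no_copy.coef_)
same_preds = np.allclose(ridge_copy.predict(X), ridge_no_copy.predict(X))
print("Same coefficients:", same_coefs)
print("Same predictions:", same_preds)
```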

Some tips and heuristics for setting copy_X:

Issues to consider:



See Also