
Configure Lasso "copy_X" Parameter

The copy_X parameter in scikit-learn’s Lasso class controls whether the input data is copied or overwritten during fitting.

Lasso is a linear regression technique that performs both variable selection and regularization. It minimizes the sum of squared errors while also penalizing the absolute values of the coefficients (an L1 penalty). This penalty drives some coefficients to exactly zero, which produces sparse solutions.
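To see the sparsity in practice, the sketch below fits Lasso to a synthetic problem where only 5 of 50 features influence the target. scikit-learn documents the Lasso objective as (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1. The sample size, noise level and alpha=1.0 are arbitrary choices made for illustration; with settings like these, most of the uninformative coefficients typically come out exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 50 features, but only 5 of them actually influence the target
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# The L1 penalty sets many coefficients to exactly zero
n_nonzero = int(np.sum(lasso.coef_ != 0))
print(f"Non-zero coefficients: {n_nonzero} of {X.shape[1]}")
```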

By default, copy_X is set to True, which means that X is copied before scikit-learn preprocesses it (for example, centering it when fit_intercept=True). This guarantees that the array you pass in is left unmodified, at the cost of holding a second copy of X in memory, which can be significant for large datasets.
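A quick way to confirm the default behavior is to snapshot X, fit with copy_X=True, and compare afterwards. The sketch below converts X to a Fortran-ordered float64 array first, an assumption chosen on purpose because that is the layout the coordinate descent solver works with, so it is the case where an in-place overwrite would otherwise be possible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=1000, n_features=20, noise=0.5, random_state=42)

# Fortran-ordered float64 input: the solver could use this array directly
X = np.asfortranarray(X)
X_snapshot = X.copy()  # kept only for the comparison below

model = Lasso(alpha=0.1, copy_X=True)  # True is the default
model.fit(X, y)

# With copy_X=True, preprocessing operates on a private copy of X
unchanged = np.array_equal(X, X_snapshot)
print(f"X unchanged after fit: {unchanged}")
```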

Setting copy_X to False can save memory by allowing scikit-learn to overwrite X in place instead of copying it. The trade-off is that X may no longer hold its original values after fitting, so this setting is only safe when X is not reused afterwards.
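The sketch below repeats the snapshot comparison with copy_X=False. Whether X is actually overwritten depends on the scikit-learn version and on the layout and dtype of X, so the code reports what happened rather than assuming it; with fit_intercept=True and Fortran-ordered float64 input, X will often come back mean-centered.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=1000, n_features=20, noise=0.5, random_state=42)
X = np.asfortranarray(X)   # layout the solver can use without converting
X_snapshot = X.copy()      # kept only so we can compare afterwards

model = Lasso(alpha=0.1, copy_X=False)
model.fit(X, y)

# If X was modified, it was most likely centered in place during preprocessing
changed = not np.array_equal(X, X_snapshot)
print(f"X modified in place: {changed}")
if changed:
    print("Largest absolute column mean after fit:",
          float(np.abs(X.mean(axis=0)).max()))
```

If you still need the original values after fitting, either keep the default copy_X=True or make your own copy before calling fit.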

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
import time

# Generate synthetic dataset
X, y = make_regression(n_samples=100000, n_features=100, noise=0.5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with copy_X=True (the timing covers fit, predict and scoring)
start_time = time.perf_counter()
lasso_copy_true = Lasso(alpha=0.1, copy_X=True, random_state=42)
lasso_copy_true.fit(X_train, y_train)
y_pred_true = lasso_copy_true.predict(X_test)
r2_true = r2_score(y_test, y_pred_true)
time_true = time.perf_counter() - start_time

# Train with copy_X=False (X_train may be overwritten during fit)
start_time = time.perf_counter()
lasso_copy_false = Lasso(alpha=0.1, copy_X=False, random_state=42)
lasso_copy_false.fit(X_train, y_train)
y_pred_false = lasso_copy_false.predict(X_test)
r2_false = r2_score(y_test, y_pred_false)
time_false = time.perf_counter() - start_time

print(f"copy_X=True, R-squared: {r2_true:.3f}, Time: {time_true:.3f} seconds")
print(f"copy_X=False, R-squared: {r2_false:.3f}, Time: {time_false:.3f} seconds")

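Running the example produces output similar to the following. Timings vary from machine to machine and from run to run, so a small gap between the two settings should not be read as a reliable speedup from copy_X=False: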

copy_X=True, R-squared: 1.000, Time: 0.173 seconds
copy_X=False, R-squared: 1.000, Time: 0.125 seconds

The key steps in this example are:

  1. Generate a large synthetic regression dataset
  2. Split the data into train and test sets
  3. Train Lasso models with copy_X=True and copy_X=False
  4. Evaluate the R-squared score and runtime for each setting
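The example above compares runtime, but copy_X is primarily a memory setting. The sketch below measures peak memory during fit with Python's tracemalloc module (NumPy reports its array allocations to tracemalloc). It passes Fortran-ordered float64 input, an assumption made deliberately: the C-ordered arrays returned by make_regression and train_test_split are converted to Fortran order inside fit regardless of copy_X, which tends to hide the difference. The helper peak_fit_memory is hypothetical, written only for this illustration.

```python
import tracemalloc

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso


def peak_fit_memory(copy_X, X, y):
    """Hypothetical helper: peak bytes traced by tracemalloc during one fit."""
    tracemalloc.start()
    Lasso(alpha=0.1, copy_X=copy_X).fit(X, y)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


X, y = make_regression(n_samples=20000, n_features=100, noise=0.5, random_state=42)

# Each run gets its own Fortran-ordered array, allocated before tracing starts,
# so only allocations made inside fit are counted
peak_true = peak_fit_memory(True, np.asfortranarray(X), y)
peak_false = peak_fit_memory(False, np.asfortranarray(X), y)

print(f"Size of X:              {X.nbytes / 1e6:.1f} MB")
print(f"Peak during fit, True:  {peak_true / 1e6:.1f} MB")
print(f"Peak during fit, False: {peak_false / 1e6:.1f} MB")
```

With input in this layout, the difference between the two peaks should be roughly the size of X, which is the extra copy that copy_X=True keeps alive during fitting.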

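Tips and heuristics for setting copy_X:

  1. Keep the default copy_X=True unless memory is a genuine constraint; it guarantees that the array you pass to fit is left unmodified.
  2. Consider copy_X=False for very large datasets when X is not needed after fitting, since it avoids holding a second copy of X in memory.

Issues to consider:

  1. With copy_X=False, fit may overwrite X in place, so any later code that reads X can see altered values.
  2. The memory saving only materializes if X is already in the layout and dtype the solver works with (a Fortran-ordered float64 or float32 array); other inputs are converted during fit, which creates a new array regardless of copy_X.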
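scikit-learn's coordinate descent solver works on Fortran-ordered float64 or float32 arrays, and other inputs are converted to that layout during fit, which allocates a new array whatever copy_X is set to. One way to act on this is to convert X once, release the original, and only then fit with copy_X=False. This is a sketch of the idea under that assumption, not a recipe prescribed by scikit-learn; whether every internal copy is avoided depends on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=5000, n_features=50, noise=0.5, random_state=42)
print("C-contiguous:", X.flags["C_CONTIGUOUS"],
      "| F-contiguous:", X.flags["F_CONTIGUOUS"])

# Convert once, up front, to the layout and dtype the solver works with
X_f = np.asfortranarray(X, dtype=np.float64)
del X  # drop the C-ordered array so only one copy of the data stays alive

model = Lasso(alpha=0.1, copy_X=False)
model.fit(X_f, y)

# X_f may have been overwritten during fit; treat it as scratch space from here on
print("F-contiguous:", X_f.flags["F_CONTIGUOUS"])
```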


See Also