Lasso regression is a popular technique for linear regression with L1 regularization, useful for feature selection and for preventing overfitting. In scikit-learn, the `Lasso` class provides an implementation of this algorithm. To achieve good performance, the model's hyperparameters need to be tuned, particularly the `alpha` parameter, which controls the regularization strength.
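As a quick illustration of what `alpha` does, here is a minimal sketch (the dataset and the alpha grid are arbitrary choices for demonstration): larger values drive more coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

# Larger alpha means stronger regularization, which zeroes out more coefficients
for alpha in [0.01, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of {model.coef_.size} coefficients are exactly zero")
```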
`Lasso` has key hyperparameters such as `alpha` (regularization strength) and `max_iter` (maximum iterations for the solver). Manually tuning `alpha` can be time-consuming and requires domain knowledge.
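To make that manual workflow concrete, here is a hedged sketch of hand-tuning `alpha` with a validation split (the candidate grid and split are arbitrary illustrations, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Try each candidate alpha by hand and keep the one with the best validation MSE
best_alpha, best_mse = None, float("inf")
for alpha in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(f"Best alpha by manual search: {best_alpha} (MSE={best_mse:.3f})")
```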
On the other hand, `LassoCV` automates the hyperparameter tuning process using cross-validation. Its key hyperparameters include `alphas` (list of alpha values to try) and `cv` (number of folds for cross-validation).
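For example, you can pass an explicit list of candidates and a fold count; this is a minimal sketch with an arbitrary alpha grid:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)

# Search the given alphas with 5-fold cross-validation
model = LassoCV(alphas=[0.001, 0.01, 0.1, 1.0], cv=5).fit(X, y)
print(f"Selected alpha: {model.alpha_}")
```

If `alphas` is omitted, `LassoCV` generates a grid of candidate values automatically.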
The main difference is that `LassoCV` automates hyperparameter tuning, while `Lasso` requires manual tuning. This automation comes at a computational cost, however: `LassoCV` fits one model per candidate alpha per cross-validation fold.
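To see that cost directly, a rough timing comparison might look like the sketch below (absolute times will vary by machine and dataset; this only illustrates the relative difference):

```python
import time
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=0)

# Time a single Lasso fit versus a cross-validated LassoCV search
start = time.perf_counter()
Lasso(alpha=1.0).fit(X, y)
print(f"Lasso fit:   {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
LassoCV(cv=5).fit(X, y)  # fits many models: one per candidate alpha per fold
print(f"LassoCV fit: {time.perf_counter() - start:.3f}s")
```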
`Lasso` is ideal for quick prototyping or when you have prior knowledge of good hyperparameter values. `LassoCV` is preferred when you need to tune hyperparameters and perform model selection, especially with new datasets.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and evaluate Lasso with default hyperparameters
lasso = Lasso(random_state=42)
lasso.fit(X_train, y_train)
y_pred_lasso = lasso.predict(X_test)
print(f"Lasso MSE: {mean_squared_error(y_test, y_pred_lasso):.3f}")
print(f"Lasso R2 score: {r2_score(y_test, y_pred_lasso):.3f}")

# Fit and evaluate LassoCV with cross-validation
lasso_cv = LassoCV(cv=5, random_state=42)
lasso_cv.fit(X_train, y_train)
y_pred_lasso_cv = lasso_cv.predict(X_test)
print(f"\nLassoCV MSE: {mean_squared_error(y_test, y_pred_lasso_cv):.3f}")
print(f"LassoCV R2 score: {r2_score(y_test, y_pred_lasso_cv):.3f}")
print(f"Best alpha: {lasso_cv.alpha_}")
```
Running the example gives an output like:
```
Lasso MSE: 10.715
Lasso R2 score: 1.000

LassoCV MSE: 0.099
LassoCV R2 score: 1.000
Best alpha: 0.09099259355963495
```
The steps are as follows:
- Generate a synthetic regression dataset using `make_regression`.
- Split the data into training and test sets using `train_test_split`.
- Instantiate `Lasso` with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate `LassoCV` with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
- Compare the test set performance (MSE and R2 score) of both models and print the best `alpha` found by `LassoCV`.
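Since Lasso is often used for feature selection, it can also be worth inspecting which coefficients the fitted model kept. This is a minimal sketch that continues from the fitted `lasso_cv` in the example above:

```python
import numpy as np

# Continuing from the example: nonzero coefficients are the features Lasso kept
selected = np.flatnonzero(lasso_cv.coef_)
print(f"LassoCV kept {selected.size} of {lasso_cv.coef_.size} features: {selected}")
```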