Lasso regression is a popular technique for linear regression with L1 regularization, useful for feature selection and for preventing overfitting. In scikit-learn, the `Lasso` class provides an implementation of this algorithm. To achieve good performance, the model's hyperparameters need to be tuned, particularly the `alpha` parameter, which controls the regularization strength.
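As a quick illustration of what `alpha` does, here is a minimal sketch (the dataset and the alpha grid are arbitrary choices for demonstration): larger values drive more coefficients to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

# Larger alpha means stronger regularization, which zeroes out more coefficients
for alpha in [0.01, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of {model.coef_.size} coefficients are exactly zero")
```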
`Lasso` has key hyperparameters such as `alpha` (regularization strength) and `max_iter` (maximum iterations for the solver). Manually tuning `alpha` can be time-consuming and requires domain knowledge.
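To make that manual workflow concrete, here is a hedged sketch of hand-tuning `alpha` with a validation split (the candidate grid and split are arbitrary illustrations, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Try each candidate alpha by hand and keep the one with the best validation MSE
best_alpha, best_mse = None, float("inf")
for alpha in [0.001, 0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(f"Best alpha by manual search: {best_alpha} (MSE={best_mse:.3f})")
```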
On the other hand, `LassoCV` automates the hyperparameter tuning process using cross-validation. Its key hyperparameters include `alphas` (list of alpha values to try) and `cv` (number of folds for cross-validation).
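For example, you can pass an explicit list of candidates and a fold count; this is a minimal sketch with an arbitrary alpha grid:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)

# Search the given alphas with 5-fold cross-validation
model = LassoCV(alphas=[0.001, 0.01, 0.1, 1.0], cv=5).fit(X, y)
print(f"Selected alpha: {model.alpha_}")
```

If `alphas` is omitted, `LassoCV` generates a grid of candidate values automatically.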
The main difference is that `LassoCV` automates hyperparameter tuning, while `Lasso` requires manual tuning. This automation comes at a computational cost, however: `LassoCV` fits one model per candidate alpha per cross-validation fold.
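To see that cost directly, a rough timing comparison might look like the sketch below (absolute times will vary by machine and dataset; this only illustrates the relative difference):

```python
import time
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=0)

# Time a single Lasso fit versus a cross-validated LassoCV search
start = time.perf_counter()
Lasso(alpha=1.0).fit(X, y)
print(f"Lasso fit:   {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
LassoCV(cv=5).fit(X, y)  # fits many models: one per candidate alpha per fold
print(f"LassoCV fit: {time.perf_counter() - start:.3f}s")
```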
`Lasso` is ideal for quick prototyping or when you have prior knowledge of good hyperparameter values. `LassoCV` is preferred when you need to tune hyperparameters and perform model selection, especially with new datasets.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and evaluate Lasso with default hyperparameters
lasso = Lasso(random_state=42)
lasso.fit(X_train, y_train)
y_pred_lasso = lasso.predict(X_test)
print(f"Lasso MSE: {mean_squared_error(y_test, y_pred_lasso):.3f}")
print(f"Lasso R2 score: {r2_score(y_test, y_pred_lasso):.3f}")

# Fit and evaluate LassoCV with cross-validation
lasso_cv = LassoCV(cv=5, random_state=42)
lasso_cv.fit(X_train, y_train)
y_pred_lasso_cv = lasso_cv.predict(X_test)
print(f"\nLassoCV MSE: {mean_squared_error(y_test, y_pred_lasso_cv):.3f}")
print(f"LassoCV R2 score: {r2_score(y_test, y_pred_lasso_cv):.3f}")
print(f"Best alpha: {lasso_cv.alpha_}")
```
Running the example gives an output like:
```
Lasso MSE: 10.715
Lasso R2 score: 1.000

LassoCV MSE: 0.099
LassoCV R2 score: 1.000
Best alpha: 0.09099259355963495
```
The steps are as follows:
- Generate a synthetic regression dataset using `make_regression`.
- Split the data into training and test sets using `train_test_split`.
- Instantiate `Lasso` with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate `LassoCV` with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
- Compare the test set performance (MSE and R2 score) of both models and print the best `alpha` found by `LassoCV`.
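Since Lasso is often used for feature selection, it can also be worth inspecting which coefficients the fitted model kept. This is a minimal sketch that continues from the fitted `lasso_cv` in the example above:

```python
import numpy as np

# Continuing from the example: nonzero coefficients are the features Lasso kept
selected = np.flatnonzero(lasso_cv.coef_)
print(f"LassoCV kept {selected.size} of {lasso_cv.coef_.size} features: {selected}")
```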