LassoCV is a regression algorithm that performs L1 regularization, shrinking some coefficients to zero to reduce model complexity and prevent overfitting. It automatically selects the best regularization parameter through cross-validation.
The key hyperparameters of LassoCV include alphas (the range of regularization strengths to try), cv (the number of cross-validation folds), and max_iter (the maximum number of solver iterations). The algorithm is appropriate for regression problems.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[0.5]*20]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 0.217
Predicted: 234.444
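After fitting, the cross-validated choice can be inspected directly: the selected regularization strength is exposed via the estimator's alpha_ attribute, and the per-fold errors for each candidate via mse_path_. A short sketch using the same synthetic data as above:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# same synthetic dataset as in the example above
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)

model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000)
model.fit(X, y)

# alpha_ holds the regularization strength chosen by cross-validation
print('Chosen alpha:', model.alpha_)
# mse_path_ has shape (n_alphas, n_folds): the CV error for each candidate
print('CV MSE path shape:', model.mse_path_.shape)
```

Inspecting mse_path_ is a quick way to confirm that the candidate grid actually brackets a minimum rather than hitting its edge.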
The steps are as follows:
1. Generate a synthetic regression dataset using make_regression(), specifying the number of samples (n_samples), features (n_features), noise (noise), and random seed (random_state).
2. Split the dataset into training and test sets using train_test_split().
3. Instantiate a LassoCV model, specifying a range of alpha values (alphas), the number of cross-validation folds (cv), and maximum iterations (max_iter).
4. Fit the model on the training data with fit().
5. Evaluate the model's performance using the mean squared error metric by comparing predictions (yhat) to actual values (y_test).
6. Make a single prediction by passing a new data sample to the predict() method.
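Because L1 regularization can shrink coefficients exactly to zero, the fitted coef_ array shows which features the model kept. A minimal sketch (make_regression defaults to 10 informative features, so roughly half of the 20 columns carry only noise; the exact count of surviving coefficients depends on the data and the chosen alpha):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# 20 features, of which only 10 are informative by make_regression's default
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)

model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000).fit(X, y)

# L1 regularization drives uninformative coefficients toward (or exactly to) zero
n_selected = np.sum(model.coef_ != 0)
print('Non-zero coefficients: %d of %d' % (n_selected, X.shape[1]))
```

This is the feature-selection side of Lasso: columns whose coefficient is exactly zero are effectively dropped from the model.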
This example shows how to use LassoCV
for feature selection and regularization in regression tasks, automatically selecting the best regularization parameter through cross-validation to improve model performance and prevent overfitting.
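As a sanity check on the automatic selection, refitting a plain Lasso at the chosen alpha on the same training data should reproduce LassoCV's final coefficients, since LassoCV refits on the full training set after cross-validation. A sketch:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

cv_model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000)
cv_model.fit(X_train, y_train)

# a plain Lasso at the selected alpha, fit on the same training data,
# should match the coefficients of LassoCV's final refit
plain = Lasso(alpha=cv_model.alpha_, max_iter=10000).fit(X_train, y_train)
print('Coefficients match:', np.allclose(cv_model.coef_, plain.coef_, atol=1e-3))
```

In other words, LassoCV is equivalent to running Lasso after a manual cross-validated grid search over alphas, just more convenient.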