LassoCV is a regression algorithm that performs L1 regularization, shrinking some coefficients to zero to reduce model complexity and prevent overfitting. It automatically selects the best regularization parameter through cross-validation.
The key hyperparameters of LassoCV include alphas (the range of regularization strengths to try), cv (the number of cross-validation folds), and max_iter (the maximum number of solver iterations). The algorithm is appropriate for regression problems.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[0.5]*20]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 0.217
Predicted: 234.444
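After fitting, the cross-validated choice can be inspected directly: the selected regularization strength is exposed via the estimator's alpha_ attribute, and the per-fold errors for each candidate via mse_path_. A short sketch using the same synthetic data as above:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# same synthetic dataset as in the example above
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)

model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000)
model.fit(X, y)

# alpha_ holds the regularization strength chosen by cross-validation
print('Chosen alpha:', model.alpha_)
# mse_path_ has shape (n_alphas, n_folds): the CV error for each candidate
print('CV MSE path shape:', model.mse_path_.shape)
```

Inspecting mse_path_ is a quick way to confirm that the candidate grid actually brackets a minimum rather than hitting its edge.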
The steps are as follows:
1. Generate a synthetic regression dataset using make_regression(), specifying the number of samples (n_samples), features (n_features), noise (noise), and random seed (random_state).
2. Split the dataset into training and test sets using train_test_split().
3. Instantiate a LassoCV model, specifying a range of alpha values (alphas), the number of cross-validation folds (cv), and maximum iterations (max_iter).
4. Fit the model on the training data with fit().
5. Evaluate the model's performance using the mean squared error metric by comparing predictions (yhat) to actual values (y_test).
6. Make a single prediction by passing a new data sample to the predict() method.
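Because L1 regularization can shrink coefficients exactly to zero, the fitted coef_ array shows which features the model kept. A minimal sketch (make_regression defaults to 10 informative features, so roughly half of the 20 columns carry only noise; the exact count of surviving coefficients depends on the data and the chosen alpha):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# 20 features, of which only 10 are informative by make_regression's default
X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)

model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000).fit(X, y)

# L1 regularization drives uninformative coefficients toward (or exactly to) zero
n_selected = np.sum(model.coef_ != 0)
print('Non-zero coefficients: %d of %d' % (n_selected, X.shape[1]))
```

This is the feature-selection side of Lasso: columns whose coefficient is exactly zero are effectively dropped from the model.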
This example shows how to use LassoCV
for feature selection and regularization in regression tasks, automatically selecting the best regularization parameter through cross-validation to improve model performance and prevent overfitting.
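As a sanity check on the automatic selection, refitting a plain Lasso at the chosen alpha on the same training data should reproduce LassoCV's final coefficients, since LassoCV refits on the full training set after cross-validation. A sketch:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=20, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

cv_model = LassoCV(alphas=[0.1, 1.0, 10.0], cv=5, max_iter=10000)
cv_model.fit(X_train, y_train)

# a plain Lasso at the selected alpha, fit on the same training data,
# should match the coefficients of LassoCV's final refit
plain = Lasso(alpha=cv_model.alpha_, max_iter=10000).fit(X_train, y_train)
print('Coefficients match:', np.allclose(cv_model.coef_, plain.coef_, atol=1e-3))
```

In other words, LassoCV is equivalent to running Lasso after a manual cross-validated grid search over alphas, just more convenient.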