ElasticNetCV is a linear regression model with built-in cross-validation for tuning the hyperparameters. It combines both L1 and L2 regularization, making it useful for high-dimensional datasets where feature selection is desired.
The key hyperparameters of ElasticNetCV include alphas (the grid of candidate regularization strengths, generated automatically by default), l1_ratio (the mix of L1 and L2 regularization), and cv (the number of folds used for cross-validation). The algorithm is appropriate for regression problems.
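Before the full example, here is a minimal sketch of configuring the search space explicitly; the specific candidate values below are illustrative assumptions, not recommendations:

from sklearn.linear_model import ElasticNetCV
# candidate L1/L2 mixes to search over; for each mix, a grid of
# n_alphas regularization strengths is generated automatically from the data
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], n_alphas=100, cv=5)

With a list of l1_ratio values, cross-validation selects both the mix and the regularization strength jointly.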
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.5, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = ElasticNetCV(cv=5)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
r2 = r2_score(y_test, yhat)
print('R2: %.3f' % r2)
# make a prediction
row = [[1.91783824, 0.39869027, 0.35794608, -0.86778928, 1.70660296, 1.26799878, 0.18912945, -0.14618715, 0.39220467, 1.49778873]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
R2: 0.986
Predicted: 197.776
The steps are as follows:

1. A synthetic regression dataset is generated using make_regression().
2. The dataset is split into training and test sets using train_test_split().
3. An ElasticNetCV model is instantiated with default hyperparameters, except for setting the number of cross-validation folds to 5.
4. The model is fit on the training data. During fitting, the model automatically tunes the regularization strength (alpha) using cross-validation; the selected values can be inspected afterwards, as shown in the sketch after this list.
5. The performance of the model is evaluated on the test set using the coefficient of determination (R2 score).
6. A single prediction is made by passing a new data sample to the predict() method.
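After fitting, the hyperparameter values selected by cross-validation are exposed as attributes. A minimal sketch, reusing the model fit in the example above:

# hyperparameters chosen by cross-validation during fit
print('Selected alpha: %.5f' % model.alpha_)
print('Selected l1_ratio: %.2f' % model.l1_ratio_)  # 0.50 here, since only the default mix was searched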
This example demonstrates how to use ElasticNetCV
for regression tasks. The built-in cross-validation simplifies hyperparameter tuning, while the L1 component of the elastic net penalty can drive some coefficients to exactly zero, effectively performing feature selection.
The model can handle high-dimensional datasets and provides a convenient way to train a regularized linear regression model without the need for manual hyperparameter tuning.
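As a rough check of the sparsity claim, the fitted model's coefficients can be inspected; coefficients that are exactly zero correspond to features the model effectively dropped. A minimal sketch, reusing the fitted model from the example above (note that on this dense synthetic dataset, where every feature is informative, few or no coefficients may be zeroed):

import numpy as np
# count coefficients driven exactly to zero by the L1 penalty
n_dropped = np.sum(model.coef_ == 0)
print('Features dropped: %d of %d' % (n_dropped, len(model.coef_)))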