ElasticNetCV is a linear regression model with built-in cross-validation for tuning the hyperparameters. It combines both L1 and L2 regularization, making it useful for high-dimensional datasets where feature selection is desired.
The key hyperparameters of ElasticNetCV include alphas (the grid of candidate regularization strengths, generated automatically by default), l1_ratio (the mix of L1 and L2 regularization), and cv (the number of folds used for cross-validation). The algorithm is appropriate for regression problems.
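Before the full example, here is a minimal sketch of configuring the search space explicitly; the specific candidate values below are illustrative assumptions, not recommendations:

from sklearn.linear_model import ElasticNetCV
# candidate L1/L2 mixes to search over; for each mix, a grid of
# n_alphas regularization strengths is generated automatically from the data
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], n_alphas=100, cv=5)

With a list of l1_ratio values, cross-validation selects both the mix and the regularization strength jointly.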
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.5, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = ElasticNetCV(cv=5)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
r2 = r2_score(y_test, yhat)
print('R2: %.3f' % r2)
# make a prediction
row = [[1.91783824, 0.39869027, 0.35794608, -0.86778928, 1.70660296, 1.26799878, 0.18912945, -0.14618715, 0.39220467, 1.49778873]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
R2: 0.986
Predicted: 197.776
The steps are as follows:

1. A synthetic regression dataset is generated using make_regression().
2. The dataset is split into training and test sets using train_test_split().
3. An ElasticNetCV model is instantiated with default hyperparameters, except for setting the number of cross-validation folds to 5.
4. The model is fit on the training data. During fitting, the model automatically tunes the regularization strength (alpha) using cross-validation; the selected values can be inspected afterwards, as shown in the sketch after this list.
5. The performance of the model is evaluated on the test set using the coefficient of determination (R2 score).
6. A single prediction is made by passing a new data sample to the predict() method.
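After fitting, the hyperparameter values selected by cross-validation are exposed as attributes. A minimal sketch, reusing the model fit in the example above:

# hyperparameters chosen by cross-validation during fit
print('Selected alpha: %.5f' % model.alpha_)
print('Selected l1_ratio: %.2f' % model.l1_ratio_)  # 0.50 here, since only the default mix was searched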
This example demonstrates how to use ElasticNetCV
for regression tasks. The built-in cross-validation simplifies hyperparameter tuning, while the L1 component of the elastic net penalty can drive some coefficients to exactly zero, effectively performing feature selection.
The model can handle high-dimensional datasets and provides a convenient way to train a regularized linear regression model without the need for manual hyperparameter tuning.
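As a rough check of the sparsity claim, the fitted model's coefficients can be inspected; coefficients that are exactly zero correspond to features the model effectively dropped. A minimal sketch, reusing the fitted model from the example above (note that on this dense synthetic dataset, where every feature is informative, few or no coefficients may be zeroed):

import numpy as np
# count coefficients driven exactly to zero by the L1 penalty
n_dropped = np.sum(model.coef_ == 0)
print('Features dropped: %d of %d' % (n_dropped, len(model.coef_)))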