Logistic Regression with built-in cross-validation (LogisticRegressionCV) is a powerful algorithm for binary and multi-class classification tasks. It combines logistic regression with cross-validation to automatically select the best regularization strength. The key hyperparameters of LogisticRegressionCV include Cs (the candidate values of inverse regularization strength, given either as a count or as an explicit list), cv (the number of cross-validation folds), and penalty (the type of regularization). The algorithm is well suited to classification problems where model selection and regularization matter for preventing overfitting and ensuring model robustness.
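For instance, Cs can be passed as an integer (the number of values to try on a logarithmic scale) or as an explicit list of candidate values. The snippet below is a minimal sketch of an alternative configuration; the particular grid and the scoring argument are illustrative choices, not requirements. The complete worked example follows.
from sklearn.linear_model import LogisticRegressionCV
# sketch: an explicit grid of candidate inverse regularization strengths,
# 5-fold cross-validation, L2 regularization, and accuracy as the selection metric
alt_model = LogisticRegressionCV(Cs=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5, penalty='l2', scoring='accuracy')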
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model with cross-validation
model = LogisticRegressionCV(Cs=10, cv=5, penalty='l2')
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.950
Predicted: 0
The steps are as follows:
First, a synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and classes (n_classes), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().
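As a quick sanity check after the split, the array shapes and class balance can be inspected; this small snippet is an illustrative addition rather than part of the original example.
import numpy as np
# confirm the train/test split sizes and that both classes are represented
print(X_train.shape, X_test.shape)               # (80, 5) (20, 5) for this configuration
print(np.unique(y_train, return_counts=True))    # per-class sample counts in the training set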
Next, a LogisticRegressionCV model is instantiated with the following hyperparameters: Cs=10 to test 10 different values of regularization strength, cv=5 for 5-fold cross-validation, and penalty='l2' for L2 regularization. The model is then fit on the training data using the fit() method.
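After fitting, the results of the internal cross-validation can be inspected on the fitted estimator; a brief sketch using the C_, Cs_, and scores_ attributes exposed by scikit-learn:
# inverse regularization strength selected by cross-validation (one entry per class)
print('Selected C:', model.C_)
# the grid of candidate values that was searched
print('Candidate Cs:', model.Cs_)
# cross-validation accuracy for each candidate C, averaged over the 5 folds
for label, fold_scores in model.scores_.items():
    print(label, fold_scores.mean(axis=0))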
The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.
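Beyond a single accuracy figure, the same test-set predictions can be summarized with other scikit-learn metrics; the snippet below is an optional addition that recomputes the predictions and reports them per class.
from sklearn.metrics import classification_report, confusion_matrix
# recompute test-set predictions and summarize them per class
yhat = model.predict(X_test)
print(classification_report(y_test, yhat))   # precision, recall and F1 per class
print(confusion_matrix(y_test, yhat))        # rows: true classes, columns: predicted classes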
A single prediction can be made by passing a new data sample to the predict() method.
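If class probabilities are needed rather than a hard label, predict_proba can be called on the same row; the columns follow the order of model.classes_.
# probability of each class for the new sample
probs = model.predict_proba(row)
print(probs)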
This example demonstrates how to effectively use LogisticRegressionCV for binary classification tasks. It highlights the model’s ability to automatically perform cross-validation and select the optimal regularization strength, ensuring robust and accurate predictions.