The `penalty` parameter in scikit-learn's `LogisticRegression` controls the type of regularization applied to the model coefficients.
Logistic Regression is a linear model for binary classification that estimates the probability of an instance belonging to a class. Regularization is used to prevent overfitting by constraining the model coefficients.
The `penalty` parameter determines the type of regularization: `'l1'` for Lasso (L1), `'l2'` for Ridge (L2), or `'elasticnet'` for a combination of the two. L1 regularization drives some coefficients exactly to zero, effectively performing feature selection, while L2 regularization shrinks all coefficients toward zero without eliminating them and is often the more stable choice for prediction.
The default value for `penalty` is `'l2'`.
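The default is easy to confirm on an unconfigured estimator; a minimal check using the public `get_params` API:

```python
from sklearn.linear_model import LogisticRegression

# An estimator created with no arguments uses the default penalty
print(LogisticRegression().get_params()['penalty'])  # 'l2'
```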
In practice, `'l2'` is often used as the default choice, `'l1'` can be useful when feature selection is desired, and `'elasticnet'` provides a compromise between the two.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different penalty values
penalty_values = ['l1', 'l2', 'elasticnet']
accuracies = []
sparsities = []

for penalty in penalty_values:
    # l1_ratio is only valid for the 'elasticnet' penalty
    l1_ratio = 0.5 if penalty == 'elasticnet' else None
    lr = LogisticRegression(penalty=penalty, solver='saga', max_iter=5000,
                            random_state=42, l1_ratio=l1_ratio)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Fraction of coefficients driven exactly to zero
    sparsity = (lr.coef_ == 0).mean()
    accuracies.append(accuracy)
    sparsities.append(sparsity)
    print(f"penalty={penalty}, Accuracy: {accuracy:.3f}, Sparsity: {sparsity:.3f}")
```
Running this example gives an output like:
```
penalty=l1, Accuracy: 0.800, Sparsity: 0.100
penalty=l2, Accuracy: 0.795, Sparsity: 0.000
penalty=elasticnet, Accuracy: 0.800, Sparsity: 0.100
```
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative, redundant, and noise features
- Split the data into train and test sets
- Train `LogisticRegression` models with different `penalty` values
- Evaluate the accuracy and sparsity of coefficients for each model (a sketch for inspecting which features an L1 model keeps follows this list)
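Beyond the overall sparsity fraction, it can be helpful to see which coefficients an L1-penalized model actually zeroed out. A minimal sketch, reusing `X_train` and `y_train` from the example above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit an L1-penalized model; the 'saga' solver supports the 'l1' penalty
lr_l1 = LogisticRegression(penalty='l1', solver='saga', max_iter=5000, random_state=42)
lr_l1.fit(X_train, y_train)

# Indices of features with non-zero coefficients, i.e. the "selected" features
selected = np.flatnonzero(lr_l1.coef_[0])
print(f"Kept {selected.size} of {X_train.shape[1]} features: {selected}")
```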
Some tips and heuristics for choosing the `penalty`:
- Use `'l2'` as a default, as it generally performs well and is more stable
- Consider `'l1'` when you want to perform feature selection and obtain sparse coefficients
- `'elasticnet'` can be a good compromise, but requires tuning the additional `l1_ratio` parameter
Issues to consider:
- L1 regularization may not yield a unique solution if there is high correlation between features
- L2 regularization doesn’t perform feature selection, so the model may include noise features
- Elastic Net requires tuning both the regularization strength (`C`) and the `l1_ratio` parameter; a grid-search sketch follows this list
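Both Elastic Net knobs can be tuned jointly with a standard grid search. A minimal sketch using `GridSearchCV` on the training data from the example above (the `C` and `l1_ratio` grids are illustrative, not recommendations):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Jointly tune the inverse regularization strength (C) and the L1/L2 mix (l1_ratio)
param_grid = {'C': [0.01, 0.1, 1.0, 10.0], 'l1_ratio': [0.1, 0.5, 0.9]}
grid = GridSearchCV(
    LogisticRegression(penalty='elasticnet', solver='saga', max_iter=5000, random_state=42),
    param_grid, cv=5, scoring='accuracy'
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```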