The `penalty` parameter in scikit-learn's `LogisticRegression` controls the type of regularization applied to the model coefficients.
Logistic Regression is a linear model for binary classification that estimates the probability of an instance belonging to a class. Regularization is used to prevent overfitting by constraining the model coefficients.
The `penalty` parameter determines the type of regularization: `'l1'` for Lasso (L1), `'l2'` for Ridge (L2), or `'elasticnet'` for a combination of the two. L1 regularization drives some coefficients exactly to zero, effectively performing feature selection, while L2 regularization shrinks all coefficients toward zero without eliminating them and is often the more stable choice for prediction.
The default value for `penalty` is `'l2'`.
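The default is easy to confirm on an unconfigured estimator; a minimal check using the public `get_params` API:

```python
from sklearn.linear_model import LogisticRegression

# An estimator created with no arguments uses the default penalty
print(LogisticRegression().get_params()['penalty'])  # 'l2'
```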
In practice, `'l2'` is often used as the default choice, `'l1'` can be useful when feature selection is desired, and `'elasticnet'` provides a compromise between the two.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different penalty values
penalty_values = ['l1', 'l2', 'elasticnet']
accuracies = []
sparsities = []

for penalty in penalty_values:
    # l1_ratio is only valid for the 'elasticnet' penalty
    l1_ratio = 0.5 if penalty == 'elasticnet' else None
    lr = LogisticRegression(penalty=penalty, solver='saga', max_iter=5000,
                            random_state=42, l1_ratio=l1_ratio)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Fraction of coefficients driven exactly to zero
    sparsity = (lr.coef_ == 0).mean()
    accuracies.append(accuracy)
    sparsities.append(sparsity)
    print(f"penalty={penalty}, Accuracy: {accuracy:.3f}, Sparsity: {sparsity:.3f}")
```
Running this example gives an output like:
```
penalty=l1, Accuracy: 0.800, Sparsity: 0.100
penalty=l2, Accuracy: 0.795, Sparsity: 0.000
penalty=elasticnet, Accuracy: 0.800, Sparsity: 0.100
```
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative, redundant, and noise features
- Split the data into train and test sets
- Train `LogisticRegression` models with different `penalty` values
- Evaluate the accuracy and sparsity of coefficients for each model (a sketch for inspecting which features an L1 model keeps follows this list)
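Beyond the overall sparsity fraction, it can be helpful to see which coefficients an L1-penalized model actually zeroed out. A minimal sketch, reusing `X_train` and `y_train` from the example above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fit an L1-penalized model; the 'saga' solver supports the 'l1' penalty
lr_l1 = LogisticRegression(penalty='l1', solver='saga', max_iter=5000, random_state=42)
lr_l1.fit(X_train, y_train)

# Indices of features with non-zero coefficients, i.e. the "selected" features
selected = np.flatnonzero(lr_l1.coef_[0])
print(f"Kept {selected.size} of {X_train.shape[1]} features: {selected}")
```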
Some tips and heuristics for choosing the `penalty`:
- Use `'l2'` as a default, as it generally performs well and is more stable
- Consider `'l1'` when you want to perform feature selection and obtain sparse coefficients
- `'elasticnet'` can be a good compromise, but requires tuning the additional `l1_ratio` parameter
Issues to consider:
- L1 regularization may not yield a unique solution if there is high correlation between features
- L2 regularization doesn’t perform feature selection, so the model may include noise features
- Elastic Net requires tuning both the regularization strength (`C`) and the `l1_ratio` parameter; a grid-search sketch follows this list
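Both Elastic Net knobs can be tuned jointly with a standard grid search. A minimal sketch using `GridSearchCV` on the training data from the example above (the `C` and `l1_ratio` grids are illustrative, not recommendations):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Jointly tune the inverse regularization strength (C) and the L1/L2 mix (l1_ratio)
param_grid = {'C': [0.01, 0.1, 1.0, 10.0], 'l1_ratio': [0.1, 0.5, 0.9]}
grid = GridSearchCV(
    LogisticRegression(penalty='elasticnet', solver='saga', max_iter=5000, random_state=42),
    param_grid, cv=5, scoring='accuracy'
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```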