The l1_ratio parameter in scikit-learn's SGDClassifier controls the balance between the L1 and L2 penalties when Elastic Net regularization is used (penalty='elasticnet').
Elastic Net combines L1 and L2 penalties to address limitations of using either alone. It can select features like Lasso (L1) while maintaining the regularization properties of Ridge (L2).
The l1_ratio parameter ranges from 0 to 1. A value of 0 corresponds to pure L2 regularization, a value of 1 to pure L1 regularization, and values in between mix the two penalties.
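Conceptually, the Elastic Net regularization term is a convex combination of the two penalties, weighted by l1_ratio. The snippet below is a minimal sketch of how that mixing works for a toy weight vector; it computes the penalty by hand rather than calling any scikit-learn internals, so treat the exact scaling (the alpha multiplier and the 1/2 on the L2 term) as a simplified illustration.
import numpy as np

# Toy weight vector to illustrate how l1_ratio mixes the two penalties
w = np.array([0.5, -1.2, 0.0, 2.0])

def elastic_net_penalty(w, l1_ratio):
    # Convex combination of the L1 and L2 penalties (simplified sketch;
    # scikit-learn additionally scales the whole term by alpha)
    l1 = np.sum(np.abs(w))
    l2 = 0.5 * np.sum(w ** 2)
    return l1_ratio * l1 + (1 - l1_ratio) * l2

for l1_ratio in [0.0, 0.15, 0.5, 1.0]:
    print(f"l1_ratio={l1_ratio}: penalty={elastic_net_penalty(w, l1_ratio):.3f}")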
The default value for l1_ratio is 0.15, which favors L2 regularization. Common values range from 0.1 to 0.9, depending on the desired balance between sparsity and stability.
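If you set penalty='elasticnet' without specifying l1_ratio, the 0.15 default applies; the quick check below simply reads the parameter back to confirm it.
from sklearn.linear_model import SGDClassifier

# Default l1_ratio when only the penalty is specified
clf = SGDClassifier(loss='log_loss', penalty='elasticnet', random_state=42)
print(clf.get_params()['l1_ratio'])  # prints 0.15
The full example below then compares several explicit l1_ratio values on a synthetic dataset.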
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=5, n_repeated=0, n_classes=2,
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different l1_ratio values
l1_ratio_values = [0, 0.25, 0.5, 0.75, 1]
results = []
for l1_ratio in l1_ratio_values:
    sgd = SGDClassifier(loss='log_loss', penalty='elasticnet', l1_ratio=l1_ratio,
                        max_iter=1000, tol=1e-3, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    non_zero_coefs = np.sum(sgd.coef_ != 0)
    results.append((l1_ratio, accuracy, non_zero_coefs))
    print(f"l1_ratio={l1_ratio}, Accuracy: {accuracy:.3f}, Non-zero coefficients: {non_zero_coefs}")
Running the example gives an output like:
l1_ratio=0, Accuracy: 0.770, Non-zero coefficients: 20
l1_ratio=0.25, Accuracy: 0.625, Non-zero coefficients: 9
l1_ratio=0.5, Accuracy: 0.815, Non-zero coefficients: 8
l1_ratio=0.75, Accuracy: 0.795, Non-zero coefficients: 3
l1_ratio=1, Accuracy: 0.770, Non-zero coefficients: 3
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and noisy features
- Split the data into train and test sets
- Train SGDClassifier models with different l1_ratio values
- Evaluate the accuracy and count non-zero coefficients for each model
- Print the results to compare the effect of l1_ratio on accuracy and model sparsity
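To see the feature-selection effect directly, you can inspect which entries of coef_ are driven to zero at a high l1_ratio. The short sketch below is not part of the original script; it refits one model and assumes X_train and y_train from the example above are still in scope.
from sklearn.linear_model import SGDClassifier
import numpy as np

# Refit with a pure L1 penalty (l1_ratio=1) and list the surviving features;
# reuses X_train and y_train from the example above
sgd_l1 = SGDClassifier(loss='log_loss', penalty='elasticnet', l1_ratio=1.0,
                       max_iter=1000, tol=1e-3, random_state=42)
sgd_l1.fit(X_train, y_train)
selected = np.flatnonzero(sgd_l1.coef_[0])
print("Features with non-zero coefficients:", selected)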
Tips for setting l1_ratio:
- Start with the default value of 0.15 and adjust based on your needs for sparsity vs. stability
- Use higher values (closer to 1) for increased sparsity and feature selection
- Use lower values (closer to 0) for more stable solutions and to prevent overfitting
- Consider using cross-validation to find the optimal l1_ratio for your specific dataset, as in the sketch after this list
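A minimal cross-validation sketch for tuning l1_ratio is shown below. It regenerates the same synthetic data so it runs on its own, and the parameter grid is an illustrative choice rather than a recommendation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDClassifier

# Same synthetic dataset as in the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=5, n_repeated=0, n_classes=2,
                           random_state=42)

# Grid-search l1_ratio with 5-fold cross-validation (illustrative grid)
param_grid = {'l1_ratio': [0.0, 0.15, 0.3, 0.5, 0.7, 0.9, 1.0]}
sgd = SGDClassifier(loss='log_loss', penalty='elasticnet',
                    max_iter=1000, tol=1e-3, random_state=42)
search = GridSearchCV(sgd, param_grid, cv=5, scoring='accuracy')
search.fit(X, y)
print("Best l1_ratio:", search.best_params_['l1_ratio'])
print(f"Best CV accuracy: {search.best_score_:.3f}")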
Issues to consider:
- The optimal l1_ratio depends on the nature of your data and the problem you're solving
- Very high l1_ratio values may lead to excessive sparsity, potentially removing important features
- Very low l1_ratio values may not provide sufficient regularization for high-dimensional data
- The effect of l1_ratio can vary depending on the scale of your features, so feature scaling may be necessary (see the pipeline sketch after this list)
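Because SGD-based models are sensitive to feature scale, a common pattern is to standardize features before fitting. The pipeline below is a minimal sketch of that, reusing the same synthetic data; the l1_ratio of 0.5 is just an example value.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Same synthetic dataset as in the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=5, n_repeated=0, n_classes=2,
                           random_state=42)

# Standardize features, then fit the Elastic Net-penalized SGD classifier
pipeline = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss='log_loss', penalty='elasticnet', l1_ratio=0.5,
                  max_iter=1000, tol=1e-3, random_state=42)
)
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV accuracy with scaling: {scores.mean():.3f}")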