
Configure SGDClassifier "l1_ratio" Parameter

The l1_ratio parameter in scikit-learn’s SGDClassifier controls the balance between the L1 and L2 penalties in Elastic Net regularization. It is only used when penalty is set to 'elasticnet'.

Elastic Net combines L1 and L2 penalties to address limitations of using either alone. It can select features like Lasso (L1) while maintaining the regularization properties of Ridge (L2).

The l1_ratio parameter ranges from 0 to 1. A value of 0 corresponds to L2 regularization, 1 to L1 regularization, and values in between represent a mix of both.
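To make the mixing concrete, here is a minimal sketch of the penalty term as it is described in scikit-learn's SGD user guide: the L1 norm weighted by l1_ratio plus half the squared L2 norm weighted by (1 - l1_ratio). The helper name elastic_net_penalty and the example weights are hypothetical and only for illustration; in the estimator this term is additionally scaled by alpha.

```python
def elastic_net_penalty(weights, l1_ratio):
    """Return l1_ratio * ||w||_1 + (1 - l1_ratio) * 0.5 * ||w||_2^2.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must be between 0 and 1")
    l1_term = sum(abs(w) for w in weights)        # sum of absolute values
    l2_term = 0.5 * sum(w * w for w in weights)   # half the sum of squares
    return l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term


example_weights = [1.0, -2.0, 0.0]  # made-up weight vector
for ratio in (0.0, 0.5, 1.0):
    penalty = elastic_net_penalty(example_weights, ratio)
    print(f"l1_ratio={ratio}: penalty={penalty:.2f}")
# l1_ratio=0.0: penalty=2.50  (pure L2)
# l1_ratio=0.5: penalty=2.75  (equal mix)
# l1_ratio=1.0: penalty=3.00  (pure L1)
```

At the two endpoints the expression reduces to the plain L2 and L1 penalties, which matches the behavior described above.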

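The default value for l1_ratio is 0.15, which weights the penalty mostly toward L2. Values closer to 1 tend to push more coefficients to exactly zero, giving sparser models, while values closer to 0 shrink coefficients more evenly. Common values range from 0.1 to 0.9, depending on the desired balance between sparsity and stability.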
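One easy thing to miss is that SGDClassifier's default penalty is 'l2', and the documentation states that l1_ratio is only used with penalty='elasticnet'. The sketch below checks this under stated assumptions: the dataset sizes and the helper fit_coefs are made up for illustration, and the comparison relies on a fixed random_state so that repeated fits are deterministic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic dataset, chosen only for this illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)


def fit_coefs(penalty, l1_ratio):
    """Fit an SGDClassifier and return its coefficients (hypothetical helper)."""
    model = SGDClassifier(penalty=penalty, l1_ratio=l1_ratio,
                          max_iter=1000, tol=1e-3, random_state=42)
    return model.fit(X, y).coef_


# With penalty='l2', changing l1_ratio is expected to make no difference
same_under_l2 = np.allclose(fit_coefs("l2", 0.15), fit_coefs("l2", 0.9))

# With penalty='elasticnet', changing l1_ratio changes the fitted model
same_under_elasticnet = np.allclose(fit_coefs("elasticnet", 0.15),
                                    fit_coefs("elasticnet", 0.9))

print(f"Same coefficients under l2: {same_under_l2}")
print(f"Same coefficients under elasticnet: {same_under_elasticnet}")
```

If the first comparison reports identical coefficients and the second does not, tuning l1_ratio without also setting penalty='elasticnet' has no effect on the model.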

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_redundant=5, n_repeated=0, n_classes=2,
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train one model per l1_ratio value (0 = pure L2, 1 = pure L1)
l1_ratio_values = [0, 0.25, 0.5, 0.75, 1]
results = []

for l1_ratio in l1_ratio_values:
    # l1_ratio only takes effect because penalty='elasticnet'
    sgd = SGDClassifier(loss='log_loss', penalty='elasticnet', l1_ratio=l1_ratio,
                        max_iter=1000, tol=1e-3, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Count coefficients that were not driven to exactly zero (model sparsity)
    non_zero_coefs = np.sum(sgd.coef_ != 0)
    results.append((l1_ratio, accuracy, non_zero_coefs))
    print(f"l1_ratio={l1_ratio}, Accuracy: {accuracy:.3f}, Non-zero coefficients: {non_zero_coefs}")

Running the example gives an output like:

l1_ratio=0, Accuracy: 0.770, Non-zero coefficients: 20
l1_ratio=0.25, Accuracy: 0.625, Non-zero coefficients: 9
l1_ratio=0.5, Accuracy: 0.815, Non-zero coefficients: 8
l1_ratio=0.75, Accuracy: 0.795, Non-zero coefficients: 3
l1_ratio=1, Accuracy: 0.770, Non-zero coefficients: 3

In this run, the number of non-zero coefficients falls from 20 at l1_ratio=0 to 3 at l1_ratio=1, while accuracy does not change monotonically. The dip at l1_ratio=0.25 suggests that accuracy measured on a single train/test split can be noisy.

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with 20 features: 5 informative, 5 redundant, and the rest noise
  2. Split the data into train and test sets
  3. Train SGDClassifier models with different l1_ratio values
  4. Evaluate the accuracy and count non-zero coefficients for each model
  5. Print results to compare the effect of l1_ratio on accuracy and model sparsity

Tips for setting l1_ratio:

Issues to consider:


