
Configure SGDClassifier "penalty" Parameter

The penalty parameter in scikit-learn’s SGDClassifier determines the type of regularization applied to the model during training.

Stochastic Gradient Descent (SGD) is an efficient method for training linear classifiers, particularly useful for large-scale learning. The penalty parameter controls the regularization term, which helps prevent overfitting.

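The penalty adds a term to the loss function that grows with the magnitude of the model’s coefficients, which discourages overly complex models and helps the model generalize. The form of that term determines its effect: an L2 penalty shrinks all coefficients toward zero, while an L1 penalty can drive some coefficients exactly to zero, producing a sparse model.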

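The default value for penalty is 'l2'. The other options are 'l1', 'elasticnet' (a mix of L1 and L2 whose balance is set by the l1_ratio parameter), and None (no regularization). The L2 penalty is the one used in ridge regression and the L1 penalty is the one used in the lasso. The overall regularization strength is set separately by the alpha parameter.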
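To make the three penalty terms concrete, here is a minimal sketch that evaluates each one for a given coefficient vector. The helper name `penalty_term` is hypothetical and is not part of scikit-learn. The sketch assumes the elastic net term is the convex combination `(1 - l1_ratio) * L2 + l1_ratio * L1`, which is how the scikit-learn user guide describes it. In the actual objective this value is further multiplied by alpha.

```python
def penalty_term(w, penalty, l1_ratio=0.15):
    """Return the regularization term R(w) for a list of coefficients.

    Hypothetical helper for illustration only; SGDClassifier computes
    this internally and scales it by alpha.
    """
    l2 = 0.5 * sum(x * x for x in w)   # half the squared Euclidean norm
    l1 = sum(abs(x) for x in w)        # sum of absolute values
    if penalty == "l2":
        return l2
    if penalty == "l1":
        return l1
    if penalty == "elasticnet":
        # Assumed form: convex combination of the L2 and L1 terms
        return (1.0 - l1_ratio) * l2 + l1_ratio * l1
    if penalty is None:
        return 0.0
    raise ValueError(f"Unknown penalty: {penalty!r}")


w = [3.0, -4.0]
for name in ["l2", "l1", "elasticnet"]:
    print(f"{name}: {penalty_term(w, name, l1_ratio=0.5)}")
```

With l1_ratio=1.0 the elastic net term reduces to the L1 term, and with l1_ratio=0.0 it reduces to the L2 term.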

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different penalty options
penalties = ['l2', 'l1', 'elasticnet']
accuracies = []

for penalty in penalties:
    if penalty == 'elasticnet':
        sgd = SGDClassifier(penalty=penalty, l1_ratio=0.5, random_state=42)
    else:
        sgd = SGDClassifier(penalty=penalty, random_state=42)

    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"Penalty: {penalty}, Accuracy: {accuracy:.3f}")

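# Compare coefficient magnitudes as a rough proxy for feature importance.
# Note: the models are refit here with default settings, so the elasticnet
# model uses the default l1_ratio=0.15 rather than the 0.5 used above.
for penalty, sgd in zip(penalties, [SGDClassifier(penalty=p, random_state=42).fit(X_train, y_train) for p in penalties]):
    coef_abs = np.abs(sgd.coef_[0])
    print(f"\nPenalty: {penalty}")
    # Count coefficients that are not (numerically) zero
    print(f"Number of non-zero features: {np.sum(coef_abs > 1e-5)}")
    # Indices of the five largest-magnitude coefficients, largest first
    print(f"Top 5 feature importances: {coef_abs.argsort()[-5:][::-1]}")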

Running the example gives an output like:

Penalty: l2, Accuracy: 0.770
Penalty: l1, Accuracy: 0.775
Penalty: elasticnet, Accuracy: 0.775

Penalty: l2
Number of non-zero features: 20
Top 5 feature importances: [11  2 17 14 18]

Penalty: l1
Number of non-zero features: 10
Top 5 feature importances: [11 14 17  2 15]

Penalty: elasticnet
Number of non-zero features: 14
Top 5 feature importances: [11 14 15  3  2]

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with 20 features: 10 informative, 5 redundant, and 5 noise
  2. Split the data into train and test sets
  3. Train SGDClassifier models with different penalty values
  4. Evaluate the accuracy of each model on the test set
  5. Refit each model and compare the number of non-zero coefficients and the indices of the five largest-magnitude coefficients for each penalty
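The steps above can also be sketched so that each model is fitted once and the same fitted estimator is used for both the accuracy and the sparsity comparison. This is a variant under stated assumptions, not the listing above: it passes l1_ratio=0.5 to every model (the parameter is only used when penalty='elasticnet'), and the `results` dictionary is an illustrative name. Its numbers may differ from the sample output shown earlier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

results = {}
for penalty in ["l2", "l1", "elasticnet"]:
    # l1_ratio is ignored unless penalty="elasticnet"
    model = SGDClassifier(penalty=penalty, l1_ratio=0.5, random_state=42)
    model.fit(X_train, y_train)

    coef_abs = np.abs(model.coef_[0])
    results[penalty] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        # Coefficients that are not (numerically) zero
        "n_nonzero": int(np.sum(coef_abs > 1e-5)),
        # Indices of the five largest-magnitude coefficients, largest first
        "top5": coef_abs.argsort()[-5:][::-1].tolist(),
    }

for penalty, r in results.items():
    print(f"Penalty: {penalty}, Accuracy: {r['accuracy']:.3f}, "
          f"Non-zero: {r['n_nonzero']}, Top 5 indices: {r['top5']}")
```

Keeping one fitted model per penalty avoids training each configuration twice and ensures the reported accuracy and coefficient counts describe the same estimator.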
