Configure SGDClassifier "alpha" Parameter

The alpha parameter in scikit-learn’s SGDClassifier controls the regularization strength, which helps prevent overfitting.

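SGDClassifier fits linear models with stochastic gradient descent, which makes it efficient for large-scale learning. The alpha parameter is the constant that multiplies the regularization term (an L2 penalty by default) in the loss function. With the default learning_rate="optimal" schedule, alpha is also used to compute the learning rate, so changing it affects the step size as well as the penalty.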
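For reference, the objective being minimized can be written roughly as below, following the formulation in the scikit-learn user guide. The symbols L (per-sample loss), R (penalty term), n (number of training samples), and m (number of features) are notation introduced here for illustration, and the form of R shown assumes the default penalty="l2".

```latex
E(w, b) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \alpha \, R(w),
\qquad
f(x) = w^{\top} x + b,
\qquad
R(w) = \frac{1}{2} \sum_{j=1}^{m} w_j^{2}
```

Read this way, alpha sets the trade-off between fitting the training data (the first term) and keeping the weights small (the second term).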

Higher alpha values increase regularization, potentially reducing overfitting but risking underfitting. Lower values decrease regularization, potentially allowing for more complex models but risking overfitting.
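One way to see this effect directly is to look at the size of the learned weight vector. The sketch below is a minimal illustration, assuming the default hinge loss and L2 penalty. The dataset settings and the coef_norms dictionary are choices made here for illustration, and the exact numbers will vary with the data and scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data (illustrative settings, not tied to any real dataset)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Record the L2 norm of the learned weights for a few alpha values
coef_norms = {}
for alpha in [1e-5, 1e-3, 1e-1, 1.0]:
    sgd = SGDClassifier(alpha=alpha, random_state=42, max_iter=1000, tol=1e-3)
    sgd.fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(sgd.coef_))
    print(f"alpha={alpha:g}, weight norm={coef_norms[alpha]:.3f}")
```

Stronger regularization should pull the weights toward zero, so the norm at alpha=1.0 is expected to be much smaller than at alpha=1e-5.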

The default value for alpha is 0.0001. In practice, values are often tuned in the range of 1e-5 to 1.0, depending on the dataset and problem complexity.
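A common way to tune over such a range is a cross-validated search on a logarithmic grid. The following is a hedged sketch rather than a recommended recipe: the grid bounds, the 5-fold setting, and the variable names (param_grid, search, best_alpha) are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data (illustrative settings)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Six log-spaced candidates from 1e-5 to 1.0
param_grid = {"alpha": np.logspace(-5, 0, 6)}

# 5-fold cross-validation over the alpha grid, scored by accuracy
search = GridSearchCV(
    SGDClassifier(random_state=42, max_iter=1000, tol=1e-3),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
print(f"Best alpha: {best_alpha:g}, mean CV accuracy: {search.best_score_:.3f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, which matters when the accuracy differences between alpha values are small.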

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different alpha values
alpha_values = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
accuracies = []

for alpha in alpha_values:
    sgd = SGDClassifier(alpha=alpha, random_state=42, max_iter=1000, tol=1e-3)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"alpha={alpha:.5f}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

alpha=0.00001, Accuracy: 0.765
alpha=0.00010, Accuracy: 0.770
alpha=0.00100, Accuracy: 0.760
alpha=0.01000, Accuracy: 0.785
alpha=0.10000, Accuracy: 0.805
alpha=1.00000, Accuracy: 0.795

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with 20 features: 10 informative, 5 redundant, and 5 noise
  2. Split the data into train and test sets
  3. Train SGDClassifier models with different alpha values
  4. Evaluate the accuracy of each model on the test set

Some tips for setting the alpha parameter:

Issues to consider:


