The learning_rate parameter in scikit-learn's SGDClassifier controls how quickly the model adapts to the training data.
Stochastic Gradient Descent (SGD) is an optimization algorithm that iteratively updates model parameters based on individual training examples. The learning_rate determines the step size at each iteration while moving toward a minimum of the loss function.
A higher learning rate allows faster initial learning but may overshoot the optimal solution, while a lower learning rate provides more precise convergence but may require more iterations to reach the minimum.
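To make the step-size idea concrete, here is a minimal sketch of a single SGD update for a linear model with squared loss. It is illustrative only, not scikit-learn's internal implementation; the sgd_step helper and the variable names in it are made up for this example:

import numpy as np

# One SGD step for a linear model with squared loss: w <- w - eta * gradient
def sgd_step(w, xi, yi, eta):
    error = xi.dot(w) - yi        # prediction error on a single example
    gradient = error * xi         # gradient of 0.5 * error**2 with respect to w
    return w - eta * gradient     # a larger eta takes a larger step

rng = np.random.default_rng(0)
w = np.zeros(3)                   # start from all-zero weights
xi, yi = rng.normal(size=3), 1.0  # one toy training example
print(sgd_step(w, xi, yi, eta=0.01))  # small, cautious step
print(sgd_step(w, xi, yi, eta=1.0))   # much larger step toward (or past) the minimum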
The default value for learning_rate is 'optimal', which uses a heuristic proposed by Léon Bottou. Common alternatives include 'constant', 'invscaling', and 'adaptive'.
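Each option corresponds to a schedule for the effective step size eta as a function of the update count t. The formulas below follow the scikit-learn documentation; the eta_at helper is made up for illustration, and the t0 value used for 'optimal' here is a placeholder (scikit-learn derives it from Bottou's heuristic):

# Step-size schedules as described in the scikit-learn documentation
def eta_at(t, schedule, eta0=0.01, alpha=0.0001, power_t=0.5, t0=1.0):
    if schedule == 'constant':
        return eta0                      # fixed step size
    if schedule == 'invscaling':
        return eta0 / (t ** power_t)     # decays with the update count
    if schedule == 'optimal':
        return 1.0 / (alpha * (t + t0))  # t0 comes from Bottou's heuristic
    # 'adaptive' holds eta at eta0 and divides it by 5 whenever the
    # stopping criterion fails n_iter_no_change times in a row
    raise ValueError(schedule)

for t in (1, 10, 100):
    print(t, eta_at(t, 'constant'), eta_at(t, 'invscaling'), eta_at(t, 'optimal'))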
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different learning_rate values
learning_rates = ['constant', 'optimal', 'invscaling', 'adaptive']
accuracies = []
for lr in learning_rates:
    sgd = SGDClassifier(learning_rate=lr, eta0=0.01, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"learning_rate={lr}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
learning_rate=constant, Accuracy: 0.880
learning_rate=optimal, Accuracy: 0.790
learning_rate=invscaling, Accuracy: 0.865
learning_rate=adaptive, Accuracy: 0.865
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train SGDClassifier models with different learning_rate values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting learning_rate:
- Start with the default 'optimal' and experiment with alternatives if performance is unsatisfactory (see the grid-search sketch after this list)
- Use 'constant' with a small eta0 for datasets with many features or sparse data
- Try 'invscaling' or 'adaptive' for non-stationary problems or when dealing with large datasets
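A systematic way to run that first experiment is a grid search over the schedule and the initial step size together. This sketch reuses X_train and y_train from the example above; the values in param_grid are arbitrary starting points, not recommendations:

from sklearn.model_selection import GridSearchCV

# Search over the schedule and the initial step size jointly
param_grid = {
    'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive'],
    'eta0': [0.001, 0.01, 0.1],  # ignored by 'optimal', so some combinations are redundant
}
grid = GridSearchCV(SGDClassifier(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)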
Issues to consider:
- The optimal learning rate can vary depending on the specific dataset and problem
- A learning rate that’s too high may cause the algorithm to diverge
- A learning rate that’s too low may result in slow convergence or getting stuck in local minima
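The first two issues are easy to see empirically by fixing the schedule to 'constant' and sweeping eta0. This reuses the imports and data split from the example above; the eta0 values are arbitrary, but very large steps typically destroy accuracy while very small ones learn slowly:

# Sweep the initial step size with a fixed 'constant' schedule
for eta0 in (0.0001, 0.01, 1.0, 100.0):
    sgd = SGDClassifier(learning_rate='constant', eta0=eta0, random_state=42)
    sgd.fit(X_train, y_train)
    print(f"eta0={eta0}, Accuracy: {sgd.score(X_test, y_test):.3f}")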