The learning_rate parameter in scikit-learn's SGDClassifier controls how quickly the model adapts to the training data.
Stochastic Gradient Descent (SGD) is an optimization algorithm that iteratively updates model parameters based on individual training examples. The learning_rate determines the step size at each iteration while moving toward a minimum of the loss function.
A higher learning rate allows faster initial learning but may overshoot the optimal solution, while a lower learning rate provides more precise convergence but may require more iterations to reach the minimum.
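To make the step-size idea concrete, here is a minimal sketch of a single SGD update for a linear model with squared loss. It is illustrative only, not scikit-learn's internal implementation; the sgd_step helper and the variable names in it are made up for this example:

import numpy as np

# One SGD step for a linear model with squared loss: w <- w - eta * gradient
def sgd_step(w, xi, yi, eta):
    error = xi.dot(w) - yi        # prediction error on a single example
    gradient = error * xi         # gradient of 0.5 * error**2 with respect to w
    return w - eta * gradient     # a larger eta takes a larger step

rng = np.random.default_rng(0)
w = np.zeros(3)                   # start from all-zero weights
xi, yi = rng.normal(size=3), 1.0  # one toy training example
print(sgd_step(w, xi, yi, eta=0.01))  # small, cautious step
print(sgd_step(w, xi, yi, eta=1.0))   # much larger step toward (or past) the minimum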
The default value for learning_rate is 'optimal', which uses a heuristic proposed by Léon Bottou. Common alternatives include 'constant', 'invscaling', and 'adaptive'.
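Each option corresponds to a schedule for the effective step size eta as a function of the update count t. The formulas below follow the scikit-learn documentation; the eta_at helper is made up for illustration, and the t0 value used for 'optimal' here is a placeholder (scikit-learn derives it from Bottou's heuristic):

# Step-size schedules as described in the scikit-learn documentation
def eta_at(t, schedule, eta0=0.01, alpha=0.0001, power_t=0.5, t0=1.0):
    if schedule == 'constant':
        return eta0                      # fixed step size
    if schedule == 'invscaling':
        return eta0 / (t ** power_t)     # decays with the update count
    if schedule == 'optimal':
        return 1.0 / (alpha * (t + t0))  # t0 comes from Bottou's heuristic
    # 'adaptive' holds eta at eta0 and divides it by 5 whenever the
    # stopping criterion fails n_iter_no_change times in a row
    raise ValueError(schedule)

for t in (1, 10, 100):
    print(t, eta_at(t, 'constant'), eta_at(t, 'invscaling'), eta_at(t, 'optimal'))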
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different learning_rate values
learning_rates = ['constant', 'optimal', 'invscaling', 'adaptive']
accuracies = []
for lr in learning_rates:
    sgd = SGDClassifier(learning_rate=lr, eta0=0.01, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"learning_rate={lr}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
learning_rate=constant, Accuracy: 0.880
learning_rate=optimal, Accuracy: 0.790
learning_rate=invscaling, Accuracy: 0.865
learning_rate=adaptive, Accuracy: 0.865
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train SGDClassifier models with different learning_rate values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting learning_rate:
- Start with the default 'optimal' and experiment with alternatives if performance is unsatisfactory (see the grid-search sketch after this list)
- Use 'constant' with a small eta0 for datasets with many features or sparse data
- Try 'invscaling' or 'adaptive' for non-stationary problems or when dealing with large datasets
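A systematic way to run that first experiment is a grid search over the schedule and the initial step size together. This sketch reuses X_train and y_train from the example above; the values in param_grid are arbitrary starting points, not recommendations:

from sklearn.model_selection import GridSearchCV

# Search over the schedule and the initial step size jointly
param_grid = {
    'learning_rate': ['constant', 'optimal', 'invscaling', 'adaptive'],
    'eta0': [0.001, 0.01, 0.1],  # ignored by 'optimal', so some combinations are redundant
}
grid = GridSearchCV(SGDClassifier(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)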
Issues to consider:
- The optimal learning rate can vary depending on the specific dataset and problem
- A learning rate that’s too high may cause the algorithm to diverge
- A learning rate that’s too low may result in slow convergence or getting stuck in local minima
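The first two issues are easy to see empirically by fixing the schedule to 'constant' and sweeping eta0. This reuses the imports and data split from the example above; the eta0 values are arbitrary, but very large steps typically destroy accuracy while very small ones learn slowly:

# Sweep the initial step size with a fixed 'constant' schedule
for eta0 in (0.0001, 0.01, 1.0, 100.0):
    sgd = SGDClassifier(learning_rate='constant', eta0=eta0, random_state=42)
    sgd.fit(X_train, y_train)
    print(f"eta0={eta0}, Accuracy: {sgd.score(X_test, y_test):.3f}")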