Configure SGDClassifier "eta0" Parameter

The eta0 parameter in scikit-learn’s SGDClassifier sets the initial learning rate for stochastic gradient descent when the learning_rate schedule is ‘constant’, ‘invscaling’, or ‘adaptive’.

Stochastic Gradient Descent (SGD) is an optimization algorithm that iteratively updates model parameters to minimize a loss function. With the ‘constant’ schedule, eta0 is the step size used for every update; with ‘invscaling’ and ‘adaptive’, it is the starting step size, which the schedule then reduces during training.

A higher eta0 value can lead to faster convergence but may overshoot the minimum, while a lower value provides more precise updates but may result in slower convergence.
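The trade-off can be seen on a toy problem. The sketch below is not SGDClassifier itself: it runs plain (non-stochastic) gradient descent on the one-dimensional function f(w) = w**2, and the helper name descend is made up for this illustration. Each update multiplies w by (1 - 2 * eta0), so a small step converges slowly, a moderate step converges quickly, and a step above 1.0 makes the iterates grow.

```python
def descend(eta0, steps=50, w=1.0):
    """Plain gradient descent on f(w) = w**2 with a constant step size eta0."""
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - eta0 * grad   # same as w * (1 - 2 * eta0)
    return w

# Hypothetical step sizes chosen to show slow, fast, and divergent behavior
results = {eta0: descend(eta0) for eta0 in (0.01, 0.4, 1.1)}
for eta0, w in results.items():
    print(f"eta0={eta0}: w after 50 steps = {w:.3e}")
```

The minimum is at w = 0, so the closer the final value is to zero, the better that step size worked on this particular function.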

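In SGDClassifier, eta0 has traditionally defaulted to 0.0, because the default learning rate schedule, ‘optimal’, does not use it. With the ‘constant’, ‘invscaling’, or ‘adaptive’ schedules, eta0 must be set to a positive value. Defaults can change between releases, so check the documentation for your installed version.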
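Whether eta0 has any effect depends on the learning_rate schedule. The following sketch, which assumes a small synthetic dataset and a made-up helper named fit_coef, fits the same model with two very different eta0 values under the ‘optimal’ and ‘constant’ schedules and compares the learned coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic dataset, used only for this illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def fit_coef(schedule, eta0):
    """Fit an SGDClassifier with the given schedule and eta0; return its coefficients."""
    clf = SGDClassifier(learning_rate=schedule, eta0=eta0,
                        random_state=0, max_iter=1000)
    clf.fit(X, y)
    return clf.coef_

# 'optimal' computes its own step sizes, so changing eta0 should not matter
same_under_optimal = np.allclose(fit_coef("optimal", 0.001),
                                 fit_coef("optimal", 1.0))
# 'constant' uses eta0 directly, so the fitted coefficients should differ
same_under_constant = np.allclose(fit_coef("constant", 0.001),
                                  fit_coef("constant", 1.0))

print(f"Same coefficients under 'optimal':  {same_under_optimal}")
print(f"Same coefficients under 'constant': {same_under_constant}")
```

The larger constant step size may trigger a ConvergenceWarning, which is harmless for the purpose of this comparison.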

In practice, values between 0.001 and 0.1 are commonly used, depending on the specific problem and dataset characteristics.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different eta0 values
eta0_values = [0.0001, 0.001, 0.01, 0.1, 1.0]
accuracies = []

for eta0 in eta0_values:
    sgd = SGDClassifier(loss='log_loss', eta0=eta0, learning_rate='constant',
                        random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"eta0={eta0:.4f}, Accuracy: {accuracy:.3f}")

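# Find the eta0 with the highest test accuracy (for illustration only;
# use a validation set or cross-validation when tuning for real)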
best_eta0 = eta0_values[np.argmax(accuracies)]
print(f"\nBest eta0: {best_eta0:.4f}")

Running the example gives an output like:

eta0=0.0001, Accuracy: 0.815
eta0=0.0010, Accuracy: 0.810
eta0=0.0100, Accuracy: 0.830
eta0=0.1000, Accuracy: 0.755
eta0=1.0000, Accuracy: 0.740

Best eta0: 0.0100

The key steps in this example are:

Generate a synthetic binary classification dataset
Split the data into train and test sets
Train SGDClassifier models with different eta0 values
Evaluate the accuracy of each model on the test set
Identify the best eta0 value based on accuracy

Some tips for setting eta0:

Start with a moderate value such as 0.01 and experiment with values an order of magnitude higher and lower
For larger datasets, smaller eta0 values often work better
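If the loss oscillates or increases, try reducing eta0; if it decreases steadily but very slowly, try increasing it
Consider a decaying schedule such as ‘invscaling’ or ‘adaptive’, which start at eta0 and reduce it during training, or ‘optimal’, which ignores eta0 entirely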
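To make the difference between schedules concrete, here is a minimal sketch of how the step size evolves under each schedule that uses eta0, following the formulas given in the scikit-learn documentation. The function names are made up for this illustration and are not part of the library; t is the update counter (starting at 1) and power_t=0.5 matches the SGDClassifier default.

```python
def constant_rate(eta0, t):
    """'constant': the step size never changes."""
    return eta0

def invscaling_rate(eta0, t, power_t=0.5):
    """'invscaling': eta0 divided by t raised to power_t."""
    return eta0 / (t ** power_t)

def adaptive_rate(eta0, n_reductions):
    """'adaptive': the rate is divided by 5 each time training stops improving."""
    return eta0 / (5 ** n_reductions)

eta0 = 0.1
for t in (1, 100, 10_000):
    print(f"t={t:>6}: constant={constant_rate(eta0, t):.4f}, "
          f"invscaling={invscaling_rate(eta0, t):.4f}")
for n in range(3):
    print(f"after {n} reductions: adaptive={adaptive_rate(eta0, n):.4f}")
```

With ‘adaptive’, the number of reductions is not fixed in advance: it depends on how often the loss (or the validation score, when early stopping is enabled) fails to improve during training.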

Issues to consider:

The optimal eta0 depends on the scale of your features and the complexity of the problem, so standardizing features before training usually makes tuning easier
An eta0 that is too high can cause divergence, while one that is too low can result in slow convergence
The effect of eta0 interacts with other parameters like alpha (regularization strength)
Monitor the model’s convergence using early stopping or validation curves to fine-tune eta0

See Also