The eta0 parameter in scikit-learn’s SGDClassifier sets the initial learning rate for the stochastic gradient descent optimization.
Stochastic Gradient Descent (SGD) is an optimization algorithm that iteratively updates model parameters to minimize the loss function. With the ‘constant’ schedule, eta0 is the step size at every iteration; with ‘invscaling’ and ‘adaptive’, it is the starting value from which the step size is reduced during training. The default ‘optimal’ schedule does not use eta0.
A higher eta0 value can lead to faster convergence but may overshoot the minimum, while a lower value provides more precise updates but may result in slower convergence.
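The trade-off can be seen on a toy problem. The sketch below is an illustration only: it runs plain (non-stochastic) gradient descent on f(w) = w**2 with a fixed step size, and the helper name descend is hypothetical, not part of scikit-learn. It does not reproduce SGDClassifier's internals.

```python
def descend(eta, w0=1.0, n_steps=20):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    w = w0
    for _ in range(n_steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - eta * grad  # fixed step size, as with a constant schedule
    return w


# Small, moderate, and too-large step sizes (illustrative values)
for eta in (0.01, 0.4, 1.1):
    print(f"eta={eta}: final w = {descend(eta):.4g}")
```

With the small step the iterate is still far from the minimum at 0 after 20 steps, with the moderate step it is essentially at 0, and with the large step each update overshoots and the iterate grows in magnitude.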
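SGDClassifier has long defaulted to eta0=0.0, because its default ‘optimal’ schedule ignores the parameter (SGDRegressor, by contrast, defaults to 0.01). With the ‘constant’, ‘invscaling’, or ‘adaptive’ schedules, set eta0 explicitly to a positive value, and check the documentation of your installed version for the current default.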
In practice, values between 0.001 and 0.1 are commonly used, depending on the specific problem and dataset characteristics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different eta0 values
eta0_values = [0.0001, 0.001, 0.01, 0.1, 1.0]
accuracies = []
for eta0 in eta0_values:
    sgd = SGDClassifier(loss='log_loss', eta0=eta0, learning_rate='constant',
                        random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"eta0={eta0:.4f}, Accuracy: {accuracy:.3f}")

# Find best eta0
best_eta0 = eta0_values[np.argmax(accuracies)]
print(f"\nBest eta0: {best_eta0:.4f}")
Running the example gives an output like:
eta0=0.0001, Accuracy: 0.815
eta0=0.0010, Accuracy: 0.810
eta0=0.0100, Accuracy: 0.830
eta0=0.1000, Accuracy: 0.755
eta0=1.0000, Accuracy: 0.740
Best eta0: 0.0100
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train SGDClassifier models with different eta0 values
- Evaluate the accuracy of each model on the test set
- Identify the best eta0 value based on accuracy
Some tips for setting eta0:
- Start with a moderate value such as 0.01 and experiment with values an order of magnitude higher and lower
- For larger datasets, smaller eta0 values often work better
- If the loss oscillates or is not decreasing, try reducing eta0
- Consider a decaying or adaptive schedule such as ‘invscaling’ or ‘adaptive’, or the default ‘optimal’ schedule (which ignores eta0), instead of a constant rate
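To make the difference between schedules concrete, here is a minimal sketch of how the step size evolves under the ‘constant’ and ‘invscaling’ rules as described in the scikit-learn documentation (eta = eta0 and eta = eta0 / t**power_t). The function names are hypothetical helpers written for illustration, and the power_t value of 0.5 is an assumption; check the default in your installed version.

```python
def constant_rate(eta0, t):
    """'constant' schedule: the step size never changes."""
    return eta0


def invscaling_rate(eta0, t, power_t=0.5):
    """'invscaling' schedule: eta0 shrinks with the update count t (t >= 1).

    power_t=0.5 is an assumed value, used here for illustration only.
    """
    return eta0 / (t ** power_t)


eta0 = 0.1
for t in (1, 10, 100, 1000):
    print(f"t={t:>4}: constant={constant_rate(eta0, t):.4f}, "
          f"invscaling={invscaling_rate(eta0, t):.4f}")
```

The ‘adaptive’ schedule is not sketched here: it keeps the rate at eta0 while training keeps improving and divides the current rate by 5 whenever improvement stalls, so its trajectory depends on the data.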
Issues to consider:
- The optimal eta0 depends on the scale of your features and the complexity of the problem
- Too high eta0 can cause divergence, while too low can result in slow convergence
- The effect of eta0 interacts with other parameters like alpha (regularization strength)
- Monitor the model’s convergence using early stopping or validation curves to fine-tune eta0