The eta0 parameter in scikit-learn's SGDRegressor sets the initial learning rate for the model's gradient descent optimization.
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used for fitting linear models. It updates the model’s parameters based on the gradient of the loss function with respect to a single training example at each iteration.
The eta0 parameter controls the step size taken during each update. A larger value can lead to faster initial convergence but may overshoot the optimal solution, while a smaller value provides more precise updates but may require more iterations to converge.
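To make the role of the step size concrete, the sketch below shows a single SGD update for squared-error loss on one training sample. It is only an illustration of the update rule, not scikit-learn's internal implementation:
import numpy as np
# Illustrative single SGD step for squared-error loss on one sample (x_i, y_i).
# The learning rate eta scales how far the weights move along the negative gradient.
def sgd_step(w, b, x_i, y_i, eta):
    error = (np.dot(w, x_i) + b) - y_i               # prediction error for this sample
    return w - eta * error * x_i, b - eta * error    # gradient step for weights and intercept
w, b = np.zeros(3), 0.0
w, b = sgd_step(w, b, np.array([1.0, 2.0, 3.0]), 4.0, eta=0.01)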
The default value for eta0 is 0.01.
In practice, values between 0.1 and 0.0001 are commonly used, depending on the specific problem and dataset characteristics.
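Note that with SGDRegressor's default learning_rate='invscaling' schedule, eta0 is only the starting point: the effective step size decays over updates as eta0 / t^power_t, where power_t defaults to 0.25. A quick illustration of that decay:
eta0, power_t = 0.01, 0.25              # scikit-learn defaults for SGDRegressor
for t in [1, 10, 100, 1000]:
    eta = eta0 / (t ** power_t)         # 'invscaling': eta = eta0 / pow(t, power_t)
    print(f"t={t:>4}, effective learning rate = {eta:.5f}")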
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different eta0 values
eta0_values = [0.1, 0.01, 0.001, 0.0001]
mse_scores = []
for eta0 in eta0_values:
    sgd = SGDRegressor(eta0=eta0, random_state=42, max_iter=1000, tol=1e-3)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"eta0={eta0}, MSE: {mse:.3f}")
Running the example gives an output like:
eta0=0.1, MSE: 0.010
eta0=0.01, MSE: 0.010
eta0=0.001, MSE: 0.026
eta0=0.0001, MSE: 22.820
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train SGDRegressor models with different eta0 values
- Evaluate the mean squared error of each model on the test set
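As an alternative to the manual loop above, eta0 can also be tuned with a cross-validated grid search over the same candidate values. A minimal sketch, reusing X_train and y_train from the example (the grid, scoring, and cv settings here are illustrative choices):
from sklearn.model_selection import GridSearchCV
# Sketch: cross-validated search over eta0 on the training split from the example above.
param_grid = {"eta0": [0.1, 0.01, 0.001, 0.0001]}
grid = GridSearchCV(
    SGDRegressor(random_state=42, max_iter=1000, tol=1e-3),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, -grid.best_score_)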
Some tips and heuristics for setting eta0:
- Start with the default value of 0.01 and adjust based on model performance
- Use larger values (e.g., 0.1) for faster initial convergence on simple problems
- Use smaller values (e.g., 0.001 or 0.0001) for more complex problems or when fine-tuning is needed
- Consider using a learning rate schedule that adjusts the step size over time (e.g., learning_rate='invscaling', 'adaptive', or 'optimal') rather than a constant rate
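As a rough sketch of that last tip, the schedule is selected with the learning_rate parameter, and eta0 seeds the 'constant', 'invscaling', and 'adaptive' schedules (the values below are illustrative and reuse the train/test split from the example):
# Sketch: pairing eta0 with different learning-rate schedules.
for schedule in ["constant", "invscaling", "adaptive"]:
    model = SGDRegressor(learning_rate=schedule, eta0=0.01, random_state=42, max_iter=1000, tol=1e-3)
    model.fit(X_train, y_train)
    # 'constant' keeps eta fixed, 'invscaling' decays it, 'adaptive' divides it by 5 when the loss stops improving
    print(schedule, mean_squared_error(y_test, model.predict(X_test)))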
Issues to consider:
- An eta0 that is too large can cause overshooting and unstable or divergent training
- An eta0 that is too small may result in very slow convergence or stopping (via max_iter) before a good solution is reached
- The optimal eta0 depends on the scale of the features and the complexity of the problem
- Consider combining eta0 tuning with feature scaling for better results