The learning_rate parameter in scikit-learn’s SGDRegressor controls how quickly the model adapts to the problem.
Stochastic Gradient Descent (SGD) is an optimization algorithm that iteratively adjusts model parameters to minimize the loss function. The learning rate determines the step size at each iteration while moving toward a minimum of the loss function.
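To make the role of the step size concrete, here is a minimal sketch of a single SGD update for squared error loss on one sample; the names (w, x, eta) are illustrative, not scikit-learn internals:

```python
import numpy as np

# A single SGD update for squared error loss on one sample.
rng = np.random.default_rng(0)
w = np.zeros(3)          # current model weights
x = rng.normal(size=3)   # one training sample
y = 2.0                  # its target value
eta = 0.01               # the learning rate (step size)

# Gradient of 0.5 * (x @ w - y)**2 with respect to w:
grad = (x @ w - y) * x

# Step against the gradient; eta scales how far we move.
w -= eta * grad
```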
The learning_rate parameter in SGDRegressor can be set to one of four options: ‘constant’, ‘optimal’, ‘invscaling’, or ‘adaptive’. Each option defines a different schedule for how the learning rate changes during training.
The default value for learning_rate is ‘invscaling’. Common alternatives include ‘constant’ for a fixed learning rate and ‘adaptive’ for a rate that is automatically reduced whenever training stops improving.
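Each schedule is selected directly on the estimator, with eta0 and power_t as the companion parameters that shape it. The update rules in the comments below follow the scikit-learn documentation; the eta0 values shown are simply the library defaults:

```python
from sklearn.linear_model import SGDRegressor

# 'constant': eta = eta0 for every update
sgd_constant = SGDRegressor(learning_rate='constant', eta0=0.01)

# 'invscaling' (the default): eta = eta0 / pow(t, power_t)
sgd_invscaling = SGDRegressor(learning_rate='invscaling', eta0=0.01, power_t=0.25)

# 'adaptive': eta = eta0 until the loss stops improving for
# n_iter_no_change consecutive epochs, then eta is divided by 5
sgd_adaptive = SGDRegressor(learning_rate='adaptive', eta0=0.01)

# 'optimal': eta = 1.0 / (alpha * (t + t0)), a heuristic; eta0 is unused
sgd_optimal = SGDRegressor(learning_rate='optimal')
```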
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different learning_rate options
learning_rates = ['constant', 'optimal', 'invscaling', 'adaptive']
mse_scores = []

for lr in learning_rates:
    sgd = SGDRegressor(learning_rate=lr, random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"learning_rate={lr}, MSE: {mse:.3f}")

# Find the best learning rate
best_lr = learning_rates[np.argmin(mse_scores)]
print(f"Best learning_rate: {best_lr}")
```
Running the example gives an output like:
```
learning_rate=constant, MSE: 0.010
learning_rate=optimal, MSE: 0.031
learning_rate=invscaling, MSE: 0.010
learning_rate=adaptive, MSE: 0.010
Best learning_rate: invscaling
```
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train SGDRegressor models with different learning_rate options
- Evaluate the mean squared error of each model on the test set
- Identify the best performing learning rate option
Some tips and heuristics for setting learning_rate:
- ‘constant’ is suitable when you already have a good idea of the optimal learning rate; otherwise, eta0 can be tuned alongside the schedule (see the sketch after this list)
- ‘optimal’ works well for simple problems but may struggle with complex datasets
- ‘invscaling’ is a good default choice as it gradually decreases the learning rate
- ‘adaptive’ can be effective for datasets where the optimal learning rate changes during training
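Since a good step size is rarely known in advance, one reasonable approach is to cross-validate the schedule and eta0 together. Here is a sketch using GridSearchCV on the same synthetic data; the eta0 grid values are illustrative, not recommendations:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Search the schedule and the initial step size together;
# eta0 matters for 'constant', 'invscaling', and 'adaptive',
# and is ignored by 'optimal'.
param_grid = {
    'learning_rate': ['constant', 'invscaling', 'adaptive'],
    'eta0': [0.001, 0.01, 0.1],
}
search = GridSearchCV(
    SGDRegressor(max_iter=1000, random_state=42),
    param_grid,
    scoring='neg_mean_squared_error',
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```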
Issues to consider:
- The optimal learning rate depends on the specific dataset and problem
- Too high a learning rate can cause the model to overshoot the minimum or diverge outright
- Too low a learning rate can result in very slow convergence, leaving the model far from the minimum when max_iter is exhausted (both failure modes are shown in the sketch after this list)
- The effectiveness of each option can vary based on the scale of your features and the complexity of the relationship between features and target variable
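The overshooting and slow-convergence issues are easy to see with a ‘constant’ schedule and deliberately extreme eta0 values (chosen purely for illustration): the oversized step tends to diverge, while the tiny step leaves the model badly underfit within max_iter epochs.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Deliberately extreme step sizes alongside a sensible one.
for eta0 in [10.0, 0.01, 1e-7]:
    sgd = SGDRegressor(learning_rate='constant', eta0=eta0,
                       max_iter=1000, random_state=42)
    try:
        sgd.fit(X_train, y_train)
        mse = mean_squared_error(y_test, sgd.predict(X_test))
        print(f"eta0={eta0}: MSE={mse:.3f}")
    except ValueError as exc:
        # An oversized step can make the weights overflow and diverge,
        # which scikit-learn reports as a floating-point error.
        print(f"eta0={eta0}: diverged ({exc})")
```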