The average parameter in scikit-learn’s SGDRegressor controls whether to use averaged stochastic gradient descent.
Stochastic Gradient Descent (SGD) is an optimization algorithm used for fitting linear models. SGDRegressor implements a plain stochastic gradient descent learning routine that supports different loss functions and penalties for regression.
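The average parameter determines whether to compute averaged SGD weights and store the result in the coef_ attribute. When set to True, the stored coefficients are the running average of the weights across all updates rather than the last iterate, which smooths out the noise of individual updates and can improve the stability of the model.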
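By default, average is set to False. Accepted values are False for no averaging, True to average the weights across all updates, or an integer greater than 1 to begin averaging only once that many samples have been seen (for example, average=10 starts averaging after 10 samples).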
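To make the idea concrete, here is a minimal pure-Python sketch of averaged SGD on a one-feature problem. It is not scikit-learn's implementation: the function averaged_sgd and its start_after argument are hypothetical names used only for illustration. The start_after argument delays averaging until that many samples have been seen, which is how the scikit-learn documentation describes integer values of average.

```python
import random


def averaged_sgd(xs, ys, lr=0.05, epochs=20, start_after=1, seed=0):
    """Fit y ~ w * x with SGD and keep a running mean of the weight iterates.

    `start_after` is a hypothetical knob for this sketch: averaging begins
    once that many samples have been processed.
    """
    rng = random.Random(seed)
    w, w_avg = 0.0, 0.0      # current iterate and running average
    n_avg, seen = 0, 0       # iterates averaged so far, samples seen so far
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Gradient of 0.5 * (w * x - y) ** 2 with respect to w
            grad = (w * xs[i] - ys[i]) * xs[i]
            w -= lr * grad
            seen += 1
            if seen >= start_after:
                n_avg += 1
                w_avg += (w - w_avg) / n_avg  # incremental mean update
    return w, w_avg


# Toy data: y = 3x plus a little Gaussian noise
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]

w_last, w_avg_all = averaged_sgd(xs, ys, start_after=1)
_, w_avg_late = averaged_sgd(xs, ys, start_after=1000)
print(f"last iterate: {w_last:.3f}")
print(f"average over all updates: {w_avg_all:.3f}")
print(f"average after 1000 samples: {w_avg_late:.3f}")
```

Because the average over all updates includes the early iterates made while the weight was still far from its final value, it lags behind an average that starts later. This is one reason a delayed start can be worth trying.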
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different average values
average_values = [False, True, 10]
mse_scores = []
for avg in average_values:
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, average=avg, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"average={avg}, MSE: {mse:.3f}")
# Find best average value
best_avg = average_values[np.argmin(mse_scores)]
print(f"Best average value: {best_avg}")
Running the example gives an output like:
average=False, MSE: 0.012
average=True, MSE: 49.879
average=10, MSE: 47.799
Best average value: False
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train SGDRegressor models with different average values
- Evaluate the mean squared error of each model on the test set
- Identify the best average value based on lowest MSE
Some tips and heuristics for setting average:
- Start with False (no averaging) as a baseline
- Try True to average the weights across all updates, which can improve stability but, as the run above shows, is not guaranteed to lower test error
- Experiment with integer values > 1 to delay the start of averaging until that many samples have been seen
- Use cross-validation to find the optimal average value for your specific dataset
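The cross-validation tip can be sketched with GridSearchCV, reusing the synthetic dataset and estimator settings from the example above. The candidate grid and the 5-fold setup are assumptions chosen for illustration, not recommended values.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

# Same synthetic dataset as in the main example
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Candidate values for average (illustrative choice)
param_grid = {"average": [False, True, 10]}

search = GridSearchCV(
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=5,
)
search.fit(X, y)

# Report the mean cross-validated MSE for each candidate
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(f"average={params['average']}, mean CV MSE: {-score:.3f}")
print(f"Best average value: {search.best_params_['average']}")
```

Scoring on several folds gives a less noisy comparison than a single train/test split, at the cost of fitting each candidate once per fold.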
Issues to consider:
- Averaging can slow down convergence initially but may lead to better final performance
- The optimal average value depends on the specific dataset and problem
- Using averaging may increase computational cost, especially with large datasets
- Consider the trade-off between model stability and adaptability to recent data points