The `average` parameter in scikit-learn's `SGDClassifier` controls whether to use averaged stochastic gradient descent (ASGD) instead of standard stochastic gradient descent.

Stochastic gradient descent (SGD) is an optimization method that finds the parameters minimizing the loss function by updating them one sample at a time. The `average` parameter enables ASGD, which averages the weight vectors across updates and can improve model stability and generalization.
When `average` is set to `True` or a positive integer, the classifier uses ASGD; when it is `False` (the default), standard SGD is used. With `average=True`, averaging starts from the first sample; with an integer greater than 1, averaging begins once that many samples have been seen.
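To build intuition for what averaging does, here is a minimal NumPy sketch of SGD with iterate averaging on a toy least-squares problem. This is not scikit-learn's implementation; the data, learning rate, and the `start` threshold (mimicking an integer `average`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: y = X @ w_true + noise
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(3)        # current SGD iterate
w_avg = np.zeros(3)    # running average of iterates (the ASGD estimate)
start = 50             # like average=50: begin averaging after 50 samples
n_averaged = 0

for t, (x_i, y_i) in enumerate(zip(X, y), start=1):
    grad = (x_i @ w - y_i) * x_i   # gradient of 0.5 * (x.w - y)^2
    w -= 0.05 * grad               # plain SGD step
    if t >= start:                 # ASGD: average iterates from `start` on
        n_averaged += 1
        w_avg += (w - w_avg) / n_averaged

print("final SGD iterate:", w)
print("averaged iterate: ", w_avg)
```

The averaged iterate smooths out the noise in the individual SGD steps, which is the mechanism behind ASGD's improved stability.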
Common values for `average` include `False`, `True`, and positive integers such as 10 or 100, depending on the dataset size and problem complexity.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different average values
average_values = [False, True, 10, 100]
accuracies = []
for avg in average_values:
    sgd = SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-3, random_state=42, average=avg)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"average={avg}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:

```
average=False, Accuracy: 0.795
average=True, Accuracy: 0.823
average=10, Accuracy: 0.823
average=100, Accuracy: 0.824
```
The key steps in this example are:

- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train `SGDClassifier` models with different `average` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `average`:

- Use `average=True` for datasets with high noise or when model stability is a concern
- Try integer values for `average` to delay the start of averaging until the iterates have stabilized
- Experiment with different `average` values and compare performance
Issues to consider:

- ASGD may converge more slowly than standard SGD, but often to a better optimum
- The optimal `average` setting depends on the dataset size and problem complexity
- ASGD adds some computational and memory overhead for maintaining the averaged weights, especially with large datasets