The `average` parameter in scikit-learn's `SGDClassifier` controls whether to use averaged stochastic gradient descent (ASGD) instead of standard stochastic gradient descent.

Stochastic gradient descent (SGD) is an optimization method that finds the parameters minimizing the loss function by updating them one sample at a time. The `average` parameter enables ASGD, which averages the weight vectors across updates and can improve model stability and generalization.
When `average` is set to `True` or a positive integer, the classifier uses ASGD; when it is `False` (the default), standard SGD is used. With `average=True`, averaging starts from the first sample; with an integer greater than 1, averaging begins once that many samples have been seen.
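To build intuition for what averaging does, here is a minimal NumPy sketch of SGD with iterate averaging on a toy least-squares problem. This is not scikit-learn's implementation; the data, learning rate, and the `start` threshold (mimicking an integer `average`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: y = X @ w_true + noise
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(3)        # current SGD iterate
w_avg = np.zeros(3)    # running average of iterates (the ASGD estimate)
start = 50             # like average=50: begin averaging after 50 samples
n_averaged = 0

for t, (x_i, y_i) in enumerate(zip(X, y), start=1):
    grad = (x_i @ w - y_i) * x_i   # gradient of 0.5 * (x.w - y)^2
    w -= 0.05 * grad               # plain SGD step
    if t >= start:                 # ASGD: average iterates from `start` on
        n_averaged += 1
        w_avg += (w - w_avg) / n_averaged

print("final SGD iterate:", w)
print("averaged iterate: ", w_avg)
```

The averaged iterate smooths out the noise in the individual SGD steps, which is the mechanism behind ASGD's improved stability.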
Common values for `average` include `False`, `True`, and positive integers such as 10 or 100, depending on the dataset size and problem complexity.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different average values
average_values = [False, True, 10, 100]
accuracies = []
for avg in average_values:
    sgd = SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-3, random_state=42, average=avg)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"average={avg}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:

```
average=False, Accuracy: 0.795
average=True, Accuracy: 0.823
average=10, Accuracy: 0.823
average=100, Accuracy: 0.824
```
The key steps in this example are:

- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train `SGDClassifier` models with different `average` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `average`:

- Use `average=True` for datasets with high noise or when model stability is a concern
- Try integer values for `average` to delay the start of averaging until the iterates have stabilized
- Experiment with different `average` values and compare performance
Issues to consider:

- ASGD may converge more slowly than standard SGD, but often to a better optimum
- The optimal `average` setting depends on the dataset size and problem complexity
- ASGD adds some computational and memory overhead for maintaining the averaged weights, especially with large datasets