The `early_stopping` parameter in scikit-learn's `SGDClassifier` determines whether to use early stopping to terminate training when validation scores stop improving.
Stochastic Gradient Descent (SGD) is an efficient method for fitting linear classifiers, but it can be challenging to determine the optimal number of iterations. Early stopping helps prevent overfitting by monitoring the model’s performance on a validation set.
When `early_stopping` is set to `True`, the algorithm sets aside a portion of the training data as a validation set and stops training when the validation score fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs.
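The stopping rule can be sketched manually with `partial_fit`, a simplified version of what `early_stopping=True` does internally (the split size, tolerance, and patience values below are illustrative, not the library's internals):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)
best_score, no_change = -np.inf, 0
for epoch in range(1000):
    clf.partial_fit(X_tr, y_tr, classes=classes)  # one epoch of SGD
    score = clf.score(X_val, y_val)               # validation accuracy
    if score > best_score + 1e-3:                 # tol: minimum improvement
        best_score, no_change = score, 0
    else:
        no_change += 1
    if no_change >= 5:                            # n_iter_no_change epochs
        print(f"stopped after {epoch + 1} epochs, val score {best_score:.3f}")
        break
```

Passing `early_stopping=True` to `SGDClassifier` gives you this behavior without writing the loop yourself.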
The default value for `early_stopping` is `False`. When enabling it, a common choice is a `validation_fraction` between 0.1 and 0.2.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train, validation, and test sets
# (note: SGDClassifier does not use X_val directly; with early_stopping=True
# it carves its own validation split from the training data via validation_fraction)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Train without early stopping
sgd_no_early = SGDClassifier(max_iter=1000, random_state=42)
sgd_no_early.fit(X_train, y_train)

# Train with early stopping
sgd_early = SGDClassifier(early_stopping=True, validation_fraction=0.2,
                          n_iter_no_change=5, max_iter=1000, random_state=42)
sgd_early.fit(X_train, y_train)

# Evaluate models
print(f"No early stopping - iterations: {sgd_no_early.n_iter_}, "
      f"accuracy: {accuracy_score(y_test, sgd_no_early.predict(X_test)):.3f}")
print(f"With early stopping - iterations: {sgd_early.n_iter_}, "
      f"accuracy: {accuracy_score(y_test, sgd_early.predict(X_test)):.3f}")
```
Running the example gives an output like:

```
No early stopping - iterations: 125, accuracy: 0.761
With early stopping - iterations: 11, accuracy: 0.769
```
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train, validation, and test sets
- Train SGDClassifier models with and without early stopping
- Compare the number of iterations and final accuracy of both models
Tips for using `early_stopping`:
- Enable early stopping for large datasets or when the optimal number of iterations is unknown
- Set `validation_fraction` between 0.1 and 0.2 to balance training-set size against validation reliability
- Monitor validation scores during training to confirm the model is improving
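To see the trade-off in practice, you can sweep `validation_fraction` and check how many epochs each model ran via the fitted `n_iter_` attribute (the dataset and values below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Compare how much data is held out for validation vs. epochs actually run
for frac in (0.1, 0.2):
    clf = SGDClassifier(early_stopping=True, validation_fraction=frac,
                        n_iter_no_change=5, max_iter=1000, random_state=0)
    clf.fit(X, y)
    print(f"validation_fraction={frac}: stopped after {clf.n_iter_} epochs")
```

A larger fraction gives a more reliable stopping signal but leaves fewer samples for fitting the weights.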
Issues to consider:
- Early stopping reduces training time but may halt before the model reaches its best achievable performance
- It may affect model convergence, especially with small validation sets
- Early stopping interacts with other parameters such as `max_iter` and `tol`; tune them together to fine-tune the stopping criteria
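One quick way to study those interactions is to vary `tol` and `n_iter_no_change` together and observe the resulting `n_iter_` (the parameter pairs below are illustrative choices, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# A stricter tol with more patience typically lets training run longer
for tol, patience in [(1e-3, 5), (1e-4, 10)]:
    clf = SGDClassifier(early_stopping=True, tol=tol, n_iter_no_change=patience,
                        max_iter=1000, random_state=0)
    clf.fit(X, y)
    print(f"tol={tol}, n_iter_no_change={patience}: {clf.n_iter_} iterations")
```

`max_iter` remains a hard cap either way: if the stopping criterion never triggers, training ends there and scikit-learn emits a convergence warning.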