Configure SGDRegressor "early_stopping" Parameter

The early_stopping parameter in scikit-learn’s SGDRegressor determines whether training is terminated early when the validation score stops improving.

Stochastic Gradient Descent (SGD) is an iterative optimization algorithm. Early stopping can prevent overfitting by halting training when the model’s performance on a validation set stops improving.

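When early_stopping is set to True, the algorithm sets aside a fraction of the training data (validation_fraction, 0.1 by default) as a validation set. Training stops when the validation score, as returned by the model’s score method (R² for a regressor), fails to improve by at least tol for n_iter_no_change consecutive epochs (5 by default).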
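To make the stopping rule concrete, here is a minimal hand-rolled sketch of the same idea using partial_fit. It is an illustration under stated assumptions, not scikit-learn's internal implementation: the names patience, max_epochs, and epochs_without_improvement are hypothetical, and the dataset size and seeds are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

# Synthetic data; sizes and seeds are illustrative choices
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=0)

# Hold out 10% of the training data for validation (mirrors validation_fraction=0.1)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

tol = 1e-3         # minimum improvement that counts as progress
patience = 5       # hypothetical name; plays the role of n_iter_no_change
max_epochs = 1000  # hypothetical name; plays the role of max_iter

model = SGDRegressor(random_state=0)
rng = np.random.default_rng(0)
best_score = -np.inf
epochs_without_improvement = 0
epochs_run = 0

for epoch in range(max_epochs):
    # One pass over the shuffled training portion
    order = rng.permutation(len(X_fit))
    model.partial_fit(X_fit[order], y_fit[order])
    epochs_run += 1

    # score() returns R^2 on the held-out validation set
    score = model.score(X_val, y_val)
    if score < best_score + tol:
        epochs_without_improvement += 1
    else:
        epochs_without_improvement = 0
    best_score = max(best_score, score)

    # Stop once the score has stalled for `patience` consecutive epochs
    if epochs_without_improvement >= patience:
        break

print(f"Stopped after {epochs_run} epochs, best validation R^2: {best_score:.4f}")
```

In practice, setting early_stopping=True does this bookkeeping for you, so a manual loop like this is mainly useful for understanding the rule or for customizing it.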

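The default value for early_stopping is False, in which case the stopping check uses the training loss rather than a validation score. Common settings are True with the default validation_fraction and n_iter_no_change, or True with custom values for those two parameters.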
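As a short sketch of the custom case, the snippet below compares the default early stopping settings with a larger validation set and more patience. The values 0.2 and 10, the dataset size, and the variable names default_es and custom_es are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Synthetic data; size and seed are illustrative choices
X, y = make_regression(n_samples=5000, n_features=20, noise=0.1, random_state=42)

# Early stopping with the default validation_fraction and n_iter_no_change
default_es = SGDRegressor(early_stopping=True, random_state=42)

# Early stopping with a larger validation set and more patience
custom_es = SGDRegressor(
    early_stopping=True,
    validation_fraction=0.2,
    n_iter_no_change=10,
    random_state=42,
)

for name, model in [("default", default_es), ("custom", custom_es)]:
    model.fit(X, y)
    print(f"{name}: stopped after {model.n_iter_} epochs")
```

A larger n_iter_no_change makes the stopping rule more tolerant of noisy validation scores at the cost of extra epochs, while a larger validation_fraction gives a steadier score estimate but leaves less data for fitting.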

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=10000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different early_stopping configurations
configs = [
    {'early_stopping': False},
    {'early_stopping': True},
    # 0.1 and 5 are the default values, written out explicitly here
    {'early_stopping': True, 'validation_fraction': 0.1, 'n_iter_no_change': 5}
]

for config in configs:
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42, **config)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Config: {config}")
    print(f"Number of iterations: {sgd.n_iter_}")
    print(f"MSE: {mse:.4f}\n")

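Running the example gives output like the following. The second and third configurations produce identical results because validation_fraction=0.1 and n_iter_no_change=5 are the default values: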

Config: {'early_stopping': False}
Number of iterations: 7
MSE: 0.0106

Config: {'early_stopping': True}
Number of iterations: 6
MSE: 0.0107

Config: {'early_stopping': True, 'validation_fraction': 0.1, 'n_iter_no_change': 5}
Number of iterations: 6
MSE: 0.0107
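The last two configurations report the same iteration count and MSE. A quick check, sketched below with the hypothetical variable names implicit and explicit and a smaller illustrative dataset, suggests why: the explicit values match the estimator's defaults, so both models are configured identically.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Inspect the defaults of the two early stopping parameters
defaults = SGDRegressor().get_params()
print(defaults["validation_fraction"], defaults["n_iter_no_change"])

# Synthetic data; size and seed are illustrative choices
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=42)

# Same settings, once implied by the defaults and once written out
implicit = SGDRegressor(early_stopping=True, random_state=42).fit(X, y)
explicit = SGDRegressor(
    early_stopping=True, validation_fraction=0.1, n_iter_no_change=5, random_state=42
).fit(X, y)

# With the same random_state, both fits should coincide
same = implicit.n_iter_ == explicit.n_iter_ and np.allclose(implicit.coef_, explicit.coef_)
print(f"Identical fits: {same}")
```

To see a difference between the configurations, at least one of the two parameters has to move away from its default.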

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Create SGDRegressor instances with different early_stopping configurations
  4. Train models and evaluate performance on the test set
  5. Compare the number of iterations and mean squared error for each configuration
