Configure SGDRegressor "n_iter_no_change" Parameter

The n_iter_no_change parameter in scikit-learn’s SGDRegressor sets the stopping patience for stochastic gradient descent: the number of consecutive iterations (epochs, i.e. full passes over the training data) without sufficient improvement that are tolerated before training stops.

Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used for fitting linear models. It updates the model’s parameters based on one sample at a time, making it efficient for large datasets.

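The n_iter_no_change parameter sets the number of consecutive iterations (epochs) without improvement that will stop training. An epoch counts as an improvement only if it beats the best value seen so far by more than tol. By default the training loss is monitored; the score on a held-out validation set is used only when early_stopping=True. Stopping this way reduces unnecessary computation and, when a validation set is used, can help prevent overfitting.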
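
The counting rule described above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's actual implementation: the function name epochs_until_stop and the list of per-epoch losses are hypothetical, and the real estimator applies tol to the summed training loss rather than to a single number per epoch.

```python
def epochs_until_stop(epoch_losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which patience-based stopping triggers.

    Simplified illustration only; not scikit-learn's actual implementation.
    """
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(epoch_losses, start=1):
        # An epoch counts as "no improvement" unless it beats
        # the best loss seen so far by more than tol.
        if loss > best_loss - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        # Stop once the patience budget is used up.
        if no_improvement >= n_iter_no_change:
            return epoch
    return len(epoch_losses)  # stopping never triggered


# Hypothetical loss curve: fast progress, then a plateau.
losses = [1.0, 0.5, 0.3, 0.2999, 0.2998, 0.2997, 0.2996, 0.2995, 0.2994]
stop_small = epochs_until_stop(losses, n_iter_no_change=2)
stop_large = epochs_until_stop(losses, n_iter_no_change=5)
print(stop_small, stop_large)
```

With this loss curve, a larger n_iter_no_change simply lets training run further into the plateau before giving up.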

The default value for n_iter_no_change is 5. In practice, values between 2 and 10 are commonly used, depending on the dataset’s size and complexity.
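
As a minimal sketch, the snippet below reads the default from an unconfigured estimator and then fits the model in both monitoring modes: training loss (the default) and held-out validation score (early_stopping=True). The dataset and the variable names sgd_train and sgd_val are illustrative choices, not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Default patience on an unconfigured estimator
default_patience = SGDRegressor().n_iter_no_change
print(f"Default n_iter_no_change: {default_patience}")

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Default mode: convergence is checked against the training loss
sgd_train = SGDRegressor(n_iter_no_change=5, random_state=42)
sgd_train.fit(X, y)

# early_stopping=True: 10% of the training data is held out and
# convergence is checked against the score on that held-out set
sgd_val = SGDRegressor(
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=5,
    random_state=42,
)
sgd_val.fit(X, y)

print(f"Training-loss stopping: {sgd_train.n_iter_} epochs")
print(f"Validation-score stopping: {sgd_val.n_iter_} epochs")
```

The epoch counts from the two modes need not match, since each one monitors a different quantity.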

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_iter_no_change values
n_iter_no_change_values = [2, 5, 10, 20]
mse_scores = []

for n in n_iter_no_change_values:
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, n_iter_no_change=n, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"n_iter_no_change={n}, MSE: {mse:.3f}, Iterations: {sgd.n_iter_}")

Running the example gives an output like:

n_iter_no_change=2, MSE: 0.010, Iterations: 10
n_iter_no_change=5, MSE: 0.010, Iterations: 13
n_iter_no_change=10, MSE: 0.010, Iterations: 18
n_iter_no_change=20, MSE: 0.010, Iterations: 28

The key steps in this example are:

Generate a synthetic regression dataset
Split the data into train and test sets
Train SGDRegressor models with different n_iter_no_change values
Evaluate the mean squared error of each model on the test set
Compare the number of iterations and performance for each value

Some tips and heuristics for setting n_iter_no_change:

Start with the default value of 5 and adjust based on convergence behavior
Use smaller values for faster stopping, larger values for more stable convergence
Consider the trade-off between computation time and model performance

Issues to consider:

Values that are too small may stop training prematurely, leading to underfitting
Values that are too large may spend extra epochs on computation without meaningful improvement
The optimal value depends on the dataset’s size, complexity, and noise level
Monitor both the number of iterations and the model’s performance to find the best balance
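
One way to weigh these considerations is to compare candidate values with cross-validation instead of a single train/test split. The sketch below uses GridSearchCV for this; the candidate grid, the 5-fold setting, and the scoring choice are assumptions made for illustration, not recommendations from the original example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Cross-validate each candidate value of n_iter_no_change
search = GridSearchCV(
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=42),
    param_grid={"n_iter_no_change": [2, 5, 10, 20]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

# Report error and fit time side by side for each candidate
results = search.cv_results_
for n, score, fit_time in zip(
    results["param_n_iter_no_change"],
    results["mean_test_score"],
    results["mean_fit_time"],
):
    print(f"n_iter_no_change={n}, CV MSE: {-score:.3f}, mean fit time: {fit_time:.4f}s")

print("Best:", search.best_params_)
```

If several values give nearly identical error, the smallest of them is usually the cheaper choice.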

See Also