
Configure SGDRegressor "shuffle" Parameter

The shuffle parameter in scikit-learn’s SGDRegressor controls whether the training samples are reshuffled for each epoch (each full pass over the training data) during fitting.

Stochastic Gradient Descent (SGD) is an optimization algorithm used to find the parameters that minimize the loss function of a model. It processes one training sample at a time, making it efficient for large datasets.

When shuffle is set to True, the order of the training samples is randomized for each epoch. This helps prevent the updates from being biased by the order in which the data happens to be stored (for example, sorted by target value or grouped by source), and it often helps convergence on such data.
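To make the role of shuffling concrete, here is a minimal NumPy sketch of per-sample SGD for least-squares regression. It is not scikit-learn's implementation (which adds learning-rate schedules, regularization, and stopping criteria); the function name sgd_linear, the fixed learning rate, and the noise-free toy data are all assumptions chosen for illustration.

```python
import numpy as np

def sgd_linear(X, y, epochs=20, lr=0.01, shuffle=True, seed=0):
    """Per-sample SGD for least-squares linear regression (no intercept)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Visit samples in a fresh random order each epoch, or in stored order.
        order = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for i in order:
            error = X[i] @ w - y[i]
            w -= lr * error * X[i]  # gradient of 0.5 * error**2 w.r.t. w
    return w

# Hypothetical toy data: noise-free linear targets.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w_shuffled = sgd_linear(X, y, shuffle=True)
w_ordered = sgd_linear(X, y, shuffle=False)
print("shuffle=True: ", w_shuffled)
print("shuffle=False:", w_ordered)
```

On well-mixed data like this, both variants should recover the weights. The difference shuffling makes tends to show up when the stored order is itself informative.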

The default value for shuffle is True. Keep it at True in most cases; set it to False when the order of the samples must be preserved, for example when working with time series data.
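The short check below reads the default back from a freshly constructed estimator and illustrates what preserving the order means in practice. It assumes otherwise default settings (in particular early_stopping=False); under that assumption, to my understanding, shuffling is the only use of random_state during fitting, so with shuffle=False the fitted coefficients should not depend on the seed. The helper name fit_coef and the dataset sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Read the default value from a freshly constructed estimator.
default_shuffle = SGDRegressor().get_params()["shuffle"]
print(f"default shuffle: {default_shuffle}")

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

def fit_coef(shuffle, seed):
    """Fit on the full data and return the learned coefficients."""
    model = SGDRegressor(max_iter=1000, tol=1e-3, shuffle=shuffle, random_state=seed)
    return model.fit(X, y).coef_

# With shuffle=False, samples are visited in stored order on every epoch,
# so two different seeds are expected to give the same coefficients.
same_without_shuffle = np.allclose(fit_coef(False, 0), fit_coef(False, 1))
print(f"same coefficients across seeds without shuffling: {same_without_shuffle}")
```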

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different shuffle values
shuffle_values = [True, False]
mse_scores = []

for shuffle in shuffle_values:
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42, shuffle=shuffle)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"shuffle={shuffle}, MSE: {mse:.3f}")

# Relative MSE reduction of shuffle=True (index 0) over shuffle=False (index 1)
print(f"Improvement: {(mse_scores[1] - mse_scores[0]) / mse_scores[1] * 100:.2f}%")

Running the example gives an output like:

shuffle=True, MSE: 0.010
shuffle=False, MSE: 0.010
Improvement: 0.86%

On this dataset the two settings perform almost identically, which is unsurprising: the samples generated by make_regression have no meaningful order, so there is little for shuffling to correct.

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train SGDRegressor models with different shuffle values
  4. Evaluate the mean squared error (MSE) of each model on the test set
  5. Compare the performance improvement of shuffling vs. not shuffling
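
A single run can make a small difference look more meaningful than it is. As a hedged extension of the steps above, the sketch below repeats the comparison over several seeds and reports the mean test MSE per setting. The helper name evaluate_mse and the choice of five seeds are assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def evaluate_mse(shuffle, seed):
    """Fit one model and return its test-set MSE."""
    model = SGDRegressor(max_iter=1000, tol=1e-3, shuffle=shuffle, random_state=seed)
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

seeds = range(5)  # hypothetical choice; more seeds give a steadier estimate
mean_mse = {
    shuffle: float(np.mean([evaluate_mse(shuffle, seed) for seed in seeds]))
    for shuffle in (True, False)
}

for shuffle, mse in mean_mse.items():
    print(f"shuffle={shuffle}, mean MSE over {len(seeds)} seeds: {mse:.4f}")
```

Only the shuffle=True runs are expected to vary with the seed, since the seed drives the shuffling order; the averaging mainly smooths out that variation.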

Some tips and heuristics for setting shuffle:

Issues to consider:


