
Configure MLPRegressor "shuffle" Parameter

The shuffle parameter in scikit-learn’s MLPRegressor determines whether the training samples are shuffled in each iteration during training. It is only used when solver is 'sgd' or 'adam' (the default); the 'lbfgs' solver ignores it.

Multi-layer Perceptron (MLP) is a type of artificial neural network that learns a non-linear function approximator for regression. It uses backpropagation for training and can learn complex non-linear relationships in data.

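The shuffle parameter affects how the model processes training data in each epoch. The stochastic solvers update the weights one mini-batch at a time, so when shuffle is True the samples are reordered before each pass and every epoch sees differently composed mini-batches. This keeps the updates from depending on the original ordering of the data (for example, rows sorted by target value) and can improve generalization.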
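To make the effect on mini-batches concrete, here is a minimal pure-Python sketch. It is an illustration only, not scikit-learn's internal code: the helper epoch_batches and its arguments are hypothetical names introduced for this example.

```python
import random

def epoch_batches(n_samples, batch_size, shuffle, rng):
    """Return the index batches for one epoch (hypothetical helper)."""
    indices = list(range(n_samples))
    if shuffle:
        rng.shuffle(indices)  # draw a new sample order for this epoch
    # Slice the (possibly reordered) indices into consecutive mini-batches
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

rng = random.Random(0)

# Without shuffling, every epoch produces the same mini-batches
fixed_epochs = [epoch_batches(6, 2, shuffle=False, rng=rng) for _ in range(3)]

# With shuffling, each epoch regroups the same samples
shuffled_epochs = [epoch_batches(6, 2, shuffle=True, rng=rng) for _ in range(3)]

for epoch, batches in enumerate(fixed_epochs):
    print(f"shuffle=False, epoch {epoch}: {batches}")
for epoch, batches in enumerate(shuffled_epochs):
    print(f"shuffle=True,  epoch {epoch}: {batches}")
```

In both cases every sample is visited exactly once per epoch; only the grouping into mini-batches changes.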

By default, shuffle is set to True, which suits most cases. Set it to False when preserving the order of samples is important, for example when working with time series data.
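The following sketch checks the default value and shows that shuffle only matters for the stochastic solvers. The dataset size, network size, batch_size=32, and the helper fit_predict are assumptions chosen for this illustration, not part of the original example.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Small synthetic dataset (sizes chosen only to keep the check fast)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

def fit_predict(solver, shuffle):
    """Fit a small MLP and return its predictions on the training data."""
    model = MLPRegressor(hidden_layer_sizes=(10,), solver=solver, batch_size=32,
                         max_iter=50, random_state=0, shuffle=shuffle)
    with warnings.catch_warnings():
        # Short runs may not converge; that is fine for this comparison
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X, y)
    return model.predict(X)

default_shuffle = MLPRegressor().shuffle

# 'lbfgs' is a full-batch solver, so shuffle should make no difference
lbfgs_same = np.allclose(fit_predict("lbfgs", True), fit_predict("lbfgs", False))

# 'adam' trains on mini-batches, so shuffle can change the result
adam_same = np.allclose(fit_predict("adam", True), fit_predict("adam", False))

print(f"default shuffle: {default_shuffle}")
print(f"lbfgs predictions identical: {lbfgs_same}")
print(f"adam predictions identical: {adam_same}")
```

Note that with batch_size left at its default of min(200, n_samples), a dataset of 200 samples or fewer fits in a single batch per epoch, and shuffling then has essentially no effect on the gradient.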

from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different shuffle values
shuffle_values = [True, False]
mse_scores = []

for shuffle in shuffle_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=500, random_state=42, shuffle=shuffle)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"shuffle={shuffle}, MSE: {mse:.3f}")

# Relative change in MSE for shuffle=False compared with shuffle=True
print(f"Percentage difference: {(mse_scores[1] - mse_scores[0]) / mse_scores[0] * 100:.2f}%")

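Running the example gives an output like the following. Exact numbers depend on the scikit-learn version, and a gap measured with a single random seed is not evidence that one setting is better: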

shuffle=True, MSE: 139.311
shuffle=False, MSE: 127.190
Percentage difference: -8.70%

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train MLPRegressor models with different shuffle values
  4. Evaluate the mean squared error (MSE) of each model on the test set
  5. Compare the performance difference between shuffled and non-shuffled training
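Because a single run can be dominated by random initialization, one way to extend the comparison is to repeat it over several seeds and look at the spread. This is a sketch under assumptions: the choice of five seeds and max_iter=200 is arbitrary and only meant to keep the run short.

```python
import warnings
from statistics import mean, stdev

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Same dataset and split as the example above
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

seeds = range(5)  # assumed number of repetitions
results = {True: [], False: []}

for shuffle in (True, False):
    for seed in seeds:
        mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=200,
                           random_state=seed, shuffle=shuffle)
        with warnings.catch_warnings():
            # Ignore non-convergence warnings from the shortened training runs
            warnings.simplefilter("ignore", category=ConvergenceWarning)
            mlp.fit(X_train, y_train)
        results[shuffle].append(mean_squared_error(y_test, mlp.predict(X_test)))

# Report mean and standard deviation of test MSE per setting
for shuffle, scores in results.items():
    print(f"shuffle={shuffle}: mean MSE {mean(scores):.3f}, std {stdev(scores):.3f}")
```

If the standard deviations are of the same order as the gap between the two means, the difference between shuffle settings is probably noise rather than a real effect.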

Some tips and heuristics for setting shuffle:

Issues to consider:
