
Configure MLPClassifier "shuffle" Parameter

The shuffle parameter in scikit-learn’s MLPClassifier controls whether the training samples are shuffled at each iteration (epoch) during training. It is only used when solver is 'sgd' or 'adam'.

A multi-layer perceptron (MLP) is a feed-forward artificial neural network that learns a non-linear function approximator for classification or regression. During stochastic training, each epoch splits the training set into mini-batches; shuffle determines whether the sample order is randomized before those batches are formed.
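Because shuffle only applies to the 'sgd' and 'adam' solvers, a quick way to see this is to compare fitted weights across solvers. The following is a minimal sketch, not part of the original example: the dataset size, batch_size=32, and the helper name fit_weights are illustrative assumptions.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Silence convergence warnings from the deliberately short training runs
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = make_classification(n_samples=200, n_features=10, random_state=0)


def fit_weights(solver, shuffle):
    """Hypothetical helper: fit a small MLP and return its first-layer weights."""
    clf = MLPClassifier(hidden_layer_sizes=(10,), solver=solver, shuffle=shuffle,
                        batch_size=32, max_iter=50, random_state=0)
    clf.fit(X, y)
    return clf.coefs_[0]


# 'lbfgs' is a full-batch solver, so the shuffle setting should not matter
lbfgs_same = np.allclose(fit_weights("lbfgs", True), fit_weights("lbfgs", False))

# 'adam' updates on mini-batches, so the sample order can change the result
adam_same = np.allclose(fit_weights("adam", True), fit_weights("adam", False))

print(f"lbfgs weights match: {lbfgs_same}")
print(f"adam weights match:  {adam_same}")
```

Note that with the default batch_size='auto', scikit-learn uses min(200, n_samples) samples per batch, so on very small datasets every epoch is a single full batch and shuffling has little to act on.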

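Shuffling helps prevent mini-batches from reflecting whatever order the training data happens to be stored in (for example, sorted by class), which can bias the gradient updates and hurt generalization. For time-series or other sequential data, you may prefer to keep the original order with shuffle=False. Keep in mind that an MLP treats each sample independently, so this setting changes the order of the weight updates, not whether the model can learn temporal structure.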
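To make the effect of sample order concrete, the sketch below shows how mini-batches look when labels are sorted by class, with and without a random permutation. It only illustrates the idea using sklearn.utils.gen_batches; the toy labels, the batch size, and the helper batch_labels are assumptions for illustration and not a copy of MLPClassifier's internals.

```python
import numpy as np
from sklearn.utils import gen_batches

# Toy labels sorted by class: 8 samples of class 0 followed by 8 of class 1
y = np.array([0] * 8 + [1] * 8)
batch_size = 4


def batch_labels(labels, batch_size):
    """Hypothetical helper: list the labels that land in each consecutive batch."""
    return [labels[s].tolist() for s in gen_batches(len(labels), batch_size)]


# Without shuffling, every batch contains a single class
unshuffled = batch_labels(y, batch_size)

# After a random permutation, batches typically mix both classes
rng = np.random.default_rng(0)
shuffled = batch_labels(y[rng.permutation(len(y))], batch_size)

print("unshuffled:", unshuffled)
print("shuffled:  ", shuffled)
```

With single-class batches, each update pushes the weights toward one class at a time, which is the kind of order-related artifact shuffling is meant to avoid.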

The default value for shuffle is True.
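The default can be confirmed, and changed, through the standard estimator parameter interface. A minimal sketch (the variable names are illustrative):

```python
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier()

# Read the default from the estimator's parameters
default_shuffle = clf.get_params()["shuffle"]
print(default_shuffle)

# Switch shuffling off on an existing estimator
clf.set_params(shuffle=False)
updated_shuffle = clf.shuffle
print(updated_shuffle)
```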

In practice, shuffle=True is used for most datasets, while shuffle=False is sometimes chosen for sequential or time-series data, or when a fixed batch order is wanted.
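For time-ordered data, the train/test split usually needs to respect the order as well, which is a separate setting from MLPClassifier's shuffle. The sketch below assumes hypothetical synthetic data in which row order stands in for time; it uses train_test_split(shuffle=False) for a chronological holdout and shuffle=False on the classifier to keep mini-batches in row order.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical time-ordered data: row i was observed before row i + 1
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# shuffle=False here keeps the split chronological: first 80% train, last 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# shuffle=False here keeps mini-batches in the original row order during training
clf = MLPClassifier(hidden_layer_sizes=(20,), shuffle=False,
                    max_iter=300, random_state=0)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Whether shuffle=False actually helps on a given sequential dataset is an empirical question; comparing both settings on a chronological holdout is a reasonable way to decide.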

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with shuffle=True and shuffle=False
mlp_shuffle = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42, shuffle=True)
mlp_no_shuffle = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42, shuffle=False)

mlp_shuffle.fit(X_train, y_train)
mlp_no_shuffle.fit(X_train, y_train)

# Evaluate models
y_pred_shuffle = mlp_shuffle.predict(X_test)
y_pred_no_shuffle = mlp_no_shuffle.predict(X_test)

accuracy_shuffle = accuracy_score(y_test, y_pred_shuffle)
accuracy_no_shuffle = accuracy_score(y_test, y_pred_no_shuffle)

print(f"Accuracy with shuffle=True: {accuracy_shuffle:.3f}")
print(f"Accuracy with shuffle=False: {accuracy_no_shuffle:.3f}")

Running the example gives an output like:

Accuracy with shuffle=True: 0.885
Accuracy with shuffle=False: 0.905

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train two MLPClassifier models, one with shuffle=True and one with shuffle=False
  4. Evaluate the accuracy of each model on the test set
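A single run like the one above is weak evidence either way: with 200 test samples, a gap of 0.02 in accuracy corresponds to 4 samples. One way to get a steadier comparison is to repeat the steps over several random_state values and look at the spread. The sketch below does this under reduced settings chosen to keep it fast (500 samples, 50 hidden units, three seeds); those values are assumptions, not part of the original example.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Silence convergence warnings from the shortened training runs
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

seeds = [0, 1, 2]
scores = {True: [], False: []}

# Fit one model per (shuffle, seed) pair and record its test accuracy
for shuffle in (True, False):
    for seed in seeds:
        clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=200,
                            random_state=seed, shuffle=shuffle)
        clf.fit(X_train, y_train)
        scores[shuffle].append(clf.score(X_test, y_test))

for shuffle, values in scores.items():
    print(f"shuffle={shuffle}: mean={np.mean(values):.3f}, std={np.std(values):.3f}")
```

If the difference between the two means is small relative to the standard deviations, the setting is probably not the deciding factor for that dataset.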

Some tips and heuristics for setting shuffle:

Issues to consider:


