The batch_size parameter in scikit-learn’s MLPClassifier controls the number of samples used in each iteration during training.
MLPClassifier implements a multi-layer perceptron (MLP) neural network for classification tasks. It uses backpropagation for training and can learn non-linear decision boundaries.
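As a quick illustration of that non-linear fitting ability, here is a minimal, self-contained sketch on scikit-learn's make_moons toy data (this dataset and these settings are illustrative only and are not part of the batch_size example below):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two interleaving half-circles: not separable by a straight line
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"Test accuracy on make_moons: {clf.score(X_te, y_te):.3f}")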
The batch_size parameter determines how many samples are used to estimate the gradient at each step. It affects both the speed of training and the quality of the model’s convergence.
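For intuition, the number of gradient updates per pass over the training data is roughly the training-set size divided by batch_size. A quick back-of-the-envelope sketch (the 8,000-sample figure matches the training split used in the example further below):

import math

n_train = 8000  # training-set size (80% of the 10,000 samples generated below)
for bs in (32, 64, 128, 256):
    print(f"batch_size={bs}: about {math.ceil(n_train / bs)} gradient updates per epoch")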
By default, batch_size is set to 'auto', which uses the minimum of 200 or the number of samples. Common values range from 32 to 256, depending on dataset size and available memory.
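In other words, 'auto' picks the full batch for small datasets and caps the batch at 200 samples for larger ones. A tiny sketch of that rule (illustrative only, not scikit-learn's actual source):

# 'auto' resolves to min(200, n_samples)
for n_samples in (150, 10000):
    print(f"n_samples={n_samples} -> effective batch_size={min(200, n_samples)}")

The complete example below trains MLPClassifier with several batch_size values and compares accuracy and training time: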
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different batch_size values
batch_sizes = ['auto', 32, 64, 128, 256]
results = []

for batch_size in batch_sizes:
    start_time = time.time()
    mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=100, random_state=42,
                        batch_size=batch_size)
    mlp.fit(X_train, y_train)
    train_time = time.time() - start_time

    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results.append((batch_size, accuracy, train_time))
    print(f"batch_size={batch_size}, Accuracy: {accuracy:.3f}, Training time: {train_time:.2f}s")
Running the example gives an output like:
batch_size=auto, Accuracy: 0.939, Training time: 3.86s
batch_size=32, Accuracy: 0.945, Training time: 10.40s
batch_size=64, Accuracy: 0.942, Training time: 7.37s
batch_size=128, Accuracy: 0.945, Training time: 5.56s
batch_size=256, Accuracy: 0.945, Training time: 3.73s
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different batch_size values
- Measure training time and evaluate accuracy for each model
- Compare performance across different batch sizes
Tips and heuristics for setting batch_size:
- Smaller batch sizes often lead to faster initial convergence but can be noisier
- Larger batch sizes provide more stable gradient estimates but require more memory
- For small datasets, consider using the full batch (batch_size='auto')
- Experiment with powers of 2 (32, 64, 128, 256) for efficient computation on most hardware; one systematic way to do this is sketched after this list
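As a sketch of how that last tip could be put into practice, batch_size can be tuned like any other hyperparameter with GridSearchCV. The grid and cross-validation settings here are illustrative assumptions, and X_train and y_train are reused from the example above:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Powers of 2, as suggested above
param_grid = {'batch_size': [32, 64, 128, 256]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=100, random_state=42),
    param_grid, cv=3, n_jobs=-1)
grid.fit(X_train, y_train)  # X_train, y_train from the example above
print(f"Best batch_size: {grid.best_params_['batch_size']}")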
Issues to consider:
- Very small batch sizes may lead to unstable training and poor generalization (the loss-curve check sketched after this list can help diagnose this)
- Very large batch sizes may cause the model to converge to sharp minima and generalize poorly
- The optimal batch size often depends on the specific dataset and model architecture
- There’s often a trade-off between training speed and model performance
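One way to spot the unstable-training issue mentioned in the first point above is to inspect the fitted model's loss_curve_ attribute, which records the training loss at each iteration. A minimal sketch (the batch size of 8 is an illustrative choice, and X_train and y_train are reused from the example above):

mlp_small = MLPClassifier(hidden_layer_sizes=(100, 50), batch_size=8,
                          max_iter=100, random_state=42)
mlp_small.fit(X_train, y_train)

# Count iterations where the loss went up instead of down;
# frequent increases suggest noisy, unstable updates
losses = mlp_small.loss_curve_
increases = sum(1 for prev, cur in zip(losses, losses[1:]) if cur > prev)
print(f"Loss increased on {increases} of {len(losses) - 1} iterations")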