The `momentum` parameter in scikit-learn's `MLPClassifier` controls the contribution of the previous gradient step to the current weight update.
Momentum is a technique used in neural network training to accelerate convergence and to help the optimizer move past shallow local minima. It adds a fraction of the previous weight update to the current one, which smooths the optimization trajectory.
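As a concrete illustration, here is a minimal NumPy sketch of the classical momentum update rule on a toy quadratic loss. This is the textbook rule, not scikit-learn's internal implementation (which, with `solver='sgd'`, applies Nesterov momentum by default via `nesterovs_momentum=True`); the names `w`, `velocity`, and `grad` are purely illustrative:

```python
import numpy as np

# Minimal sketch of the classical momentum update rule.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # weights
velocity = np.zeros_like(w)     # running "velocity" of past updates
learning_rate, momentum = 0.1, 0.9

for step in range(3):
    grad = 2 * w                # toy gradient: the loss is ||w||^2
    velocity = momentum * velocity - learning_rate * grad
    w += velocity               # previous update carries over via velocity
    print(step, np.linalg.norm(w))
```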
The default value for `momentum` in `MLPClassifier` is 0.9. Note that the parameter is only used when `solver='sgd'`; the default solver, `'adam'`, ignores it. In practice, values between 0.5 and 0.99 are common, with higher values potentially speeding convergence at the risk of overshooting good solutions.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different momentum values
momentum_values = [0.0, 0.5, 0.9, 0.99]
accuracies = []
for m in momentum_values:
    # Note: momentum is only used when solver='sgd'; the default
    # solver, 'adam', ignores it (see the discussion below).
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, momentum=m, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"momentum={m}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:

```
momentum=0.0, Accuracy: 0.885
momentum=0.5, Accuracy: 0.885
momentum=0.9, Accuracy: 0.885
momentum=0.99, Accuracy: 0.885
```
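Notice that all four runs report identical accuracy. That is expected here: `MLPClassifier`'s default solver is `'adam'`, which ignores `momentum` entirely, so the four fitted models are the same. To make the comparison meaningful, set `solver='sgd'`. A minimal variant of the loop, reusing the split and imports from above:

```python
# Same comparison, but with solver='sgd' so that momentum actually matters.
for m in momentum_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                        momentum=m, max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    print(f"momentum={m}, Accuracy: {accuracy_score(y_test, mlp.predict(X_test)):.3f}")
```

With `'sgd'` the accuracies will generally differ across momentum settings; the exact numbers depend on the dataset and the rest of the configuration.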
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `momentum` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `momentum`:
- Start with the default value of 0.9 and adjust based on model performance (a joint search with the learning rate is sketched after this list)
- Higher momentum can help escape shallow local minima but may overshoot good solutions
- Lower momentum may converge more slowly but tends to be more stable
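Because momentum and learning rate interact, it is often worth tuning them together. A hedged sketch using `GridSearchCV`, assuming the `X_train`/`y_train` split from the example above:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Joint search over momentum and the initial learning rate;
# solver='sgd' is required for momentum to have any effect.
param_grid = {
    'momentum': [0.5, 0.9, 0.99],
    'learning_rate_init': [0.001, 0.01, 0.1],
}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                  max_iter=1000, random_state=42),
    param_grid, cv=3, n_jobs=-1,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
```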
Issues to consider:
- The optimal momentum value depends on the specific problem and dataset
- Very high momentum (close to 1.0) can cause training instability
- Momentum interacts with learning rate, so consider adjusting both together
- Monitor training curves to detect oscillations or divergence with high momentum (a sketch follows this list)
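For that last point, a fitted `MLPClassifier` exposes the per-iteration training loss as `loss_curve_`, which can be plotted to compare momentum settings. A minimal sketch, again reusing `X_train`/`y_train` from above:

```python
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier

# Compare training-loss trajectories at moderate vs. very high momentum.
for m in [0.5, 0.99]:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                        momentum=m, max_iter=200, random_state=42)
    mlp.fit(X_train, y_train)
    plt.plot(mlp.loss_curve_, label=f"momentum={m}")  # loss per iteration
plt.xlabel("iteration")
plt.ylabel("training loss")
plt.legend()
plt.show()
```

A smooth, steadily decreasing curve suggests a workable setting; visible oscillation or rising loss at high momentum is a sign to lower it or reduce the learning rate.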