The `momentum` parameter in scikit-learn's `MLPClassifier` controls the contribution of the previous gradient step to the current weight update.
Momentum is a technique used in neural network training to accelerate convergence and to help the optimizer move past shallow local minima. It adds a fraction of the previous weight update to the current one, which smooths the optimization trajectory.
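As a concrete illustration, here is a minimal NumPy sketch of the classical momentum update rule on a toy quadratic loss. This is the textbook rule, not scikit-learn's internal implementation (which, with `solver='sgd'`, applies Nesterov momentum by default via `nesterovs_momentum=True`); the names `w`, `velocity`, and `grad` are purely illustrative:

```python
import numpy as np

# Minimal sketch of the classical momentum update rule.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # weights
velocity = np.zeros_like(w)     # running "velocity" of past updates
learning_rate, momentum = 0.1, 0.9

for step in range(3):
    grad = 2 * w                # toy gradient: the loss is ||w||^2
    velocity = momentum * velocity - learning_rate * grad
    w += velocity               # previous update carries over via velocity
    print(step, np.linalg.norm(w))
```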
The default value for `momentum` in `MLPClassifier` is 0.9. Note that the parameter is only used when `solver='sgd'`; the default solver, `'adam'`, ignores it. In practice, values between 0.5 and 0.99 are common, with higher values potentially speeding convergence at the risk of overshooting good solutions.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different momentum values
momentum_values = [0.0, 0.5, 0.9, 0.99]
accuracies = []
for m in momentum_values:
    # Note: momentum is only used when solver='sgd'; the default
    # solver, 'adam', ignores it (see the discussion below).
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, momentum=m, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"momentum={m}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:

```
momentum=0.0, Accuracy: 0.885
momentum=0.5, Accuracy: 0.885
momentum=0.9, Accuracy: 0.885
momentum=0.99, Accuracy: 0.885
```
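Notice that all four runs report identical accuracy. That is expected here: `MLPClassifier`'s default solver is `'adam'`, which ignores `momentum` entirely, so the four fitted models are the same. To make the comparison meaningful, set `solver='sgd'`. A minimal variant of the loop, reusing the split and imports from above:

```python
# Same comparison, but with solver='sgd' so that momentum actually matters.
for m in momentum_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                        momentum=m, max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    print(f"momentum={m}, Accuracy: {accuracy_score(y_test, mlp.predict(X_test)):.3f}")
```

With `'sgd'` the accuracies will generally differ across momentum settings; the exact numbers depend on the dataset and the rest of the configuration.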
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `momentum` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `momentum`:
- Start with the default value of 0.9 and adjust based on model performance (a joint search with the learning rate is sketched after this list)
- Higher momentum can help escape shallow local minima but may overshoot good solutions
- Lower momentum may converge more slowly but tends to be more stable
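Because momentum and learning rate interact, it is often worth tuning them together. A hedged sketch using `GridSearchCV`, assuming the `X_train`/`y_train` split from the example above:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Joint search over momentum and the initial learning rate;
# solver='sgd' is required for momentum to have any effect.
param_grid = {
    'momentum': [0.5, 0.9, 0.99],
    'learning_rate_init': [0.001, 0.01, 0.1],
}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                  max_iter=1000, random_state=42),
    param_grid, cv=3, n_jobs=-1,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
```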
Issues to consider:
- The optimal momentum value depends on the specific problem and dataset
- Very high momentum (close to 1.0) can cause training instability
- Momentum interacts with learning rate, so consider adjusting both together
- Monitor training curves to detect oscillations or divergence with high momentum (a sketch follows this list)
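For that last point, a fitted `MLPClassifier` exposes the per-iteration training loss as `loss_curve_`, which can be plotted to compare momentum settings. A minimal sketch, again reusing `X_train`/`y_train` from above:

```python
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier

# Compare training-loss trajectories at moderate vs. very high momentum.
for m in [0.5, 0.99]:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                        momentum=m, max_iter=200, random_state=42)
    mlp.fit(X_train, y_train)
    plt.plot(mlp.loss_curve_, label=f"momentum={m}")  # loss per iteration
plt.xlabel("iteration")
plt.ylabel("training loss")
plt.legend()
plt.show()
```

A smooth, steadily decreasing curve suggests a workable setting; visible oscillation or rising loss at high momentum is a sign to lower it or reduce the learning rate.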