The momentum parameter in scikit-learn's MLPClassifier controls the contribution of the previous gradient step to the current update. Note that it is only used when solver='sgd'; the default solver, 'adam', ignores it.
Momentum is a technique used in neural network training to accelerate convergence and help overcome local minima. It adds a fraction of the previous weight update to the current one, smoothing the trajectory of the weight updates.
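Schematically, each step keeps a running "velocity" of past updates. The toy loop below illustrates the classic update rule on a one-dimensional quadratic loss (a conceptual sketch, not scikit-learn's internal implementation):

```python
# Momentum update on f(w) = w**2 (illustrative only)
momentum, lr = 0.9, 0.1
w, velocity = 5.0, 0.0
for _ in range(200):
    grad = 2 * w                                 # gradient of w**2
    velocity = momentum * velocity - lr * grad   # blend in the previous update
    w += velocity                                # apply the smoothed step
print(f"w after 200 steps: {w:.6f}")             # converges toward the minimum at 0
```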
The default value for momentum in MLPClassifier is 0.9.
In practice, values between 0.5 and 0.99 are common; higher values can speed up convergence but also risk overshooting the minimum.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different momentum values
momentum_values = [0.0, 0.5, 0.9, 0.99]
accuracies = []
for m in momentum_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, momentum=m, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"momentum={m}, Accuracy: {accuracy:.3f}")
```
Running the example gives output like:

```
momentum=0.0, Accuracy: 0.885
momentum=0.5, Accuracy: 0.885
momentum=0.9, Accuracy: 0.885
momentum=0.99, Accuracy: 0.885
```
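All four runs report exactly the same accuracy because, as noted above, momentum is only used when solver='sgd', and the example runs with the default 'adam' solver, which ignores the parameter entirely. A minimal variant of the loop that actually exercises momentum (same data and settings, just switching the solver; the exact accuracies will vary):

```python
# Same comparison, but with the 'sgd' solver so momentum takes effect
for m in momentum_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd', max_iter=1000,
                        momentum=m, random_state=42)
    mlp.fit(X_train, y_train)
    print(f"momentum={m}, Accuracy: {accuracy_score(y_test, mlp.predict(X_test)):.3f}")
```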
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different momentum values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting momentum:
- Start with the default value of 0.9 and adjust based on model performance (see the grid-search sketch after this list)
- Higher momentum can help escape local minima but may overshoot global minima
- Lower momentum may lead to slower convergence but can be more stable
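A simple way to do this adjustment is a grid search over momentum together with the initial learning rate, since the two interact. A sketch using GridSearchCV and the train split from the example above (the grid values are just illustrative starting points):

```python
from sklearn.model_selection import GridSearchCV

# Tune momentum jointly with the initial learning rate
param_grid = {
    'momentum': [0.0, 0.5, 0.9, 0.99],
    'learning_rate_init': [0.001, 0.01, 0.1],
}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), solver='sgd', max_iter=1000, random_state=42),
    param_grid, cv=3, n_jobs=-1)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print(f"Best CV accuracy: {grid.best_score_:.3f}")
```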
Issues to consider:
- The optimal momentum value depends on the specific problem and dataset
- Very high momentum (close to 1.0) can cause training instability
- Momentum interacts with learning rate, so consider adjusting both together
- Monitor training curves to detect oscillations or divergence with high momentum (see the loss-curve sketch below)
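After fitting with an iterative solver, MLPClassifier exposes the per-iteration training loss as the loss_curve_ attribute, which makes this monitoring straightforward. A minimal sketch, again reusing the train split from the example (the momentum values are chosen for illustration):

```python
# Compare training-loss behaviour for a moderate and an aggressive momentum
for m in [0.9, 0.99]:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd', momentum=m,
                        max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    # loss_curve_ holds the training loss recorded at each iteration
    print(f"momentum={m}: {len(mlp.loss_curve_)} iterations, "
          f"final loss {mlp.loss_curve_[-1]:.4f}")
```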