The `momentum` parameter in scikit-learn’s `MLPRegressor` controls how much of the previous weight update is carried into the current one. It only takes effect when `solver='sgd'`; the default `'adam'` solver ignores it.
`MLPRegressor` is a multi-layer perceptron regressor trained with backpropagation. It’s suitable for modeling non-linear relationships in regression tasks.
Momentum accelerates updates along directions where successive gradients agree and dampens oscillations where they alternate in sign. This can improve convergence speed and help the optimizer move past shallow local optima.
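To make the effect concrete, here is a minimal sketch of the classical momentum update on a toy problem. The function name `momentum_descent` and the quadratic test function are illustrative assumptions, not part of scikit-learn; note also that scikit-learn’s SGD solver enables Nesterov momentum by default (`nesterovs_momentum=True`), which this sketch does not implement.

```python
import numpy as np

def momentum_descent(grad_fn, w0, learning_rate, momentum, n_steps):
    """Gradient descent with classical momentum (illustrative helper)."""
    w = np.asarray(w0, dtype=float).copy()
    velocity = np.zeros_like(w)
    for _ in range(n_steps):
        # Keep a fraction of the previous update, then add the new gradient step
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w = w + velocity
    return w

# Hypothetical test problem: an elongated quadratic bowl,
# f(w) = 0.5 * sum(curvature * w**2), steep in one direction and flat in the other
curvature = np.array([1.0, 50.0])

def loss(w):
    return 0.5 * np.sum(curvature * w**2)

def grad(w):
    return curvature * w

start = [1.0, 1.0]
w_plain = momentum_descent(grad, start, learning_rate=0.02, momentum=0.0, n_steps=200)
w_mom = momentum_descent(grad, start, learning_rate=0.02, momentum=0.9, n_steps=200)

print(f"loss without momentum: {loss(w_plain):.2e}")
print(f"loss with momentum=0.9: {loss(w_mom):.2e}")
```

On this toy problem the run with momentum ends at a lower loss after the same number of steps, because the velocity term keeps building up along the flat direction where plain gradient descent crawls.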
The default value for `momentum` is 0.9. The value should lie between 0 and 1; common choices range from 0.0 (no momentum) to 0.99, with 0.9 being a popular choice.
```python
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different momentum values
momentum_values = [0.0, 0.5, 0.9, 0.99]
mse_scores = []

for m in momentum_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, momentum=m, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"momentum={m}, MSE: {mse:.3f}")

# Find best momentum value
best_momentum = momentum_values[np.argmin(mse_scores)]
print(f"Best momentum value: {best_momentum}")
```
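Running the example gives an output like the one below. The MSE is identical for every setting because the example uses the default `solver='adam'`, and `momentum` is only applied when `solver='sgd'`: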
```
momentum=0.0, MSE: 30.530
momentum=0.5, MSE: 30.530
momentum=0.9, MSE: 30.530
momentum=0.99, MSE: 30.530
Best momentum value: 0.0
```
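For `momentum` to influence training, the model has to use the SGD solver. The following is a sketch of the same comparison with `solver="sgd"`. Standardizing the features and the target is an assumption added here to keep plain SGD numerically stable; the lower `max_iter` and the `results` dictionary are likewise illustrative choices. No output is shown because the numbers depend on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Assumption: scale inputs and target (using training statistics only)
# so that SGD does not diverge on the large raw target values
x_scaler = StandardScaler().fit(X_train)
X_train_s = x_scaler.transform(X_train)
X_test_s = x_scaler.transform(X_test)
y_mean, y_std = y_train.mean(), y_train.std()
y_train_s = (y_train - y_mean) / y_std
y_test_s = (y_test - y_mean) / y_std

results = {}  # momentum -> (test MSE on the scaled target, epochs run)
for m in [0.0, 0.5, 0.9, 0.99]:
    mlp = MLPRegressor(
        hidden_layer_sizes=(100,),
        solver="sgd",  # momentum is only used by this solver
        momentum=m,
        max_iter=500,
        random_state=42,
    )
    mlp.fit(X_train_s, y_train_s)
    mse = mean_squared_error(y_test_s, mlp.predict(X_test_s))
    results[m] = (mse, mlp.n_iter_)
    print(f"momentum={m}, scaled MSE: {mse:.4f}, epochs: {mlp.n_iter_}")
```

Comparing `n_iter_` alongside the MSE shows whether a momentum value changes how quickly training stops, not just the final error. Runs that hit `max_iter` will emit a `ConvergenceWarning`.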
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train `MLPRegressor` models with different `momentum` values
- Evaluate the mean squared error (MSE) of each model on the test set
- Identify the best performing momentum value
Some tips and heuristics for setting `momentum`:
- Start with the default value of 0.9 and adjust based on model performance
- Lower values (0.0-0.5) can be useful for simpler problems or when overfitting occurs
- Higher values (0.9-0.99) often work well for complex problems with many local optima
Issues to consider:
- The optimal momentum value depends on the specific dataset and problem complexity
- Momentum that is too high can cause updates to overshoot good weights and oscillate or diverge
- Momentum that is too low may result in slow convergence or getting stuck in local optima
- Momentum should be tuned in conjunction with learning rate for best results
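One way to tune momentum together with the learning rate is a small grid search. The sketch below assumes a `Pipeline` with a `StandardScaler`, a deliberately tiny grid, and a globally standardized target; these are illustrative choices for keeping the example fast and stable, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
# Simplification: standardize the target once, up front, to keep SGD stable
y = (y - y.mean()) / y.std()

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPRegressor(hidden_layer_sizes=(100,), solver="sgd",
                         max_iter=300, random_state=42)),
])

# Search momentum and initial learning rate jointly (hypothetical grid values)
param_grid = {
    "mlp__momentum": [0.5, 0.9],
    "mlp__learning_rate_init": [0.001, 0.01],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated MSE: {-search.best_score_:.4f}")
```

Searching both parameters in one grid matters because their effects interact: a higher momentum effectively enlarges each step, so the best learning rate at `momentum=0.5` is not necessarily the best at `momentum=0.9`.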