
Configure MLPRegressor "nesterovs_momentum" Parameter

The nesterovs_momentum parameter in scikit-learn’s MLPRegressor controls whether to use Nesterov’s momentum in the optimization process.

Nesterov’s momentum is an optimization technique that helps accelerate gradient descent, particularly on loss surfaces with high curvature. It modifies classical momentum by evaluating the gradient at the “look-ahead” position, where the accumulated momentum is about to move the parameters, instead of at the current parameters.
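To make the difference concrete, here is a minimal NumPy sketch of the two update rules on a toy, ill-conditioned quadratic loss. The function name minimize_quadratic, the matrix A, and the step settings are illustrative assumptions; this is not scikit-learn’s internal implementation, only the textbook form of each rule.

```python
import numpy as np


def minimize_quadratic(nesterov, momentum=0.9, lr=0.01, steps=200):
    """Minimize f(w) = 0.5 * w^T A w with momentum gradient descent (toy example)."""
    A = np.diag([1.0, 50.0])      # ill-conditioned: one direction is much steeper
    w = np.array([1.0, 1.0])      # starting point
    velocity = np.zeros_like(w)
    for _ in range(steps):
        if nesterov:
            # Nesterov: gradient at the look-ahead point w + momentum * velocity
            grad = A @ (w + momentum * velocity)
        else:
            # Classical momentum: gradient at the current point w
            grad = A @ w
        velocity = momentum * velocity - lr * grad
        w = w + velocity
    return 0.5 * w @ A @ w        # final loss value


print(f"classical momentum final loss: {minimize_quadratic(nesterov=False):.2e}")
print(f"Nesterov momentum final loss:  {minimize_quadratic(nesterov=True):.2e}")
```

The only line that changes between the two variants is where the gradient is evaluated; the velocity and parameter updates are the same.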

The nesterovs_momentum parameter is a boolean that selects Nesterov’s momentum (True) or classical momentum (False) during training. It is only used when solver='sgd' and momentum > 0; the 'adam' and 'lbfgs' solvers ignore it.

By default, nesterovs_momentum is set to True in MLPRegressor. Common alternatives include setting it to False to use classical momentum or disabling momentum altogether by setting momentum=0.0.
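A short sketch of the three configurations mentioned above. solver='sgd' is set explicitly because nesterovs_momentum is only read by the SGD solver; the dictionary keys are illustrative names, not scikit-learn terms.

```python
from sklearn.neural_network import MLPRegressor

# nesterovs_momentum only matters with solver="sgd" and momentum > 0
configs = {
    "nesterov": MLPRegressor(solver="sgd", momentum=0.9, nesterovs_momentum=True),
    "classical": MLPRegressor(solver="sgd", momentum=0.9, nesterovs_momentum=False),
    "no_momentum": MLPRegressor(solver="sgd", momentum=0.0),
}

for name, model in configs.items():
    params = model.get_params()
    print(f"{name:12s} solver={params['solver']} momentum={params['momentum']} "
          f"nesterovs_momentum={params['nesterovs_momentum']}")

# For reference, the constructor defaults (solver="adam", momentum=0.9, nesterovs_momentum=True)
defaults = MLPRegressor().get_params()
print({k: defaults[k] for k in ("solver", "momentum", "nesterovs_momentum")})
```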

from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import time

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with the defaults: nesterovs_momentum=True, but the default solver='adam' ignores it
start_time = time.time()
mlp_nesterov = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=42)
mlp_nesterov.fit(X_train, y_train)
nesterov_time = time.time() - start_time
y_pred_nesterov = mlp_nesterov.predict(X_test)
mse_nesterov = mean_squared_error(y_test, y_pred_nesterov)

# Train with nesterovs_momentum=False (also ignored by the default 'adam' solver)
start_time = time.time()
mlp_classic = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=1000,
                           nesterovs_momentum=False, random_state=42)
mlp_classic.fit(X_train, y_train)
classic_time = time.time() - start_time
y_pred_classic = mlp_classic.predict(X_test)
mse_classic = mean_squared_error(y_test, y_pred_classic)

print(f"Nesterov's Momentum - Time: {nesterov_time:.2f}s, MSE: {mse_nesterov:.4f}")
print(f"Classic Momentum - Time: {classic_time:.2f}s, MSE: {mse_classic:.4f}")

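Running the example gives an output like the following. Both models report the same MSE because MLPRegressor uses solver='adam' by default and nesterovs_momentum is only used when solver='sgd' (with momentum > 0), so the two models are trained identically; the small difference in time is run-to-run noise.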

Nesterov's Momentum - Time: 3.44s, MSE: 35.5553
Classic Momentum - Time: 3.47s, MSE: 35.5553
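One way to check that nesterovs_momentum has no effect under the default solver is to train two otherwise identical models and compare their learned parameters. This sketch uses a smaller dataset and a short max_iter (assumptions chosen only for speed); with the same random_state and solver='adam', the weights are expected to match exactly.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Short runs are enough for this check, so silence the expected convergence warnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Identical settings except nesterovs_momentum; solver is left at its default ("adam")
common = dict(hidden_layer_sizes=(10,), max_iter=50, random_state=0)
mlp_a = MLPRegressor(nesterovs_momentum=True, **common).fit(X, y)
mlp_b = MLPRegressor(nesterovs_momentum=False, **common).fit(X, y)

# Compare every weight matrix and bias vector element by element
same = all(
    np.array_equal(p, q)
    for p, q in zip(mlp_a.coefs_ + mlp_a.intercepts_, mlp_b.coefs_ + mlp_b.intercepts_)
)
print("solver:", mlp_a.solver)
print("identical parameters:", same)
```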

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train two MLPRegressor models, one with nesterovs_momentum=True (the default) and one with nesterovs_momentum=False
  4. Compare training time and mean squared error for both models
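Because nesterovs_momentum is only read when solver='sgd' and momentum > 0, a comparison that actually exercises the parameter needs to set the solver explicitly. The sketch below does that, reusing the original dataset settings. Standardizing the features and the target (via StandardScaler and TransformedTargetRegressor) is an added assumption, not part of the original example, included because plain SGD on unscaled make_regression targets can be unstable. No particular result is claimed; the numbers will vary by version and platform.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for nesterov in (True, False):
    mlp = MLPRegressor(
        hidden_layer_sizes=(100, 50),
        solver="sgd",                  # the SGD solver is the only one that uses this flag
        momentum=0.9,                  # must be > 0 for the flag to matter
        nesterovs_momentum=nesterov,
        max_iter=1000,
        random_state=42,
    )
    # Scale features (pipeline) and target (TransformedTargetRegressor) for stable SGD
    model = TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), mlp),
        transformer=StandardScaler(),
    )
    model.fit(X_train, y_train)
    fitted_mlp = model.regressor_[-1]  # last pipeline step: the fitted MLPRegressor
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[nesterov] = (fitted_mlp.n_iter_, mse)
    print(f"nesterovs_momentum={nesterov}: epochs={fitted_mlp.n_iter_}, MSE={mse:.4f}")
```

Comparing n_iter_ (epochs until the stopping criterion was met) alongside test MSE gives a rough view of whether Nesterov’s momentum speeds up convergence on this data.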

Some tips for using Nesterov’s momentum:

Issues to consider:


