The beta_2 parameter in scikit-learn's MLPRegressor controls the exponential decay rate for the second moment estimate in the Adam optimizer.
Adam (Adaptive Moment Estimation) is an optimization algorithm used for updating network weights. The beta_2 parameter specifically affects how the optimizer estimates the second moment (uncentered variance) of the gradients.
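As a rough sketch of the idea (not scikit-learn's internal code, and the helper name here is made up for illustration), the second moment estimate is an exponential moving average of the squared gradients, with beta_2 as the decay factor:
import numpy as np

def update_second_moment(v_prev, grad, beta_2=0.999, t=1):
    # Decay the previous estimate and mix in the current squared gradient
    v = beta_2 * v_prev + (1.0 - beta_2) * grad ** 2
    # Bias correction counteracts the zero initialization of v
    v_hat = v / (1.0 - beta_2 ** t)
    return v, v_hat

v, v_hat = update_second_moment(v_prev=np.zeros(2), grad=np.array([0.5, -2.0]))
print(v, v_hat)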
A higher beta_2 value results in a slower decay of the second moment estimate, which can help smooth out the learning process in the presence of noisy gradients. Conversely, a lower value allows for quicker adaptation to changes in the gradient.
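A rough rule of thumb (illustrative, not part of the scikit-learn API) is that the moving average effectively covers about 1 / (1 - beta_2) recent gradients, which makes the difference between values concrete:
for beta_2 in [0.9, 0.99, 0.999]:
    # Approximate number of past gradients that influence the estimate
    print(f"beta_2={beta_2}: averages over roughly {1 / (1 - beta_2):.0f} past gradients")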
The default value for beta_2 is 0.999. In practice, values between 0.9 and 0.999 are commonly used, with 0.999 being a popular choice for many problems.
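The default can be confirmed directly on a fresh estimator; note that beta_2 only takes effect when solver='adam', which is also the default solver:
from sklearn.neural_network import MLPRegressor

model = MLPRegressor()
print(model.solver)  # 'adam'
print(model.beta_2)  # 0.999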
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different beta_2 values
beta_2_values = [0.9, 0.99, 0.999, 0.9999]
mse_scores = []
for beta_2 in beta_2_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=500, random_state=42, beta_2=beta_2)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"beta_2={beta_2}, MSE: {mse:.4f}")
Running the example gives an output like:
beta_2=0.9, MSE: 3.3350
beta_2=0.99, MSE: 21.6741
beta_2=0.999, MSE: 139.3109
beta_2=0.9999, MSE: 152.1716
The key steps in this example are:
- Generate a synthetic regression dataset with multiple features
- Split the data into train and test sets
- Train MLPRegressor models with different beta_2 values
- Evaluate the mean squared error of each model on the test set (an optional plotting sketch follows this list)
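As an optional follow-up, a quick plot of the collected scores can make the trend easier to see (this assumes matplotlib is installed and reuses the beta_2_values and mse_scores lists from the example above):
import matplotlib.pyplot as plt

plt.plot([str(b) for b in beta_2_values], mse_scores, marker="o")
plt.xlabel("beta_2")
plt.ylabel("Test MSE")
plt.title("Effect of beta_2 on MLPRegressor test error")
plt.show()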
Some tips and heuristics for setting beta_2:
- Start with the default value of 0.999 and adjust if needed (see the tuning sketch after this list)
- Use higher values (closer to 1) for problems with sparse gradients
- Lower values may work better for problems with rapidly changing gradients
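One way to follow the first tip is to treat beta_2 like any other hyperparameter and search over a few candidates. This sketch reuses X_train and y_train from the example above; the candidate grid is illustrative rather than prescriptive:
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

param_grid = {"beta_2": [0.9, 0.99, 0.999]}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)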
Issues to consider:
- The optimal beta_2 value can depend on the specific problem and dataset
- Very high values (>0.999) might slow down convergence
- Very low values (<0.9) may lead to unstable training
- Consider the interplay between beta_2 and other optimizer parameters like the learning rate (a small sketch follows below)
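To probe that interplay, one option is a small grid over learning_rate_init and beta_2 together. The value pairs below are illustrative assumptions, and the data comes from the train/test split in the example above:
from itertools import product

from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

for lr, beta_2 in product([0.001, 0.01], [0.9, 0.999]):
    model = MLPRegressor(hidden_layer_sizes=(100,), max_iter=500,
                         random_state=42, learning_rate_init=lr, beta_2=beta_2)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"learning_rate_init={lr}, beta_2={beta_2}, MSE: {mse:.4f}")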