The beta_1 parameter in scikit-learn’s MLPRegressor controls the exponential decay rate for the first moment estimates in the Adam optimizer.
MLPRegressor uses the Adam (Adaptive Moment Estimation) optimizer by default, an algorithm for first-order gradient-based optimization of stochastic objective functions. The beta_1 parameter influences how quickly the optimizer adapts to changes in the gradient.
beta_1 is the exponential decay rate for the first moment estimates, i.e. the running mean of past gradients. A higher value gives more weight to past gradients, while a lower value makes the optimizer more responsive to recent gradients.
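To make this concrete, here is a minimal standalone sketch of the first-moment update rule Adam applies (this illustrates the math only; it is not scikit-learn’s actual implementation):
import numpy as np

rng = np.random.default_rng(0)
gradients = rng.normal(loc=1.0, scale=2.0, size=50)  # a noisy stream of gradients

for beta_1 in [0.5, 0.9, 0.99]:
    m = 0.0  # first moment estimate
    for t, g in enumerate(gradients, start=1):
        m = beta_1 * m + (1 - beta_1) * g  # exponential moving average of gradients
        m_hat = m / (1 - beta_1 ** t)      # bias correction for the zero initialization
    print(f"beta_1={beta_1}: final smoothed gradient {m_hat:.3f}")
A larger beta_1 averages over more of the gradient history, so the smoothed estimate is less sensitive to the noise in any single gradient.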
The default value for beta_1 is 0.9. In practice, values between 0.8 and 0.99 are commonly tried, with 0.9 being a good starting point for most problems.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different beta_1 values
beta_1_values = [0.8, 0.9, 0.95, 0.99]
mse_scores = []
for beta_1 in beta_1_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42, beta_1=beta_1)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"beta_1={beta_1}, MSE: {mse:.3f}")
# Find best beta_1
best_beta_1 = beta_1_values[np.argmin(mse_scores)]
print(f"Best beta_1: {best_beta_1}")
Running the example gives an output like:
beta_1=0.8, MSE: 31.643
beta_1=0.9, MSE: 30.530
beta_1=0.95, MSE: 28.340
beta_1=0.99, MSE: 21.502
Best beta_1: 0.99
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different beta_1 values
- Evaluate the mean squared error of each model on the test set
- Identify the best performing beta_1 value
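If you want to fold this comparison into a standard hyperparameter search, the same sweep can be expressed with GridSearchCV. A minimal sketch, reusing the train/test split from the example above:
from sklearn.model_selection import GridSearchCV

param_grid = {"beta_1": [0.8, 0.9, 0.95, 0.99]}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
print(f"Best beta_1: {search.best_params_['beta_1']}")
Cross-validation gives a more robust estimate of the best beta_1 than a single train/test split, at the cost of training several models per candidate value.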
Some tips and heuristics for setting beta_1:
- Start with the default value of 0.9 and adjust based on model performance
- Higher values (closer to 1) can help with noisy gradients but may slow down convergence (the loss-curve sketch below is one way to check this)
- Lower values can make the optimizer more responsive but may lead to overshooting optimal values
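One way to see the convergence effect directly is to compare training loss across beta_1 values; after fitting with the adam (or sgd) solver, MLPRegressor exposes a loss_curve_ attribute and an n_iter_ attribute. A short sketch, reusing the data from the example above:
for beta_1 in [0.8, 0.9, 0.99]:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42, beta_1=beta_1)
    mlp.fit(X_train, y_train)
    # number of iterations actually run and the final training loss reached
    print(f"beta_1={beta_1}: {mlp.n_iter_} iterations, final loss {mlp.loss_curve_[-1]:.4f}")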
Issues to consider:
- The optimal beta_1 value can depend on the specific problem and dataset
- Very high values (> 0.99) may cause the optimizer to adapt too slowly
- Very low values (< 0.8) may cause instability in training