The beta_1 parameter in scikit-learn’s MLPRegressor controls the exponential decay rate for the first moment estimates in the Adam optimizer.
MLPRegressor uses the Adam (Adaptive Moment Estimation) optimizer by default, an algorithm for first-order gradient-based optimization of stochastic objective functions. The beta_1 parameter influences how quickly the optimizer adapts to changes in the gradient.
beta_1 is the exponential decay rate for the first moment estimates, i.e. the running mean of past gradients. A higher value gives more weight to past gradients, while a lower value makes the optimizer more responsive to recent gradients.
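To make this concrete, here is a minimal standalone sketch of the first-moment update rule Adam applies (this illustrates the math only; it is not scikit-learn’s actual implementation):
import numpy as np

rng = np.random.default_rng(0)
gradients = rng.normal(loc=1.0, scale=2.0, size=50)  # a noisy stream of gradients

for beta_1 in [0.5, 0.9, 0.99]:
    m = 0.0  # first moment estimate
    for t, g in enumerate(gradients, start=1):
        m = beta_1 * m + (1 - beta_1) * g  # exponential moving average of gradients
        m_hat = m / (1 - beta_1 ** t)      # bias correction for the zero initialization
    print(f"beta_1={beta_1}: final smoothed gradient {m_hat:.3f}")
A larger beta_1 averages over more of the gradient history, so the smoothed estimate is less sensitive to the noise in any single gradient.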
The default value for beta_1 is 0.9. In practice, values between 0.8 and 0.99 are commonly tried, with 0.9 being a good starting point for most problems.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different beta_1 values
beta_1_values = [0.8, 0.9, 0.95, 0.99]
mse_scores = []
for beta_1 in beta_1_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42, beta_1=beta_1)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"beta_1={beta_1}, MSE: {mse:.3f}")
# Find best beta_1
best_beta_1 = beta_1_values[np.argmin(mse_scores)]
print(f"Best beta_1: {best_beta_1}")
Running the example gives an output like:
beta_1=0.8, MSE: 31.643
beta_1=0.9, MSE: 30.530
beta_1=0.95, MSE: 28.340
beta_1=0.99, MSE: 21.502
Best beta_1: 0.99
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different beta_1 values
- Evaluate the mean squared error of each model on the test set
- Identify the best performing beta_1 value
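If you want to fold this comparison into a standard hyperparameter search, the same sweep can be expressed with GridSearchCV. A minimal sketch, reusing the train/test split from the example above:
from sklearn.model_selection import GridSearchCV

param_grid = {"beta_1": [0.8, 0.9, 0.95, 0.99]}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
print(f"Best beta_1: {search.best_params_['beta_1']}")
Cross-validation gives a more robust estimate of the best beta_1 than a single train/test split, at the cost of training several models per candidate value.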
Some tips and heuristics for setting beta_1:
- Start with the default value of 0.9 and adjust based on model performance
- Higher values (closer to 1) can help with noisy gradients but may slow down convergence (the loss-curve sketch below is one way to check this)
- Lower values can make the optimizer more responsive but may lead to overshooting optimal values
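One way to see the convergence effect directly is to compare training loss across beta_1 values; after fitting with the adam (or sgd) solver, MLPRegressor exposes a loss_curve_ attribute and an n_iter_ attribute. A short sketch, reusing the data from the example above:
for beta_1 in [0.8, 0.9, 0.99]:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42, beta_1=beta_1)
    mlp.fit(X_train, y_train)
    # number of iterations actually run and the final training loss reached
    print(f"beta_1={beta_1}: {mlp.n_iter_} iterations, final loss {mlp.loss_curve_[-1]:.4f}")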
Issues to consider:
- The optimal beta_1 value can depend on the specific problem and dataset
- Very high values (> 0.99) may cause the optimizer to adapt too slowly
- Very low values (< 0.8) may cause instability in training