The beta_2 parameter in scikit-learn’s MLPClassifier controls the exponential decay rate for the second moment estimates in the Adam optimizer.

Adam (Adaptive Moment Estimation) is an optimization algorithm that computes adaptive learning rates for each parameter. The beta_2 parameter specifically influences how quickly the algorithm forgets past squared gradients.
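To make this concrete, here is a minimal sketch of the second moment update, following the standard Adam formulation (this is illustrative, not scikit-learn's internal code):

def second_moment_estimates(grads, beta_2=0.999):
    # Illustrative sketch of Adam's second moment update, not
    # scikit-learn's internal implementation.
    v = 0.0
    estimates = []
    for t, g in enumerate(grads, start=1):
        v = beta_2 * v + (1.0 - beta_2) * g ** 2  # EMA of squared gradients
        v_hat = v / (1.0 - beta_2 ** t)           # bias correction
        estimates.append(v_hat)
    return estimates

Each new squared gradient enters with weight 1 - beta_2, so past gradients fade geometrically at rate beta_2.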
A higher beta_2 value results in slower decay of the second moment estimates, which can help with handling noisy gradients but may slow down convergence. Conversely, a lower value allows the optimizer to adapt more quickly to changes in the gradient.
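A toy calculation makes the difference in decay speed visible: track how long a single large squared gradient lingers in the estimate (the 1% threshold below is arbitrary):

for beta_2 in (0.9, 0.999):
    v = 1.0  # pretend a gradient spike just pushed the estimate to 1.0
    steps = 0
    while v > 0.01:  # let the estimate decay with no further gradients
        v *= beta_2
        steps += 1
    print(f"beta_2={beta_2}: {steps} steps to fall below 1% of the spike")

With beta_2=0.9 the spike fades in a few dozen steps; with 0.999 it lingers for several thousand, which is why higher values smooth out noise but also respond more slowly.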
The default value for beta_2 is 0.999. In practice, values between 0.9 and 0.999 are commonly used, with 0.999 being a good starting point for most problems.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different beta_2 values
beta_2_values = [0.9, 0.99, 0.999, 0.9999]
accuracies = []
for beta_2 in beta_2_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42,
                        solver='adam', beta_2=beta_2)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"beta_2={beta_2}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
beta_2=0.9, Accuracy: 0.885
beta_2=0.99, Accuracy: 0.900
beta_2=0.999, Accuracy: 0.885
beta_2=0.9999, Accuracy: 0.885
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different beta_2 values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting beta_2 (a small tuning sketch follows the list):
- Start with the default value of 0.999 and adjust if necessary
- Lower values (e.g., 0.9) can be beneficial for problems with rapidly changing gradients
- Higher values (e.g., 0.9999) may help with very noisy gradients
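One way to act on these heuristics, reusing the training data from the example above, is to let a grid search pick beta_2 (the candidate values here are just illustrative):

from sklearn.model_selection import GridSearchCV

param_grid = {'beta_2': [0.9, 0.99, 0.999, 0.9999]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                  solver='adam', random_state=42),
    param_grid, cv=3, scoring='accuracy')
grid.fit(X_train, y_train)
print(f"Best beta_2: {grid.best_params_['beta_2']}, "
      f"CV accuracy: {grid.best_score_:.3f}")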
Issues to consider:
- The optimal beta_2 value depends on the specific problem and dataset
- Very high values close to 1 may cause the optimizer to adapt too slowly
- Very low values may cause the optimizer to be too sensitive to recent gradients
- Consider the interplay between beta_2 and other optimizer parameters like the learning rate (see the sketch below)
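To explore that interplay, a simple sketch (again reusing the train/test split from the example above; the learning rates chosen are arbitrary) sweeps learning_rate_init together with beta_2:

for lr in (0.0001, 0.001, 0.01):
    for beta_2 in (0.9, 0.999):
        mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                            solver='adam', learning_rate_init=lr,
                            beta_2=beta_2, random_state=42)
        mlp.fit(X_train, y_train)
        print(f"lr={lr}, beta_2={beta_2}, "
              f"test accuracy: {mlp.score(X_test, y_test):.3f}")

A beta_2 that looks poor at one learning rate can look fine at another, so it is worth judging the two jointly rather than tuning beta_2 in isolation.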