
Configure MLPClassifier "beta_2" Parameter

The beta_2 parameter in scikit-learn’s MLPClassifier controls the exponential decay rate for the second moment estimates in the Adam optimizer.

Adam (Adaptive Moment Estimation) is an optimization algorithm that computes adaptive learning rates for each parameter. The beta_2 parameter specifically influences how quickly the algorithm forgets past squared gradients.

A higher beta_2 value results in slower decay of the second moment estimates, which can help with handling noisy gradients but may slow down convergence. Conversely, a lower value allows the optimizer to adapt more quickly to changes in the gradient.
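
To make the decay behaviour concrete, here is a minimal sketch (plain Python, not scikit-learn's internal optimizer code) of the second moment update Adam performs, v = beta_2 * v + (1 - beta_2) * g**2. The helper name second_moment_trace is just for illustration; the sketch shows how much of a single gradient spike is still present in the estimate a few steps later for two beta_2 values.

# Illustrative sketch of Adam's second moment update:
# v_t = beta_2 * v_{t-1} + (1 - beta_2) * g_t**2
def second_moment_trace(gradients, beta_2):
    v = 0.0
    trace = []
    for g in gradients:
        v = beta_2 * v + (1 - beta_2) * g ** 2  # higher beta_2 -> slower forgetting
        trace.append(v)
    return trace

# Five small gradients, one large spike, then five more small gradients
gradients = [0.1] * 5 + [5.0] + [0.1] * 5

for beta_2 in (0.9, 0.999):
    trace = second_moment_trace(gradients, beta_2)
    remaining = trace[-1] / max(trace)  # fraction of the spike's peak effect left
    print(f"beta_2={beta_2}: {remaining:.3f} of the spike's effect remains after 5 steps")

With beta_2=0.9 only roughly 0.59 of the spike's effect remains five steps later, while with beta_2=0.999 roughly 0.997 remains, which is exactly the slower decay described above.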

The default value for beta_2 is 0.999. In practice, values between 0.9 and 0.999 are commonly used, with 0.999 being a good starting point for most problems.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different beta_2 values
beta_2_values = [0.9, 0.99, 0.999, 0.9999]
accuracies = []

for beta_2 in beta_2_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42,
                        solver='adam', beta_2=beta_2)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"beta_2={beta_2}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

beta_2=0.9, Accuracy: 0.885
beta_2=0.99, Accuracy: 0.900
beta_2=0.999, Accuracy: 0.885
beta_2=0.9999, Accuracy: 0.885

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train MLPClassifier models with different beta_2 values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting beta_2:

  * Start with the default of 0.999; it works well for most problems.
  * Keep beta_2 high (close to 0.999) when gradients are noisy, so the second moment estimates are averaged over more past steps.
  * Try a lower value such as 0.99 or 0.9 if the optimizer seems slow to adapt to changes in the gradient.
  * Treat beta_2 as a secondary hyperparameter: tune learning_rate_init and the architecture first, then refine beta_2, for example with the grid search sketch below.

Issues to consider:

  * beta_2 is only used when solver='adam'; it has no effect with the 'sgd' or 'lbfgs' solvers.
  * beta_2 interacts with beta_1, learning_rate_init and epsilon, so it is usually tuned together with them rather than in isolation.
  * Values very close to 1 (such as 0.9999) give the optimizer a long memory of past squared gradients, which can slow its reaction to genuine shifts in the gradient.
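
If you prefer to tune beta_2 with cross-validation rather than a manual loop, a grid search is one option. The sketch below is illustrative and assumes the X_train and y_train arrays created in the example above are still in scope.

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Search over candidate beta_2 values with 3-fold cross-validation
# (reuses X_train and y_train from the example above)
param_grid = {'beta_2': [0.9, 0.99, 0.999, 0.9999]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, solver='adam', random_state=42),
    param_grid, cv=3, scoring='accuracy'
)
grid.fit(X_train, y_train)
print(f"Best beta_2: {grid.best_params_['beta_2']}, mean CV accuracy: {grid.best_score_:.3f}")

Because differences between beta_2 values are often small, cross-validated scores give a more reliable comparison than a single train/test split.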


