The beta_2 parameter in scikit-learn’s MLPClassifier controls the exponential decay rate for the second moment estimates in the Adam optimizer.

Adam (Adaptive Moment Estimation) is an optimization algorithm that computes adaptive learning rates for each parameter. The beta_2 parameter specifically influences how quickly the algorithm forgets past squared gradients.
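To make this concrete, here is a minimal sketch of the second moment update, following the standard Adam formulation (this is illustrative, not scikit-learn's internal code):

def second_moment_estimates(grads, beta_2=0.999):
    # Illustrative sketch of Adam's second moment update, not
    # scikit-learn's internal implementation.
    v = 0.0
    estimates = []
    for t, g in enumerate(grads, start=1):
        v = beta_2 * v + (1.0 - beta_2) * g ** 2  # EMA of squared gradients
        v_hat = v / (1.0 - beta_2 ** t)           # bias correction
        estimates.append(v_hat)
    return estimates

Each new squared gradient enters with weight 1 - beta_2, so past gradients fade geometrically at rate beta_2.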
A higher beta_2 value results in slower decay of the second moment estimates, which can help with handling noisy gradients but may slow down convergence. Conversely, a lower value allows the optimizer to adapt more quickly to changes in the gradient.
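A toy calculation makes the difference in decay speed visible: track how long a single large squared gradient lingers in the estimate (the 1% threshold below is arbitrary):

for beta_2 in (0.9, 0.999):
    v = 1.0  # pretend a gradient spike just pushed the estimate to 1.0
    steps = 0
    while v > 0.01:  # let the estimate decay with no further gradients
        v *= beta_2
        steps += 1
    print(f"beta_2={beta_2}: {steps} steps to fall below 1% of the spike")

With beta_2=0.9 the spike fades in a few dozen steps; with 0.999 it lingers for several thousand, which is why higher values smooth out noise but also respond more slowly.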
The default value for beta_2 is 0.999. In practice, values between 0.9 and 0.999 are commonly used, with 0.999 being a good starting point for most problems.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different beta_2 values
beta_2_values = [0.9, 0.99, 0.999, 0.9999]
accuracies = []
for beta_2 in beta_2_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42,
                        solver='adam', beta_2=beta_2)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"beta_2={beta_2}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
beta_2=0.9, Accuracy: 0.885
beta_2=0.99, Accuracy: 0.900
beta_2=0.999, Accuracy: 0.885
beta_2=0.9999, Accuracy: 0.885
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different beta_2 values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting beta_2 (a small tuning sketch follows the list):
- Start with the default value of 0.999 and adjust if necessary
- Lower values (e.g., 0.9) can be beneficial for problems with rapidly changing gradients
- Higher values (e.g., 0.9999) may help with very noisy gradients
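One way to act on these heuristics, reusing the training data from the example above, is to let a grid search pick beta_2 (the candidate values here are just illustrative):

from sklearn.model_selection import GridSearchCV

param_grid = {'beta_2': [0.9, 0.99, 0.999, 0.9999]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                  solver='adam', random_state=42),
    param_grid, cv=3, scoring='accuracy')
grid.fit(X_train, y_train)
print(f"Best beta_2: {grid.best_params_['beta_2']}, "
      f"CV accuracy: {grid.best_score_:.3f}")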
Issues to consider:
- The optimal beta_2 value depends on the specific problem and dataset
- Very high values close to 1 may cause the optimizer to adapt too slowly
- Very low values may cause the optimizer to be too sensitive to recent gradients
- Consider the interplay between beta_2 and other optimizer parameters like the learning rate (see the sketch below)
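To explore that interplay, a simple sketch (again reusing the train/test split from the example above; the learning rates chosen are arbitrary) sweeps learning_rate_init together with beta_2:

for lr in (0.0001, 0.001, 0.01):
    for beta_2 in (0.9, 0.999):
        mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                            solver='adam', learning_rate_init=lr,
                            beta_2=beta_2, random_state=42)
        mlp.fit(X_train, y_train)
        print(f"lr={lr}, beta_2={beta_2}, "
              f"test accuracy: {mlp.score(X_test, y_test):.3f}")

A beta_2 that looks poor at one learning rate can look fine at another, so it is worth judging the two jointly rather than tuning beta_2 in isolation.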