The epsilon parameter in scikit-learn’s MLPClassifier controls the small value added to the denominator of the Adam update for numerical stability; it is only used when solver='adam' (the default).
MLPClassifier is a multi-layer perceptron neural network for classification tasks. It uses backpropagation for training and can handle complex non-linear relationships in data.
The epsilon parameter is used in the Adam optimizer to prevent division by zero. It affects the learning process by influencing how weight updates are calculated during training.
The default value for epsilon is 1e-8. In practice, values between 1e-8 and 1e-5 are commonly used, depending on the specific problem and dataset characteristics.
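To see exactly where epsilon enters, here is a minimal NumPy sketch of a single Adam update step. This is an illustration of the standard Adam formula, not scikit-learn's internal implementation; the function name and defaults are chosen for the example.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on weights w given gradient grad at step t (1-indexed)."""
    # Update biased first- and second-moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-correct the moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # epsilon keeps the denominator away from zero when v_hat is tiny
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# With a zero gradient, sqrt(v_hat) is 0; epsilon alone prevents division by zero
w, m, v = adam_step(np.array([1.0]), np.array([0.0]),
                    np.array([0.0]), np.array([0.0]), t=1)
print(w)  # weights unchanged, and no NaN/Inf produced
```

Without epsilon, the update for a parameter whose squared-gradient history is near zero would divide by (almost) zero, which is why the parameter matters most when gradients are very small.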
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different epsilon values
epsilon_values = [1e-8, 1e-7, 1e-6, 1e-5]
accuracies = []

for eps in epsilon_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500,
                        random_state=42, epsilon=eps)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"epsilon={eps:.0e}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:
```
epsilon=1e-08, Accuracy: 0.895
epsilon=1e-07, Accuracy: 0.890
epsilon=1e-06, Accuracy: 0.875
epsilon=1e-05, Accuracy: 0.885
```
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `epsilon` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting epsilon:
- Start with the default value of 1e-8 and adjust if training is unstable
- Increase `epsilon` if you encounter NaN or Inf values during training
- Smaller `epsilon` values may lead to more precise updates but can cause instability
- Larger `epsilon` values can improve stability but might slow down convergence
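One way to act on the "increase epsilon if training is unstable" tip is a simple retry loop: try the default first and move to a larger value only if the fitted loss is not finite. This retry pattern is a hypothetical sketch, not a built-in scikit-learn feature; it relies on the fitted model's `loss_` attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Try progressively larger epsilon values until training yields a finite loss
for eps in [1e-8, 1e-6, 1e-4]:
    mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300,
                        random_state=42, epsilon=eps)
    mlp.fit(X, y)
    if np.isfinite(mlp.loss_):
        print(f"epsilon={eps:.0e} trained with final loss {mlp.loss_:.4f}")
        break
```

In most well-scaled problems the default 1e-8 succeeds on the first pass, so the loop exits immediately.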
Issues to consider:
- The optimal `epsilon` value can vary depending on the scale of your features and the complexity of the problem
- Very small `epsilon` values might lead to numerical instability, especially with limited floating-point precision
- Large `epsilon` values might prevent the optimizer from making small but important weight updates
- The effect of `epsilon` may be more pronounced in problems with small gradients or when using adaptive learning rate methods
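Because the optimal `epsilon` depends on the scale of your features, standardizing the inputs first is usually a safer fix than tuning `epsilon` itself. A sketch using `StandardScaler` in a pipeline (the layer sizes and sample counts here are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardized features keep gradients well-scaled, so the default
# epsilon=1e-8 is rarely a problem
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                  random_state=42, epsilon=1e-8),
)
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.3f}")
```

Wrapping the scaler in a pipeline also ensures the scaling statistics are learned from the training split only, avoiding leakage into the test evaluation.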