The epsilon parameter in scikit-learn’s MLPClassifier controls the small value added to the denominator of the Adam update for numerical stability; it is only used when solver='adam' (the default).
MLPClassifier is a multi-layer perceptron neural network for classification tasks. It uses backpropagation for training and can handle complex non-linear relationships in data.
The epsilon parameter is used in the Adam optimizer to prevent division by zero. It affects the learning process by influencing how weight updates are calculated during training.
The default value for epsilon is 1e-8. In practice, values between 1e-8 and 1e-5 are commonly used, depending on the specific problem and dataset characteristics.
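To see exactly where epsilon enters, here is a minimal NumPy sketch of a single Adam update step. This is an illustration of the standard Adam formula, not scikit-learn's internal implementation; the function name and defaults are chosen for the example.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on weights w given gradient grad at step t (1-indexed)."""
    # Update biased first- and second-moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-correct the moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # epsilon keeps the denominator away from zero when v_hat is tiny
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# With a zero gradient, sqrt(v_hat) is 0; epsilon alone prevents division by zero
w, m, v = adam_step(np.array([1.0]), np.array([0.0]),
                    np.array([0.0]), np.array([0.0]), t=1)
print(w)  # weights unchanged, and no NaN/Inf produced
```

Without epsilon, the update for a parameter whose squared-gradient history is near zero would divide by (almost) zero, which is why the parameter matters most when gradients are very small.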
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different epsilon values
epsilon_values = [1e-8, 1e-7, 1e-6, 1e-5]
accuracies = []

for eps in epsilon_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500,
                        random_state=42, epsilon=eps)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"epsilon={eps:.0e}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:
```
epsilon=1e-08, Accuracy: 0.895
epsilon=1e-07, Accuracy: 0.890
epsilon=1e-06, Accuracy: 0.875
epsilon=1e-05, Accuracy: 0.885
```
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `epsilon` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting epsilon:
- Start with the default value of 1e-8 and adjust if training is unstable
- Increase `epsilon` if you encounter NaN or Inf values during training
- Smaller `epsilon` values may lead to more precise updates but can cause instability
- Larger `epsilon` values can improve stability but might slow down convergence
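One way to act on the "increase epsilon if training is unstable" tip is a simple retry loop: try the default first and move to a larger value only if the fitted loss is not finite. This retry pattern is a hypothetical sketch, not a built-in scikit-learn feature; it relies on the fitted model's `loss_` attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Try progressively larger epsilon values until training yields a finite loss
for eps in [1e-8, 1e-6, 1e-4]:
    mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300,
                        random_state=42, epsilon=eps)
    mlp.fit(X, y)
    if np.isfinite(mlp.loss_):
        print(f"epsilon={eps:.0e} trained with final loss {mlp.loss_:.4f}")
        break
```

In most well-scaled problems the default 1e-8 succeeds on the first pass, so the loop exits immediately.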
Issues to consider:
- The optimal `epsilon` value can vary depending on the scale of your features and the complexity of the problem
- Very small `epsilon` values might lead to numerical instability, especially with limited floating-point precision
- Large `epsilon` values might prevent the optimizer from making small but important weight updates
- The effect of `epsilon` may be more pronounced in problems with small gradients or when using adaptive learning rate methods
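Because the optimal `epsilon` depends on the scale of your features, standardizing the inputs first is usually a safer fix than tuning `epsilon` itself. A sketch using `StandardScaler` in a pipeline (the layer sizes and sample counts here are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardized features keep gradients well-scaled, so the default
# epsilon=1e-8 is rarely a problem
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                  random_state=42, epsilon=1e-8),
)
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.3f}")
```

Wrapping the scaler in a pipeline also ensures the scaling statistics are learned from the training split only, avoiding leakage into the test evaluation.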