The `epsilon` parameter in scikit-learn’s `MLPClassifier` controls the small value added to the denominator of the Adam weight update for numerical stability.
`MLPClassifier` is a multi-layer perceptron neural network for classification tasks. It uses backpropagation for training and can handle complex non-linear relationships in data.
The `epsilon` parameter is used in the Adam optimizer to prevent division by zero; in scikit-learn it only takes effect when `solver='adam'`, which is the default. It affects the learning process by influencing how each weight update is scaled during training, as the sketch below shows.
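To see exactly where `epsilon` enters, here is a minimal NumPy sketch of one Adam update step for a single parameter vector (the gradient values and variable names are illustrative, not scikit-learn internals):

import numpy as np

grad = np.array([0.1, -0.2, 0.05])   # illustrative gradient
m = np.zeros(3)                      # first-moment (mean) estimate
v = np.zeros(3)                      # second-moment (variance) estimate
beta_1, beta_2, lr, eps, t = 0.9, 0.999, 0.001, 1e-8, 1
m = beta_1 * m + (1 - beta_1) * grad         # update biased first moment
v = beta_2 * v + (1 - beta_2) * grad ** 2    # update biased second moment
m_hat = m / (1 - beta_1 ** t)                # bias-corrected first moment
v_hat = v / (1 - beta_2 ** t)                # bias-corrected second moment
# epsilon keeps the denominator away from zero when v_hat is tiny
step = lr * m_hat / (np.sqrt(v_hat) + eps)

A larger `eps` inflates the denominator and shrinks every step, which is why `epsilon` trades stability against the precision of updates.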
The default value for `epsilon` is 1e-8. In practice, values between 1e-8 and 1e-5 are commonly used, depending on the specific problem and dataset characteristics.
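You can confirm the default directly on an unfitted estimator:

from sklearn.neural_network import MLPClassifier

print(MLPClassifier().epsilon)  # 1e-08

The example below compares several `epsilon` values on a synthetic multi-class problem: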
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different epsilon values
epsilon_values = [1e-8, 1e-7, 1e-6, 1e-5]
accuracies = []
for eps in epsilon_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500,
                        random_state=42, epsilon=eps)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"epsilon={eps:.0e}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
epsilon=1e-08, Accuracy: 0.895
epsilon=1e-07, Accuracy: 0.890
epsilon=1e-06, Accuracy: 0.875
epsilon=1e-05, Accuracy: 0.885
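Since the loop also collects the scores in `accuracies`, you can chart the comparison (assumes matplotlib is installed):

import matplotlib.pyplot as plt

plt.bar([f"{e:.0e}" for e in epsilon_values], accuracies)
plt.xlabel("epsilon")
plt.ylabel("test accuracy")
plt.title("MLPClassifier accuracy vs. epsilon")
plt.show()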
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `epsilon` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `epsilon`:
- Start with the default value of 1e-8 and adjust if training is unstable
- Increase `epsilon` if you encounter NaN or Inf values during training
- Smaller `epsilon` values may lead to more precise updates but can cause instability
- Larger `epsilon` values can improve stability but might slow down convergence; a grid search over a few candidate values, sketched after this list, can help you choose
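To tune `epsilon` systematically rather than by trial and error, a small grid search works; this sketch reuses `X_train` and `y_train` from the example above:

from sklearn.model_selection import GridSearchCV

param_grid = {"epsilon": [1e-8, 1e-7, 1e-6, 1e-5]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42),
    param_grid, cv=3, scoring="accuracy",
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)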
Issues to consider:
- The optimal `epsilon` value can vary depending on the scale of your features and the complexity of the problem; standardizing inputs (see the sketch after this list) often reduces this sensitivity
- Very small `epsilon` values might lead to numerical instability, especially with limited floating-point precision
- Large `epsilon` values might prevent the optimizer from making small but important weight updates
- The effect of `epsilon` may be more pronounced in problems with small gradients or when using adaptive learning rate methods
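Because feature scale interacts with gradient magnitudes, standardizing inputs is often a better first lever than fine-tuning `epsilon`; here is a minimal sketch using the same train/test split:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(
    StandardScaler(),  # zero-mean, unit-variance features
    MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42),
)
pipe.fit(X_train, y_train)
print(f"Scaled-pipeline accuracy: {pipe.score(X_test, y_test):.3f}")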