The epsilon parameter in scikit-learn's MLPRegressor controls the numerical stability of the Adam optimizer.
The Multi-layer Perceptron (MLP) is a type of artificial neural network that can be used for regression tasks. The MLPRegressor class in scikit-learn implements this model and, by default, trains it with the Adam optimizer.
The epsilon parameter adds a small constant to the denominator of the Adam weight update, preventing division by zero and improving numerical stability.
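To see where epsilon enters the computation, here is a minimal sketch of a single Adam update step in NumPy. The variable names (w, grad, m, v) are illustrative, not scikit-learn internals, though the defaults mirror MLPRegressor's beta_1, beta_2, and learning_rate_init:

import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam step: epsilon keeps the denominator away from zero."""
    m = beta_1 * m + (1 - beta_1) * grad           # first-moment (mean) estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1 - beta_1 ** t)                  # bias correction
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)  # epsilon guards the division
    return w, m, v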
The default value for epsilon is 1e-8 (0.00000001).
In practice, values between 1e-8 and 1e-5 are commonly used, depending on the specific problem and dataset characteristics.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different epsilon values
epsilon_values = [1e-8, 1e-7, 1e-6, 1e-5]
mse_scores = []
for eps in epsilon_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42, epsilon=eps)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"epsilon={eps:.1e}, MSE: {mse:.4f}")
# Find best epsilon
best_epsilon = epsilon_values[np.argmin(mse_scores)]
print(f"\nBest epsilon: {best_epsilon:.1e}")
Running the example gives an output like:
epsilon=1.0e-08, MSE: 12.1675
epsilon=1.0e-07, MSE: 12.2065
epsilon=1.0e-06, MSE: 12.2210
epsilon=1.0e-05, MSE: 12.2668
Best epsilon: 1.0e-08
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different epsilon values
- Evaluate the mean squared error (MSE) of each model on the test set
- Identify the best epsilon value based on the lowest MSE
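For a more robust comparison than a single train/test split, epsilon can also be tuned with cross-validation. Below is a minimal sketch using GridSearchCV; the grid and model settings are illustrative choices carried over from the example above, not recommendations:

from sklearn.model_selection import GridSearchCV

param_grid = {"epsilon": [1e-8, 1e-7, 1e-6, 1e-5]}
grid = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # GridSearchCV maximizes, so MSE is negated
    cv=3,
)
grid.fit(X_train, y_train)
print(f"Best epsilon: {grid.best_params_['epsilon']:.1e}")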
Some tips and heuristics for setting epsilon:
- Start with the default value of 1e-8 and adjust only if numerical instability occurs
- Increase epsilon if you encounter NaN or inf values during training (one way to detect this is sketched below)
- Decreasing epsilon might lead to more precise updates but could cause instability
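One way to check for the instability mentioned above is to inspect the fitted model's loss curve for non-finite values. This is a minimal sketch, assuming the model was trained with the adam or sgd solver (the cases where loss_curve_ is available):

import numpy as np

mlp = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42, epsilon=1e-8)
mlp.fit(X_train, y_train)

# loss_curve_ records the training loss at each iteration
if not np.all(np.isfinite(mlp.loss_curve_)):
    print("Training produced NaN/inf losses; try a larger epsilon")
else:
    print(f"Final training loss: {mlp.loss_curve_[-1]:.4f}")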
Issues to consider:
- The optimal epsilon value can vary depending on the scale of your features and target variable, so standardizing inputs (as sketched below) makes results more comparable
- Very small epsilon values may lead to numerical instability, while large values may slow down convergence
- The effect of epsilon is often subtle and may interact with other hyperparameters like the learning rate
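Since feature scale influences the optimizer's behavior, it is common to standardize inputs before fitting. Here is a minimal sketch using a scikit-learn Pipeline; the pipeline structure is an illustration, not part of the example above:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale features to zero mean and unit variance before the MLP sees them
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42, epsilon=1e-8),
)
model.fit(X_train, y_train)
print(f"Test MSE: {mean_squared_error(y_test, model.predict(X_test)):.4f}")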