Configure MLPRegressor "epsilon" Parameter

The epsilon parameter in scikit-learn’s MLPRegressor controls the numerical stability of the Adam optimizer. It only takes effect when solver='adam'.

A Multi-layer Perceptron (MLP) is a type of feed-forward artificial neural network. The MLPRegressor in scikit-learn applies it to regression tasks and uses Adam as its default solver for weight updates; the 'sgd' and 'lbfgs' solvers are also available, and they ignore epsilon.

The epsilon parameter adds a small constant to the denominator during the Adam weight update to prevent division by zero and improve numerical stability.
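To make the role of the constant concrete, here is a minimal NumPy sketch of a single Adam step. It follows the commonly published form of the Adam update rather than scikit-learn's internal code, and the function name adam_step and its default arguments are hypothetical, chosen only for illustration.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update in its standard textbook form (illustrative only)."""
    # Exponential moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialised averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # epsilon keeps the denominator away from zero
    w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w, m, v

w = np.array([1.0, 1.0])
grad = np.array([0.5, 0.0])  # the second weight has a zero gradient
m = np.zeros(2)
v = np.zeros(2)

w_new, m, v = adam_step(w, grad, m, v, t=1)
print(w_new)
```

For the second weight both the numerator and the square-root term are zero, so without epsilon the update would be 0/0. With epsilon the weight is simply left unchanged.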

The default value for epsilon is 1e-8 (0.00000001).
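If you want to confirm the defaults in your installed version, a quick check along these lines should work. It only reads the estimator's parameters and does not train anything.

```python
from sklearn.neural_network import MLPRegressor

# Inspect the constructor defaults without fitting a model
mlp = MLPRegressor()
params = mlp.get_params()

default_epsilon = params["epsilon"]
default_solver = params["solver"]

print(f"epsilon={default_epsilon}, solver={default_solver}")
```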

In practice the default rarely needs changing. When it does, values between 1e-8 and 1e-5 are a reasonable range to explore, and the best choice depends on the specific problem and dataset characteristics.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different epsilon values
epsilon_values = [1e-8, 1e-7, 1e-6, 1e-5]
mse_scores = []

for eps in epsilon_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=500, random_state=42, epsilon=eps)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"epsilon={eps:.1e}, MSE: {mse:.4f}")

# Find best epsilon
best_epsilon = epsilon_values[np.argmin(mse_scores)]
print(f"\nBest epsilon: {best_epsilon:.1e}")

Running the example gives an output like:

epsilon=1.0e-08, MSE: 12.1675
epsilon=1.0e-07, MSE: 12.2065
epsilon=1.0e-06, MSE: 12.2210
epsilon=1.0e-05, MSE: 12.2668

Best epsilon: 1.0e-08

The key steps in this example are:

Generate a synthetic regression dataset
Split the data into train and test sets
Train MLPRegressor models with different epsilon values
Evaluate the mean squared error (MSE) of each model on the test set
Identify the best epsilon value based on the lowest MSE

Some tips and heuristics for setting epsilon:

Start with the default value of 1e-8 and adjust if numerical instability occurs
Increase epsilon if you encounter NaN or inf values during training
Decreasing epsilon keeps the update closer to the pure adaptive Adam step, but it could cause instability when gradients are very small
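One way to act on these tips is to check the training loss and the learned weights for non-finite values after fitting. The sketch below is an assumption about how such a check might look: the helper training_is_finite is hypothetical, and the small dataset and network are chosen only to keep it fast.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

def training_is_finite(epsilon):
    """Fit a small model and report whether loss and weights stayed finite."""
    mlp = MLPRegressor(hidden_layer_sizes=(20,), max_iter=50,
                       random_state=0, epsilon=epsilon)
    with warnings.catch_warnings():
        # The short run is not expected to converge; silence that warning
        warnings.simplefilter("ignore", ConvergenceWarning)
        mlp.fit(X, y)
    loss_ok = np.all(np.isfinite(mlp.loss_curve_))
    weights_ok = all(np.all(np.isfinite(W)) for W in mlp.coefs_)
    return bool(loss_ok and weights_ok)

for eps in [1e-8, 1e-5]:
    print(f"epsilon={eps:.1e}, finite training: {training_is_finite(eps)}")
```

If the check fails for the default value, trying a larger epsilon is a reasonable first step, alongside checking the scale of the inputs and the learning rate.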

Issues to consider:

The optimal epsilon value can vary depending on the scale of your features and target variable
Very small epsilon values may lead to numerical instability, while large values shrink the step for weights with small gradients, which makes Adam less adaptive and may slow down convergence
The effect of epsilon is often subtle and may interact with other hyperparameters like learning rate
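Because epsilon interacts with feature scale and learning rate, it can help to tune them together instead of in isolation. The following is a minimal sketch, assuming a pipeline with StandardScaler and a small cross-validated grid over epsilon and learning_rate_init. The grid values, network size, and iteration budget are illustrative choices, not recommendations.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=42)

# Scale features inside the pipeline so each CV fold is scaled on its own training part
pipe = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(20,), max_iter=100, random_state=42),
)

# Search epsilon and the initial learning rate jointly
param_grid = {
    "mlpregressor__epsilon": [1e-8, 1e-5],
    "mlpregressor__learning_rate_init": [1e-3, 1e-2],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="neg_mean_squared_error")
with warnings.catch_warnings():
    # The small iteration budget may not converge; silence that warning
    warnings.simplefilter("ignore", ConvergenceWarning)
    search.fit(X, y)

print(search.best_params_)
```

Cross-validation gives a less noisy comparison than a single train/test split, which matters here because the differences between epsilon values are often small.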

See Also