The tol parameter in scikit-learn’s MLPClassifier controls the tolerance for optimization convergence.
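Multi-layer Perceptron (MLP) is a type of artificial neural network used for classification tasks. The tol parameter sets the minimum improvement in the loss function that counts as progress: with the sgd and adam solvers, training stops once the loss (or the validation score, when early stopping is enabled) fails to improve by at least tol for n_iter_no_change consecutive epochs, unless learning_rate is set to 'adaptive'.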
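To make the stopping rule concrete, here is a minimal pure-Python sketch of a tolerance-based check. It is a simplified illustration, not scikit-learn's implementation: the function name epochs_until_stop and the synthetic loss sequence are assumptions made for this example, and n_iter_no_change mirrors the MLPClassifier parameter of the same name, which counts how many consecutive epochs may fall short of tol before training stops.

```python
def epochs_until_stop(losses, tol=1e-4, n_iter_no_change=10):
    """Return the 1-based epoch at which a tol-based rule would stop, or None.

    Hypothetical helper for illustration only: an epoch counts as "stalled"
    when its loss does not beat the best loss seen so far by at least tol.
    """
    best_loss = float("inf")
    stalled = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            stalled += 1   # improvement smaller than tol (or none at all)
        else:
            stalled = 0    # meaningful improvement resets the counter
        if loss < best_loss:
            best_loss = loss
        if stalled > n_iter_no_change:
            return epoch
    return None


# Synthetic loss curve that halves every epoch (an assumption for the demo).
losses = [0.5 ** i for i in range(60)]
stop_loose = epochs_until_stop(losses, tol=1e-2)
stop_tight = epochs_until_stop(losses, tol=1e-4)
print(f"tol=1e-2 stops at epoch {stop_loose}")
print(f"tol=1e-4 stops at epoch {stop_tight}")
```

On this halving loss sequence, the looser tolerance stops earlier than the tighter one, because the per-epoch improvement drops below the larger threshold sooner.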
A smaller tol value lets the optimizer run longer and fit the training loss more closely, but may increase training time. Conversely, a larger value stops training sooner but may leave the model less well optimized.
The default value for tol is 1e-4 (0.0001).
In practice, values between 1e-5 and 1e-2 are commonly used, depending on the desired trade-off between precision and training speed.
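As a quick check, the default can be read from an unfitted estimator's parameters. This is a minimal sketch that assumes a reasonably recent scikit-learn release. n_iter_no_change is printed alongside tol because it sets how many consecutive epochs must fall short of tol before training stops.

```python
from sklearn.neural_network import MLPClassifier

# Read the defaults from an unfitted estimator; no data is needed.
params = MLPClassifier().get_params()
print("tol:", params["tol"])
print("n_iter_no_change:", params["n_iter_no_change"])
```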
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [1e-5, 1e-4, 1e-3, 1e-2]
accuracies = []

for tol in tol_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, tol=tol, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"tol={tol:.0e}, Accuracy: {accuracy:.3f}, Iterations: {mlp.n_iter_}")
Running the example gives an output like:
tol=1e-05, Accuracy: 0.935, Iterations: 691
tol=1e-04, Accuracy: 0.945, Iterations: 377
tol=1e-03, Accuracy: 0.940, Iterations: 95
tol=1e-02, Accuracy: 0.915, Iterations: 26
The key steps in this example are:
- Generate a synthetic classification dataset with informative, redundant, and noise features
- Split the data into train and test sets
- Train MLPClassifier models with different tol values
- Evaluate the accuracy and number of iterations for each model
Some tips and heuristics for setting tol:
- Start with the default value of 1e-4 and adjust based on performance and training time
- Use smaller values for more precise optimization, larger values for faster convergence
- Monitor both model performance and training iterations to find the optimal balance
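One way to monitor iterations is to check whether training stopped because of tol or simply ran out of max_iter, in which case scikit-learn emits a ConvergenceWarning. The sketch below is illustrative only: the helper name fit_and_report, the small dataset, the network size, and the deliberately low max_iter=50 are assumptions chosen to keep the run short.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small synthetic dataset (an assumption, chosen so the example runs quickly).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)


def fit_and_report(tol, max_iter=50):
    """Fit one model; return (n_iter_, whether max_iter was hit before tol)."""
    mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=max_iter,
                        tol=tol, random_state=0)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is recorded
        mlp.fit(X, y)
    hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return mlp.n_iter_, hit_limit


for tol in (1e-6, 1e-1):
    n_iter, hit_limit = fit_and_report(tol)
    print(f"tol={tol:.0e}: n_iter_={n_iter}, hit max_iter={hit_limit}")
```

If a run reports that max_iter was hit, the tol criterion never triggered, so either max_iter should be raised or tol loosened before comparing results.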
Issues to consider:
- Very small tol values may lead to overfitting or excessive training time
- Large tol values might cause premature convergence, resulting in suboptimal performance
- The optimal tol depends on the complexity of the dataset and model architecture
- Consider using early stopping with validation data for more robust optimization
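The early stopping suggestion can be sketched as follows. With early_stopping=True, MLPClassifier holds out a fraction of the training data and applies tol to the validation score rather than the training loss. The dataset, network size, and parameter values here are assumptions for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic dataset (an assumption, chosen so the example runs quickly).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

mlp = MLPClassifier(
    hidden_layer_sizes=(50,),
    max_iter=200,
    tol=1e-4,
    early_stopping=True,       # hold out validation data and monitor its score
    validation_fraction=0.1,   # share of the training data used for validation
    n_iter_no_change=10,       # epochs without a tol-sized gain before stopping
    random_state=0,
)
mlp.fit(X, y)

print(f"Epochs run: {mlp.n_iter_}")
print(f"Best validation accuracy: {mlp.best_validation_score_:.3f}")
```

Because the stopping decision is based on held-out data, this setup is less sensitive to the exact tol value than stopping on training loss alone.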