The `validation_fraction` parameter in scikit-learn's `MLPClassifier` determines the proportion of training data to set aside as a validation set for early stopping. It is only used when `early_stopping=True`.
`MLPClassifier` is a multi-layer perceptron neural network for classification. When early stopping is enabled, it holds out part of the training data as a validation set, monitors the validation score during training, and stops early if the score fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs.
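The early stopping behavior described above can be sketched as follows; the dataset and hyperparameter values here are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic dataset for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

mlp = MLPClassifier(
    hidden_layer_sizes=(50,),
    early_stopping=True,      # hold out part of the training data
    validation_fraction=0.1,  # 10% of training data used for validation
    n_iter_no_change=10,      # stop after 10 epochs without improvement
    tol=1e-4,                 # minimum improvement that counts as progress
    max_iter=500,
    random_state=42,
)
mlp.fit(X, y)
print(f"Stopped after {mlp.n_iter_} iterations, "
      f"best validation score: {mlp.best_validation_score_:.3f}")
```

The `best_validation_score_` attribute is only set when `early_stopping=True`.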
`validation_fraction` controls how much of the training data is reserved for validation. A larger fraction provides a more reliable estimate of generalization performance but leaves less data available for training.
The default value for `validation_fraction` is 0.1 (10% of the training data).
Typical values range from 0.1 to 0.3, depending on the size of the dataset and the complexity of the problem.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different validation_fraction values
# (early_stopping=True is required; otherwise validation_fraction is ignored)
fractions = [0.1, 0.2, 0.3]
accuracies = []
for fraction in fractions:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                        early_stopping=True, validation_fraction=fraction,
                        random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"validation_fraction={fraction}, Accuracy: {accuracy:.3f}, "
          f"Iterations: {mlp.n_iter_}")
```
Running the example prints the test accuracy and iteration count for each `validation_fraction` value. With early stopping enabled, the number of iterations (and often the accuracy) varies with the fraction; exact numbers depend on your scikit-learn version and environment.
The key steps in this example are:

- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `validation_fraction` values
- Evaluate the accuracy of each model on the test set
- Compare the number of iterations needed for convergence
Some tips and heuristics for setting `validation_fraction`:
- Use larger fractions (e.g., 0.2 or 0.3) for smaller datasets to ensure a representative validation set
- For large datasets, smaller fractions (e.g., 0.1 or 0.15) may be sufficient
- Consider cross-validation as an alternative if you have limited data
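The cross-validation alternative mentioned above can be sketched with `cross_val_score`; the dataset size here is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# A small dataset, where sacrificing rows to an internal
# validation split is costly
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Estimate generalization with 5-fold cross-validation instead of
# relying on a single internal validation split
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
scores = cross_val_score(mlp, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Each fold trains on 80% of the data and validates on the remaining 20%, so every row contributes to both training and evaluation across the folds.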
Issues to consider:
- A larger validation set reduces the amount of data available for training, which can impact model performance
- Too small a validation set may not provide reliable estimates of generalization performance
- The optimal fraction depends on the size and complexity of your dataset
- Early stopping based on validation performance can help prevent overfitting
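To see the effect of the last point, a quick sketch comparing training with and without early stopping (illustrative dataset and settings; with early stopping, training typically halts well before `max_iter`):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

iters = {}
for early in (False, True):
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                        early_stopping=early, validation_fraction=0.1,
                        random_state=42)
    mlp.fit(X, y)
    iters[early] = mlp.n_iter_
    print(f"early_stopping={early}: stopped after {mlp.n_iter_} iterations")
```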