The `validation_fraction` parameter in scikit-learn's `MLPClassifier` determines the proportion of training data to set aside as a validation set for early stopping. It is only used when `early_stopping=True`.
`MLPClassifier` is a multi-layer perceptron neural network for classification. When early stopping is enabled, it holds out part of the training data as a validation set, monitors the validation score during training, and stops early if the score fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs.
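The early stopping behavior described above can be sketched as follows; the dataset and hyperparameter values here are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic dataset for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

mlp = MLPClassifier(
    hidden_layer_sizes=(50,),
    early_stopping=True,      # hold out part of the training data
    validation_fraction=0.1,  # 10% of training data used for validation
    n_iter_no_change=10,      # stop after 10 epochs without improvement
    tol=1e-4,                 # minimum improvement that counts as progress
    max_iter=500,
    random_state=42,
)
mlp.fit(X, y)
print(f"Stopped after {mlp.n_iter_} iterations, "
      f"best validation score: {mlp.best_validation_score_:.3f}")
```

The `best_validation_score_` attribute is only set when `early_stopping=True`.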
`validation_fraction` controls how much of the training data is reserved for validation. A larger fraction provides a more reliable estimate of generalization performance but leaves less data available for training.
The default value for `validation_fraction` is 0.1 (10% of the training data).
Typical values range from 0.1 to 0.3, depending on the size of the dataset and the complexity of the problem.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different validation_fraction values
# (early_stopping=True is required; otherwise validation_fraction is ignored)
fractions = [0.1, 0.2, 0.3]
accuracies = []
for fraction in fractions:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                        early_stopping=True, validation_fraction=fraction,
                        random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"validation_fraction={fraction}, Accuracy: {accuracy:.3f}, "
          f"Iterations: {mlp.n_iter_}")
```
Running the example prints the test accuracy and iteration count for each `validation_fraction` value. With early stopping enabled, the number of iterations (and often the accuracy) varies with the fraction; exact numbers depend on your scikit-learn version and environment.
The key steps in this example are:

- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train `MLPClassifier` models with different `validation_fraction` values
- Evaluate the accuracy of each model on the test set
- Compare the number of iterations needed for convergence
Some tips and heuristics for setting `validation_fraction`:
- Use larger fractions (e.g., 0.2 or 0.3) for smaller datasets to ensure a representative validation set
- For large datasets, smaller fractions (e.g., 0.1 or 0.15) may be sufficient
- Consider cross-validation as an alternative if you have limited data
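The cross-validation alternative mentioned above can be sketched with `cross_val_score`; the dataset size here is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# A small dataset, where sacrificing rows to an internal
# validation split is costly
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Estimate generalization with 5-fold cross-validation instead of
# relying on a single internal validation split
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
scores = cross_val_score(mlp, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Each fold trains on 80% of the data and validates on the remaining 20%, so every row contributes to both training and evaluation across the folds.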
Issues to consider:
- A larger validation set reduces the amount of data available for training, which can impact model performance
- Too small a validation set may not provide reliable estimates of generalization performance
- The optimal fraction depends on the size and complexity of your dataset
- Early stopping based on validation performance can help prevent overfitting
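To see the effect of the last point, a quick sketch comparing training with and without early stopping (illustrative dataset and settings; with early stopping, training typically halts well before `max_iter`):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

iters = {}
for early in (False, True):
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                        early_stopping=early, validation_fraction=0.1,
                        random_state=42)
    mlp.fit(X, y)
    iters[early] = mlp.n_iter_
    print(f"early_stopping={early}: stopped after {mlp.n_iter_} iterations")
```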