The early_stopping parameter in scikit-learn's HistGradientBoostingClassifier controls whether training is terminated early when the validation score stops improving.
Early stopping is a technique to prevent overfitting by monitoring the model’s performance on a validation set during training. If the performance stops improving for a specified number of iterations, training is halted.
When early_stopping is set to True, the algorithm holds out a fraction of the training data (10% by default, controlled by validation_fraction) as a validation set. It stops training when the validation score has not improved by at least tol for n_iter_no_change consecutive iterations.
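For example, the stopping behavior can be tuned through validation_fraction, n_iter_no_change, tol, and scoring. A minimal sketch (the specific values are illustrative, not recommendations):
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

# Stop once the validation score has not improved by at least tol
# for n_iter_no_change consecutive iterations
clf = HistGradientBoostingClassifier(
    max_iter=500,
    early_stopping=True,
    validation_fraction=0.15,  # hold out 15% of the training data
    n_iter_no_change=15,       # patience: 15 iterations without improvement
    tol=1e-4,                  # minimum improvement that counts as progress
    scoring="loss",            # monitor the loss on the holdout (the default)
    random_state=42,
)
clf.fit(X, y)
print(clf.n_iter_)  # boosting iterations actually performed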
The default value for early_stopping is 'auto', which enables early stopping only when the training set contains more than 10,000 samples. Setting it to True always enables early stopping; setting it to False disables it, and the model will always train for the full number of iterations specified by max_iter.
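Because the default is 'auto', you can check after fitting whether early stopping actually ran: the validation_score_ attribute is an empty array when no internal validation set was used. A quick sketch of the heuristic, assuming default settings otherwise:
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

for n in (1000, 20000):
    X, y = make_classification(n_samples=n, random_state=42)
    clf = HistGradientBoostingClassifier(random_state=42).fit(X, y)  # early_stopping='auto'
    used = clf.validation_score_.size > 0  # empty when early stopping was off
    print(f"n_samples={n}: early stopping used = {used}")
The following complete example compares training with early stopping enabled and disabled: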
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
import time
# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with early stopping enabled
start_time = time.time()
hgb_early = HistGradientBoostingClassifier(max_iter=1000, early_stopping=True, random_state=42)
hgb_early.fit(X_train, y_train)
early_time = time.time() - start_time
early_score = accuracy_score(y_test, hgb_early.predict(X_test))
# Train with early stopping disabled
start_time = time.time()
hgb_full = HistGradientBoostingClassifier(max_iter=1000, early_stopping=False, random_state=42)
hgb_full.fit(X_train, y_train)
full_time = time.time() - start_time
full_score = accuracy_score(y_test, hgb_full.predict(X_test))
print(f"Early stopping: Time={early_time:.2f}s, Accuracy={early_score:.4f}")
print(f"Full training: Time={full_time:.2f}s, Accuracy={full_score:.4f}")
print(f"Iterations with early stopping: {hgb_early.n_iter_}")
print(f"Iterations without early stopping: {hgb_full.n_iter_}")
Running the example gives an output like:
Early stopping: Time=0.88s, Accuracy=0.9155
Full training: Time=6.35s, Accuracy=0.9205
Iterations with early stopping: 96
Iterations without early stopping: 1000
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train HistGradientBoostingClassifier models with early stopping enabled and disabled
- Measure training time and final accuracy for both models
- Compare the number of iterations performed by each model
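The per-iteration scores recorded on the fitted model can also be inspected to see how training progressed before it stopped. A short sketch (note that with the default scoring="loss", train_score_ and validation_score_ hold negative losses, so higher is better):
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
clf = HistGradientBoostingClassifier(max_iter=1000, early_stopping=True,
                                     random_state=42).fit(X, y)

print(f"Stopped after {clf.n_iter_} iterations")
print(f"Best validation score: {np.max(clf.validation_score_):.4f}")
print(f"Final validation score: {clf.validation_score_[-1]:.4f}")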
Some tips for using early stopping:
- Use early stopping to prevent overfitting and reduce training time
- Adjust n_iter_no_change and tol to fine-tune early stopping behavior
- Consider using a separate validation set instead of the default 10% holdout (see the sketch after this list)
Issues to consider:
- Early stopping may result in underfitting if stopped too early
- Disabling early stopping can lead to overfitting and longer training times
- The effectiveness of early stopping depends on the dataset and model complexity