
Configure HistGradientBoostingClassifier "early_stopping" Parameter

The early_stopping parameter in scikit-learn's HistGradientBoostingClassifier controls whether training terminates early when the score on a held-out validation set stops improving.

Early stopping is a technique to prevent overfitting by monitoring the model’s performance on a validation set during training. If the performance stops improving for a specified number of iterations, training is halted.

When early_stopping is set to True, the algorithm holds out a randomly selected fraction of the training data (10% by default, controlled by validation_fraction) as a validation set. Training stops when the validation score has not improved by at least tol for n_iter_no_change consecutive iterations.

The default value for early_stopping is 'auto', which enables early stopping only when the training set has more than 10,000 samples. Setting it to False disables early stopping entirely, and the model will always train for the full number of iterations specified by max_iter.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with early stopping enabled
start_time = time.time()
hgb_early = HistGradientBoostingClassifier(max_iter=1000, early_stopping=True, random_state=42)
hgb_early.fit(X_train, y_train)
early_time = time.time() - start_time
early_score = accuracy_score(y_test, hgb_early.predict(X_test))

# Train with early stopping disabled
start_time = time.time()
hgb_full = HistGradientBoostingClassifier(max_iter=1000, early_stopping=False, random_state=42)
hgb_full.fit(X_train, y_train)
full_time = time.time() - start_time
full_score = accuracy_score(y_test, hgb_full.predict(X_test))

print(f"Early stopping: Time={early_time:.2f}s, Accuracy={early_score:.4f}")
print(f"Full training: Time={full_time:.2f}s, Accuracy={full_score:.4f}")
print(f"Iterations with early stopping: {hgb_early.n_iter_}")
print(f"Iterations without early stopping: {hgb_full.n_iter_}")

Running the example gives an output like:

Early stopping: Time=0.88s, Accuracy=0.9155
Full training: Time=6.35s, Accuracy=0.9205
Iterations with early stopping: 96
Iterations without early stopping: 1000

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingClassifier models with early stopping enabled and disabled
  4. Measure training time and final accuracy for both models
  5. Compare the number of iterations performed by each model

Some tips for using early stopping:

  - Increase n_iter_no_change if training stops too aggressively on noisy validation scores.
  - Adjust validation_fraction to trade training data against a more reliable validation estimate.
  - Set max_iter generously; early stopping makes a large value cheap because training halts once the score plateaus.

Issues to consider:

  - The internal validation split reduces the data available for fitting, which can matter on small datasets.
  - The validation set is drawn from the training data, so its score is not a substitute for evaluation on a proper held-out test set.
  - Results can vary with random_state because the validation split is random.
