
Configure HistGradientBoostingClassifier "warm_start" Parameter

The warm_start parameter in scikit-learn’s HistGradientBoostingClassifier allows for incremental fitting by reusing the solution of the previous call to fit.

HistGradientBoostingClassifier is a fast implementation of gradient boosting trees, using histogram-based algorithms for efficient training. It builds an ensemble of decision trees sequentially, with each tree correcting errors made by the previous ones.
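As a point of reference before looking at warm_start, the estimator is used like any other scikit-learn classifier. The snippet below is a minimal sketch on an arbitrary synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

# Fit a standalone model on a small synthetic dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = HistGradientBoostingClassifier(max_iter=100, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))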

When warm_start is set to True, subsequent calls to fit() add more boosting iterations to the existing ensemble instead of training a new model from scratch. This is useful for fine-tuning the number of iterations without repeating work that has already been done.

The default value for warm_start is False, meaning each call to fit() trains a new ensemble from scratch. Setting it to True is common when you want to grow the ensemble incrementally by raising max_iter between calls, for example while searching for a good number of iterations or implementing a custom early-stopping loop.
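The basic pattern is to raise max_iter between calls to fit() on the same data. The sketch below assumes the documented behaviour that only the additional iterations are trained, and uses the fitted n_iter_ attribute to check the ensemble size:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# First fit trains 50 boosting iterations
model = HistGradientBoostingClassifier(warm_start=True, max_iter=50, random_state=0)
model.fit(X, y)
print(model.n_iter_)  # 50

# Raising max_iter and refitting on the same data adds only the extra 50 iterations
model.set_params(max_iter=100)
model.fit(X, y)
print(model.n_iter_)  # 100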

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize models
model_cold = HistGradientBoostingClassifier(random_state=42)
model_warm = HistGradientBoostingClassifier(warm_start=True, random_state=42)

# Training with incremental fitting
n_estimators_list = [10, 20, 50, 100]
cold_times, warm_times = [], []
cold_scores, warm_scores = [], []

for n_estimators in n_estimators_list:
    # Cold start: retrains all iterations from scratch
    start_time = time.time()
    model_cold.set_params(max_iter=n_estimators)
    model_cold.fit(X_train, y_train)
    cold_times.append(time.time() - start_time)
    cold_scores.append(accuracy_score(y_test, model_cold.predict(X_test)))

    # Warm start: reuses previously trained iterations and adds only the new ones
    start_time = time.time()
    model_warm.set_params(max_iter=n_estimators)
    model_warm.fit(X_train, y_train)
    warm_times.append(time.time() - start_time)
    warm_scores.append(accuracy_score(y_test, model_warm.predict(X_test)))

# Print results
for i, n_estimators in enumerate(n_estimators_list):
    print(f"n_estimators={n_estimators}:")
    print(f"  Cold start - Time: {cold_times[i]:.3f}s, Accuracy: {cold_scores[i]:.3f}")
    print(f"  Warm start - Time: {warm_times[i]:.3f}s, Accuracy: {warm_scores[i]:.3f}")

Running the example gives an output like:

n_estimators=10:
  Cold start - Time: 0.058s, Accuracy: 0.890
  Warm start - Time: 0.053s, Accuracy: 0.890
n_estimators=20:
  Cold start - Time: 0.084s, Accuracy: 0.915
  Warm start - Time: 0.053s, Accuracy: 0.915
n_estimators=50:
  Cold start - Time: 0.186s, Accuracy: 0.935
  Warm start - Time: 0.116s, Accuracy: 0.935
n_estimators=100:
  Cold start - Time: 0.312s, Accuracy: 0.943
  Warm start - Time: 0.168s, Accuracy: 0.943

Both models reach the same accuracy at each stage, but the warm-started model trains faster because each call to fit() only adds the extra iterations rather than rebuilding the whole ensemble.

The key steps in this example are:

  1. Generate a synthetic classification dataset
  2. Split the data into train and test sets
  3. Create two HistGradientBoostingClassifier models, one with warm_start=False (default) and one with warm_start=True
  4. Train both models repeatedly, increasing max_iter (the number of boosting iterations) in stages
  5. Compare training times and accuracy scores for each stage

Some tips for using warm_start:

  - Increase max_iter between calls to fit(); with warm_start=True only the additional boosting iterations are trained, which is where the time savings come from (see the sketch after these lists).
  - Keep the other hyperparameters and random_state fixed between calls so the incrementally grown ensemble matches one trained in a single fit.
  - Refit on the same training data; the scikit-learn documentation notes that results are only valid when a warm-started estimator is re-trained on the same data.

Issues to consider:

  - warm_start is not online learning; it does not adapt the existing trees to new data, it only appends iterations trained on the original data.
  - If early stopping is enabled (early_stopping=True, or 'auto' with more than 10,000 samples), training may stop before the requested max_iter; check the n_iter_ attribute to see how many iterations were actually fit.
  - A warm-started model carries state between experiments; create a fresh estimator whenever you need a fully independent run.
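Putting the tips above into practice, warm_start can be combined with a simple validation loop that grows the ensemble until the score stops improving. The sketch below is illustrative; the step size and stopping rule are arbitrary choices, not part of scikit-learn's API:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = HistGradientBoostingClassifier(warm_start=True, random_state=42)

best_score, best_iter = 0.0, 0
for max_iter in range(25, 301, 25):
    model.set_params(max_iter=max_iter)
    model.fit(X_train, y_train)  # only the new iterations are trained
    score = accuracy_score(y_val, model.predict(X_val))
    if score > best_score:
        best_score, best_iter = score, max_iter
    elif max_iter - best_iter >= 50:  # no improvement for two steps: stop
        break

print(f"Best validation accuracy {best_score:.3f} at max_iter={best_iter}")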


