
Configure HistGradientBoostingRegressor "warm_start" Parameter

The warm_start parameter in scikit-learn’s HistGradientBoostingRegressor allows for incremental fitting of the model by reusing the solution of the previous call to fit.

HistGradientBoostingRegressor is a gradient boosting algorithm that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets.

When warm_start is set to True, the model can be trained incrementally: calling fit again with a larger max_iter adds more boosting iterations to the already fitted ensemble instead of retraining from scratch. This is particularly useful when growing an ensemble in stages, for example to monitor validation error and pick a good number of iterations without paying for a full retrain at each step.

The default value for warm_start is False. It's commonly set to True when you want to increase the number of boosting iterations (max_iter) of an already fitted model without starting the training from scratch. Subsequent calls to fit are expected to use the same training data; warm_start grows the existing ensemble rather than training on new batches.
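
The basic pattern is: fit once, then raise max_iter and call fit again on the same data. A minimal sketch (the dataset and parameter values here are only illustrative):

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# First fit: 50 boosting iterations
model = HistGradientBoostingRegressor(max_iter=50, warm_start=True, random_state=0)
model.fit(X, y)

# Raise max_iter and refit on the same data: the first 50 iterations are
# reused and only the additional 50 are trained
model.set_params(max_iter=100)
model.fit(X, y)
print(model.n_iter_)  # 100

The complete example below compares this incremental (warm) refitting with retraining from scratch (cold) at several iteration counts.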

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create models with warm_start=False and warm_start=True
model_cold = HistGradientBoostingRegressor(max_iter=10, random_state=42)
model_warm = HistGradientBoostingRegressor(max_iter=10, warm_start=True, random_state=42)

# Train models incrementally
n_iterations = [10, 20, 30, 40, 50]
for n in n_iterations:
    # Cold model (warm_start=False by default): refits all n iterations from scratch
    model_cold.set_params(max_iter=n)
    model_cold.fit(X_train, y_train)
    y_pred_cold = model_cold.predict(X_test)
    mse_cold = mean_squared_error(y_test, y_pred_cold)

    # Warm model: reuses the previously fitted iterations and only trains the new ones
    model_warm.set_params(max_iter=n)
    model_warm.fit(X_train, y_train)
    y_pred_warm = model_warm.predict(X_test)
    mse_warm = mean_squared_error(y_test, y_pred_warm)

    print(f"Iterations: {n}")
    print(f"Cold MSE: {mse_cold:.4f}")
    print(f"Warm MSE: {mse_warm:.4f}")
    print()

Running the example gives an output like:

Iterations: 10
Cold MSE: 6192.5739
Warm MSE: 6192.5739

Iterations: 20
Cold MSE: 3390.2090
Warm MSE: 3390.2090

Iterations: 30
Cold MSE: 2137.6786
Warm MSE: 2137.6786

Iterations: 40
Cold MSE: 1547.2405
Warm MSE: 1547.2405

Iterations: 50
Cold MSE: 1306.1602
Warm MSE: 1306.1602

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Create two HistGradientBoostingRegressor models: one with warm_start=False, one with warm_start=True
  4. Train both models incrementally, increasing the number of iterations
  5. Evaluate the models’ performance using mean squared error at each step
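
Note that the warm and cold models report identical MSE values at every step: refitting on the same data with the same settings gives the same predictions. The benefit of warm_start is therefore training time rather than accuracy, because the warm model only fits the additional iterations on each call. A rough way to see this is to time the two refitting strategies (a sketch only; early stopping is disabled to keep the comparison clean, and timings depend on your machine):

import time
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=10000, n_features=20, noise=0.1, random_state=42)

cold = HistGradientBoostingRegressor(early_stopping=False, random_state=42)
warm = HistGradientBoostingRegressor(warm_start=True, early_stopping=False, random_state=42)

for n in [50, 100, 150, 200]:
    start = time.perf_counter()
    cold.set_params(max_iter=n).fit(X, y)  # retrains all n iterations
    cold_time = time.perf_counter() - start

    start = time.perf_counter()
    warm.set_params(max_iter=n).fit(X, y)  # trains only the new iterations
    warm_time = time.perf_counter() - start

    print(f"max_iter={n}: cold {cold_time:.2f}s, warm {warm_time:.2f}s")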

Tips and heuristics for using warm_start:

  - Increase max_iter between calls to fit; with warm_start=True only the additional boosting iterations are trained.
  - Keep the training data and the other hyperparameters fixed between calls; warm starting grows the existing ensemble rather than adapting it to new data.
  - Check the n_iter_ attribute after fitting to confirm how many iterations the model currently contains.

Issues to consider:

  - warm_start does not reduce memory usage or enable out-of-core training; every call to fit still needs the full training set.
  - A warm-started model ends up with the same result as one trained from scratch with the same settings, so expect savings in training time rather than changes in accuracy.
  - Reducing max_iter between warm-started fits is not a supported way to shrink the model; retrain from scratch if you want a smaller ensemble.
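
A common application of warm_start is choosing max_iter by watching a held-out validation set as the ensemble grows. The built-in early_stopping option automates this; the sketch below shows the manual pattern with illustrative values:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=2000, n_features=10, noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = HistGradientBoostingRegressor(max_iter=25, warm_start=True,
                                      early_stopping=False, random_state=42)

best_mse, best_iter = np.inf, 0
for n in range(25, 301, 25):
    model.set_params(max_iter=n)
    model.fit(X_train, y_train)  # only the new iterations are trained
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_mse, best_iter = mse, n

print(f"Best max_iter: {best_iter} (validation MSE: {best_mse:.4f})")

In practice you would then refit a final model with max_iter set to best_iter, or simply rely on early_stopping to handle this internally.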


