The `warm_start` parameter in scikit-learn's `BaggingRegressor` determines whether to reuse the solution of the previous call to `fit` and add more estimators to the ensemble.

Bagging is an ensemble method that combines predictions from multiple base estimators to improve generalization and robustness. The `warm_start` parameter allows for incremental fitting of the ensemble.
When `warm_start` is set to `True`, the model can be fitted incrementally, with each call to `fit` adding new estimators to the existing ensemble. When `False`, fitting always starts from scratch. The default value for `warm_start` is `False`.
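The basic pattern looks like the following minimal sketch (the dataset here is just a stand-in; any regression data works): fit once, raise `n_estimators`, then fit again so that only the new estimators are trained.

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Fit an initial ensemble of 10 estimators
model = BaggingRegressor(n_estimators=10, warm_start=True, random_state=0)
model.fit(X, y)

# Grow the same ensemble to 20 estimators; only the 10 new ones are trained
model.n_estimators = 20
model.fit(X, y)
print(len(model.estimators_))  # 20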
Setting `warm_start` to `True` is useful when you want to grow an ensemble gradually, for example while searching for a sufficient number of estimators, because each call to `fit` trains only the newly added estimators instead of rebuilding the whole ensemble. The example below compares cold-start and warm-start fitting as the ensemble grows:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize one cold-start and one warm-start BaggingRegressor
br_cold = BaggingRegressor(n_estimators=10, random_state=42, warm_start=False)
br_warm = BaggingRegressor(n_estimators=10, random_state=42, warm_start=True)

# Fit incrementally and evaluate at each stage
n_estimator_steps = [10, 20, 50, 100]
cold_mse = []
warm_mse = []

for n_estimators in n_estimator_steps:
    # Cold start: every fit retrains all estimators from scratch
    br_cold.n_estimators = n_estimators
    br_cold.fit(X_train, y_train)
    y_pred_cold = br_cold.predict(X_test)
    cold_mse.append(mean_squared_error(y_test, y_pred_cold))

    # Warm start: each fit trains only the newly added estimators
    br_warm.n_estimators = n_estimators
    br_warm.fit(X_train, y_train)
    y_pred_warm = br_warm.predict(X_test)
    warm_mse.append(mean_squared_error(y_test, y_pred_warm))

    print(f"n_estimators={n_estimators}")
    print(f"Cold start MSE: {cold_mse[-1]:.4f}")
    print(f"Warm start MSE: {warm_mse[-1]:.4f}")
    print()
Running the example gives an output like:
n_estimators=10
Cold start MSE: 7486.4813
Warm start MSE: 7486.4813
n_estimators=20
Cold start MSE: 7469.0224
Warm start MSE: 7469.0224
n_estimators=50
Cold start MSE: 7111.1779
Warm start MSE: 7111.1779
n_estimators=100
Cold start MSE: 6971.3948
Warm start MSE: 6971.3948
Note that the cold-start and warm-start models report identical MSE at every stage. With a fixed `random_state`, scikit-learn draws the per-estimator seeds so that a warm-started ensemble is reproducible and matches one trained from scratch; the difference is in training cost, not in the resulting model.

The key steps in this example are:

- Generate a synthetic regression dataset
- Split the data into train and test sets
- Create two `BaggingRegressor` instances, one with `warm_start=False` and another with `warm_start=True`
- Incrementally fit both models, increasing the number of estimators in stages
- Evaluate the mean squared error of both models at each stage
Some tips for using `warm_start`:

- Use `warm_start=True` when you want to incrementally add estimators to an existing ensemble
- It can be beneficial for large datasets or when tuning the optimal number of estimators
- Combine with early stopping techniques to find the optimal number of estimators efficiently, as sketched below
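A minimal early-stopping sketch, assuming a held-out validation set and an arbitrary improvement threshold `tol` (both are choices made for this illustration, not part of the scikit-learn API): grow the ensemble in steps and stop once the validation MSE stops improving meaningfully.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Grow the ensemble 10 estimators at a time, stopping when validation
# MSE no longer improves by at least tol (an arbitrary threshold).
model = BaggingRegressor(n_estimators=10, warm_start=True, random_state=42)
best_mse = float("inf")
tol = 1.0

while model.n_estimators <= 200:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if best_mse - mse < tol:
        break  # no meaningful improvement; stop growing
    best_mse = mse
    model.n_estimators += 10  # the next fit adds 10 more estimators

print(f"Stopped at {len(model.estimators_)} estimators")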
Issues to consider:

- With `warm_start=True`, changing other parameters of the model between calls to `fit` may lead to unexpected results (one concrete case is shown after this list)
- The computational efficiency of warm start depends on the base estimator and dataset size
- Memory usage increases with the number of estimators, which may be a concern for very large ensembles
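As a concrete illustration of the first point (the exact error message may vary between scikit-learn versions): with `warm_start=True` the ensemble can only grow, and requesting fewer estimators than are already fitted raises a `ValueError`.

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

model = BaggingRegressor(n_estimators=20, warm_start=True, random_state=0)
model.fit(X, y)

# Asking for fewer estimators than are already fitted is not supported
model.n_estimators = 10
try:
    model.fit(X, y)
except ValueError as exc:
    print(exc)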