
Configure BaggingRegressor "warm_start" Parameter

The warm_start parameter in scikit-learn’s BaggingRegressor determines whether to reuse the solution of the previous call to fit and add more estimators to the ensemble.

Bagging is an ensemble method that combines predictions from multiple base estimators to improve generalization and robustness. The warm_start parameter allows for incremental fitting of the ensemble.

When warm_start is set to True, the model can be fitted incrementally, adding new estimators to the existing ensemble. When False, fitting always starts from scratch.
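The difference shows up directly in the fitted estimators_ attribute: with warm_start=True, a second call to fit (after raising n_estimators) keeps the existing estimators and trains only the new ones. A minimal sketch:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

model = BaggingRegressor(n_estimators=5, warm_start=True, random_state=0)
model.fit(X, y)
first_batch = list(model.estimators_)
print(len(model.estimators_))  # 5

# Raise n_estimators and refit: only 5 additional estimators are trained
model.n_estimators = 10
model.fit(X, y)
print(len(model.estimators_))  # 10

# The original 5 estimators are reused, not refitted
print(model.estimators_[:5] == first_batch)  # True
```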

The default value for warm_start is False.

Setting warm_start to True is useful when you want to grow the ensemble gradually, for example to monitor performance as estimators are added, without retraining the estimators that have already been fitted.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize BaggingRegressors
br_cold = BaggingRegressor(n_estimators=10, random_state=42, warm_start=False)
br_warm = BaggingRegressor(n_estimators=10, random_state=42, warm_start=True)

# Fit incrementally and evaluate
n_estimator_steps = [10, 20, 50, 100]
cold_mse = []
warm_mse = []

for n_estimators in n_estimator_steps:
    br_cold.n_estimators = n_estimators
    br_cold.fit(X_train, y_train)
    y_pred_cold = br_cold.predict(X_test)
    cold_mse.append(mean_squared_error(y_test, y_pred_cold))

    br_warm.n_estimators = n_estimators
    br_warm.fit(X_train, y_train)
    y_pred_warm = br_warm.predict(X_test)
    warm_mse.append(mean_squared_error(y_test, y_pred_warm))

    print(f"n_estimators={n_estimators}")
    print(f"Cold start MSE: {cold_mse[-1]:.4f}")
    print(f"Warm start MSE: {warm_mse[-1]:.4f}")
    print()

Running the example gives an output like:

n_estimators=10
Cold start MSE: 7486.4813
Warm start MSE: 7486.4813

n_estimators=20
Cold start MSE: 7469.0224
Warm start MSE: 7469.0224

n_estimators=50
Cold start MSE: 7111.1779
Warm start MSE: 7111.1779

n_estimators=100
Cold start MSE: 6971.3948
Warm start MSE: 6971.3948

With a fixed random_state, both models build identical ensembles at each stage, which is why the cold-start and warm-start MSE values match exactly. The practical difference is in the work done: the warm-started model trains only the newly added estimators at each step, while the cold-started model retrains the entire ensemble from scratch.

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Create two BaggingRegressor instances, one with warm_start=False and another with warm_start=True
  4. Incrementally fit both models, increasing the number of estimators in stages
  5. Evaluate the mean squared error of both models at each stage
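One caveat worth knowing: with warm_start=True, n_estimators can only stay the same or grow between calls to fit. Attempting to shrink the ensemble raises a ValueError. A quick illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

model = BaggingRegressor(n_estimators=20, warm_start=True, random_state=0)
model.fit(X, y)

# Trying to shrink a warm-started ensemble fails
model.n_estimators = 10
try:
    model.fit(X, y)
except ValueError as e:
    print(f"ValueError: {e}")
```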

Some tips for using warm_start:

  - Increase n_estimators between calls to fit; with warm_start=True, only the additional estimators are trained.
  - Use warm_start to monitor validation error as the ensemble grows, and stop adding estimators once performance plateaus.
  - Set random_state for reproducible results across incremental fits.

Issues to consider:

  - With warm_start=True, setting n_estimators below the current ensemble size raises a ValueError, and leaving it unchanged fits no new estimators.
  - Each call to fit should use the same training data; warm_start adds estimators to the ensemble, it does not adapt existing estimators to new data.
  - warm_start=True cannot be combined with out-of-bag scoring (oob_score=True) in BaggingRegressor.
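The compute savings can be seen by timing a warm-started refit against a full refit at the same ensemble size. The timings below are illustrative and machine-dependent; the warm refit trains only the new estimators:

```python
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=2000, n_features=20, random_state=42)

# Warm start: 90 estimators already exist, only 10 more are trained
warm = BaggingRegressor(n_estimators=90, warm_start=True, random_state=42)
warm.fit(X, y)
start = time.perf_counter()
warm.n_estimators = 100
warm.fit(X, y)
warm_time = time.perf_counter() - start

# Cold start: all 100 estimators trained from scratch
cold = BaggingRegressor(n_estimators=100, warm_start=False, random_state=42)
start = time.perf_counter()
cold.fit(X, y)
cold_time = time.perf_counter() - start

print(f"Warm refit (10 new estimators): {warm_time:.2f}s")
print(f"Cold fit (100 estimators):      {cold_time:.2f}s")
```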



See Also