
Configure BaggingClassifier "warm_start" Parameter

The warm_start parameter in scikit-learn’s BaggingClassifier allows for incremental fitting of additional estimators to an existing ensemble.

Bagging (Bootstrap Aggregating) is an ensemble method that combines predictions from multiple base estimators trained on different subsets of the data. The warm_start parameter enables adding more estimators to the ensemble without retraining from scratch.

When warm_start is set to True, subsequent calls to fit() add estimators to the existing ensemble rather than creating a new one. Only the newly added estimators are trained; those already fitted are left unchanged, so growing the ensemble costs only the training time of the new estimators.

The default value for warm_start is False, which means a new ensemble is created each time fit() is called.

In practice, warm_start is most often used when experimenting to find a good number of estimators: the ensemble is grown step by step while performance on held-out data is monitored, and no completed training work is thrown away.
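
As a minimal sketch of that search loop (the dataset, step size, and variable names here are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Grow the ensemble in steps of 5; each fit() call trains only the new estimators
model = BaggingClassifier(n_estimators=5, warm_start=True, random_state=0)
for n in range(5, 30, 5):
    model.set_params(n_estimators=n)
    model.fit(X_train, y_train)
    print(f"{n} estimators: {accuracy_score(y_val, model.predict(X_val)):.3f}")

The full example below expands this idea and also measures training time: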

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize BaggingClassifier with warm_start=False
bc = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10,
                       random_state=42, warm_start=False)

# Fit and evaluate initial model
start_time = time.time()
bc.fit(X_train, y_train)
initial_time = time.time() - start_time
initial_score = accuracy_score(y_test, bc.predict(X_test))

print(f"Initial model (10 estimators):")
print(f"Time: {initial_time:.3f} seconds")
print(f"Accuracy: {initial_score:.3f}")

# Set warm_start=True and add more estimators
bc.set_params(warm_start=True, n_estimators=20)

# Fit additional estimators and evaluate
start_time = time.time()
bc.fit(X_train, y_train)
additional_time = time.time() - start_time
final_score = accuracy_score(y_test, bc.predict(X_test))

print(f"\nFinal model (20 estimators):")
print(f"Time to add 10 estimators: {additional_time:.3f} seconds")
print(f"Accuracy: {final_score:.3f}")

Running the example gives an output like:

Initial model (10 estimators):
Time: 0.087 seconds
Accuracy: 0.885

Final model (20 estimators):
Time to add 10 estimators: 0.082 seconds
Accuracy: 0.875

Note that adding 10 estimators took roughly as long as fitting the initial 10, since warm_start trains only the new estimators rather than refitting the whole ensemble. Also note that doubling the ensemble slightly decreased test accuracy in this run, a reminder to monitor performance as estimators are added.

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Create a BaggingClassifier with warm_start=False and 10 estimators
  4. Fit and evaluate the initial model
  5. Set warm_start=True and increase n_estimators to 20
  6. Fit additional estimators and evaluate the final model
  7. Compare performance and training time
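
If you later want to retrain from scratch rather than keep growing the ensemble, switching warm_start back off makes the next fit() discard the existing estimators and rebuild. A small sketch, reusing the bc model from the example above:

# Setting warm_start=False makes the next fit() rebuild the ensemble from scratch
bc.set_params(warm_start=False, n_estimators=10)
bc.fit(X_train, y_train)
print(len(bc.estimators_))  # 10 again: the 20 warm-started estimators were discarded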

Some tips and heuristics for using warm_start:

  - Start with a modest number of estimators and grow the ensemble in increments, checking validation accuracy after each fit() call to see where performance plateaus.
  - Increase n_estimators (for example via set_params()) before each additional fit() call; the difference between the new total and the current ensemble size determines how many new estimators are trained.
  - To retrain from scratch instead of growing the ensemble, set warm_start=False or create a fresh estimator, as in the sketch above.

Issues to consider:

  - Calling fit() without increasing n_estimators adds nothing: scikit-learn warns when the value is unchanged and raises a ValueError when it is lower than the current ensemble size (see the sketch below).
  - warm_start=True cannot be combined with oob_score=True in BaggingClassifier.
  - Every added estimator increases memory use and prediction time, and more estimators do not always improve accuracy, as the output above shows.

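As a quick sketch of the first issue, reusing the fitted bc model and training data from the example (the exact warning and error messages come from scikit-learn):

# Refitting with an unchanged n_estimators adds nothing; scikit-learn warns instead
bc.set_params(warm_start=True, n_estimators=len(bc.estimators_))
bc.fit(X_train, y_train)  # UserWarning: no new estimators are fit

# Shrinking n_estimators below the current ensemble size raises a ValueError
try:
    bc.set_params(n_estimators=len(bc.estimators_) - 1)
    bc.fit(X_train, y_train)
except ValueError as err:
    print(err)
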
See Also