
Configure BaggingRegressor "bootstrap" Parameter

The bootstrap parameter in scikit-learn’s BaggingRegressor controls whether bootstrap samples are used when building base estimators.

Bagging (Bootstrap Aggregating) is an ensemble method that averages the predictions of multiple base estimators, each trained on a resampled version of the training data, to reduce variance and improve generalization. The bootstrap parameter determines whether each estimator's training sample is drawn with replacement (a bootstrap sample) or without replacement.

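When bootstrap is True, each base estimator is trained on a sample drawn with replacement from the training data, so some rows appear more than once and others are left out. This randomness makes the estimators differ from one another, which can help reduce overfitting. When bootstrap is False, samples are drawn without replacement; with the default max_samples=1.0 this means each estimator sees the full training set, which may lead to stronger individual models but less diversity in the ensemble.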
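To make the difference concrete, the sketch below inspects which rows each base estimator was given. It relies on the fitted estimators_samples_ attribute of BaggingRegressor; the dataset size, number of estimators, and the unique_fractions name are arbitrary choices for illustration, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Small synthetic dataset; sizes are arbitrary choices for illustration
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)

unique_fractions = {}
for bootstrap in (True, False):
    model = BaggingRegressor(n_estimators=5, bootstrap=bootstrap, random_state=0)
    model.fit(X, y)
    # estimators_samples_ holds the row indices drawn for each base estimator
    fractions = [len(np.unique(idx)) / len(X) for idx in model.estimators_samples_]
    unique_fractions[bootstrap] = fractions
    print(f"bootstrap={bootstrap}: mean unique fraction = {np.mean(fractions):.3f}")
```

With bootstrap=True, roughly 63% of the rows (about 1 - 1/e) are expected to be distinct in each sample. With bootstrap=False and the default max_samples, every row appears exactly once.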

The default value for bootstrap is True.
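One practical consequence of this setting is that out-of-bag scoring (the oob_score parameter) depends on rows being left out of each bootstrap sample. The following sketch, on an assumed toy dataset, checks the default value and shows that combining bootstrap=False with oob_score=True is expected to raise a ValueError at fit time.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Confirm the default without fitting anything
default_bootstrap = BaggingRegressor().get_params()["bootstrap"]
print(f"default bootstrap: {default_bootstrap}")

# Toy dataset; sizes are arbitrary choices for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Out-of-bag scoring relies on rows left out of each bootstrap sample,
# so requesting it with bootstrap=False is expected to fail at fit time
oob_error = None
try:
    BaggingRegressor(bootstrap=False, oob_score=True, random_state=0).fit(X, y)
except ValueError as exc:
    oob_error = str(exc)
    print(f"ValueError: {oob_error}")

# With bootstrap=True the out-of-bag R^2 estimate is available after fitting
oob_model = BaggingRegressor(
    n_estimators=50, bootstrap=True, oob_score=True, random_state=0
)
oob_model.fit(X, y)
print(f"OOB score: {oob_model.oob_score_:.3f}")
```

If an out-of-bag estimate is part of the evaluation plan, that alone may be a reason to keep bootstrap at its default.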

In practice, both True and False can be effective depending on the dataset and base estimator characteristics.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different bootstrap values
bootstrap_values = [True, False]
mse_scores = []

for bootstrap in bootstrap_values:
    bagging = BaggingRegressor(n_estimators=10, bootstrap=bootstrap, random_state=42)
    bagging.fit(X_train, y_train)
    y_pred = bagging.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"bootstrap={bootstrap}, MSE: {mse:.3f}")

# Relative change in MSE (percent) when switching from bootstrap=True to bootstrap=False
relative_diff = (mse_scores[1] - mse_scores[0]) / mse_scores[0] * 100
print(f"Relative difference: {relative_diff:.2f}%")

Running the example gives an output like:

bootstrap=True, MSE: 7486.481
bootstrap=False, MSE: 17501.890
Relative difference: 133.78%

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train BaggingRegressor models with bootstrap set to True and False
  4. Evaluate the mean squared error of each model on the test set
  5. Compare the relative difference in performance between the two configurations
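A single train/test split can give a noisy picture of the gap between the two settings. As a hedged variation on the steps above, the comparison could be repeated with cross-validation; the choice of 5 folds and the cv_mse name are assumptions made for this sketch.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

# Same synthetic dataset as the main example
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

cv_mse = {}
for bootstrap in (True, False):
    model = BaggingRegressor(n_estimators=10, bootstrap=bootstrap, random_state=42)
    # scikit-learn reports negated MSE so that higher is better; flip the sign back
    scores = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[bootstrap] = (scores.mean(), scores.std())
    print(f"bootstrap={bootstrap}, CV MSE: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean helps judge whether the difference between the two configurations is larger than the fold-to-fold variation.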
