
Configure BaggingRegressor "bootstrap_features" Parameter

The bootstrap_features parameter in scikit-learn’s BaggingRegressor controls whether features are sampled with replacement when building individual base estimators.

Bagging is an ensemble method that combines predictions from multiple base estimators to improve generalization and reduce overfitting.

When bootstrap_features is set to True, each base estimator is trained on a random subset of features, potentially with duplicates. This can increase the diversity of the ensemble and may improve performance on datasets with many irrelevant features.

The default value for bootstrap_features is False, meaning all features are used for each base estimator.

In practice, setting bootstrap_features to True can be beneficial when dealing with high-dimensional data or when you want to increase the randomness in the ensemble.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different bootstrap_features values
bootstrap_features_values = [False, True]
mse_scores = []

for bootstrap_features in bootstrap_features_values:
    br = BaggingRegressor(n_estimators=10, bootstrap_features=bootstrap_features, random_state=42)
    br.fit(X_train, y_train)
    y_pred = br.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"bootstrap_features={bootstrap_features}, MSE: {mse:.3f}")

# Compare relative performance
relative_performance = (mse_scores[0] - mse_scores[1]) / mse_scores[0] * 100
print(f"Relative improvement: {relative_performance:.2f}%")

Running the example gives an output like:

bootstrap_features=False, MSE: 7486.481
bootstrap_features=True, MSE: 17215.800
Relative improvement: -129.96%

In this run, enabling bootstrap_features more than doubled the test MSE: duplicating some feature indices while dropping others discarded useful signal on this dataset, so the negative "relative improvement" means the default setting performed better here.

The key steps in this example are:

  1. Generate a synthetic regression dataset with multiple features
  2. Split the data into train and test sets
  3. Train BaggingRegressor models with different bootstrap_features values
  4. Evaluate the mean squared error of each model on the test set
  5. Compare the relative performance improvement
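
To see exactly which feature indices each base estimator received, you can inspect the fitted ensemble's estimators_features_ attribute. A minimal sketch on a small synthetic dataset (five features, so the indices are easy to read):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Small synthetic dataset so the sampled feature indices are easy to read
X, y = make_regression(n_samples=100, n_features=5, random_state=42)

# With bootstrap_features=True and the default max_features=1.0, each
# estimator draws n_features indices *with replacement*, so duplicates
# can appear and some features may be left out entirely.
br = BaggingRegressor(n_estimators=3, bootstrap_features=True, random_state=42)
br.fit(X, y)

for i, feats in enumerate(br.estimators_features_):
    print(f"estimator {i}: feature indices {sorted(feats)}")
```

Each printed list has five entries drawn from the indices 0-4; with sampling with replacement you will typically see repeated indices, which is the extra randomness bootstrap_features injects.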

Some tips and heuristics for setting bootstrap_features:

  - Keep the default (False) as a baseline and enable it only if validation scores improve.
  - It tends to help most on high-dimensional data with many redundant or irrelevant features.
  - Consider combining it with max_features below 1.0 to further decorrelate the base estimators.

Issues to consider:

  - Sampling with replacement means some features are duplicated and others are dropped for a given estimator, which can hurt performance when most features are informative, as in the example above.
  - Results depend on random_state, so compare settings across several seeds or with cross-validation rather than a single split.

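In practice, the safest way to decide is to treat bootstrap_features as a hyperparameter and cross-validate it alongside max_features. A sketch using GridSearchCV on the same kind of synthetic data (the grid values here are illustrative, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=42)

# Search over feature-sampling settings; scoring uses negated MSE
# because GridSearchCV maximizes its score.
param_grid = {
    "bootstrap_features": [False, True],
    "max_features": [0.5, 1.0],
}
grid = GridSearchCV(
    BaggingRegressor(n_estimators=10, random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)
print("Best settings:", grid.best_params_)
```

grid.best_params_ then reports whichever combination minimized cross-validated MSE, which is a more reliable basis for the choice than a single train/test split.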

See Also