
Configure BaggingClassifier "bootstrap" Parameter

The bootstrap parameter in scikit-learn’s BaggingClassifier determines whether bootstrap samples are used when building base estimators.

Bagging (Bootstrap Aggregating) is an ensemble method that creates multiple subsets of the original dataset, trains a model on each subset, and combines their predictions. The bootstrap parameter controls how these subsets are created.

When bootstrap is set to True, samples are drawn with replacement, allowing the same instance to appear multiple times in a subset while others are left out entirely. When False, samples are drawn without replacement; with the default max_samples=1.0, this means each base estimator is trained on the whole dataset.
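Bootstrap sampling itself is easy to sketch with NumPy (the tiny dataset below is illustrative; drawing with replacement means repeats and omissions):

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # ten sample indices standing in for a tiny dataset

# A bootstrap sample: same size as the original, drawn WITH replacement,
# so some indices repeat and others are left out
boot = rng.choice(data, size=len(data), replace=True)
print(boot)
print(np.unique(boot).size)  # typically around 63% of the original indices
```

On average a bootstrap sample contains about 63% (1 - 1/e) of the distinct original instances; the instances left out are the "out-of-bag" samples.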

The default value for bootstrap is True.

In practice, True is commonly used because sampling with replacement introduces diversity among the base estimators, which is what gives bagging its variance-reducing effect. False may be preferred for small datasets where leaving out roughly a third of the samples per estimator is too costly, but note that with default settings it removes the sampling randomness, so diversity must then come from another source such as max_features or a randomized base estimator.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different bootstrap values
bootstrap_values = [True, False]
base_estimator = DecisionTreeClassifier(random_state=42)

for bootstrap in bootstrap_values:
    bagging = BaggingClassifier(estimator=base_estimator, n_estimators=10,
                                bootstrap=bootstrap, random_state=42)
    bagging.fit(X_train, y_train)
    y_pred = bagging.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"bootstrap={bootstrap}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

bootstrap=True, Accuracy: 0.865
bootstrap=False, Accuracy: 0.800

The key steps in this example are:

  1. Generate a synthetic classification dataset with informative and redundant features
  2. Split the data into train and test sets
  3. Create BaggingClassifier instances with different bootstrap values
  4. Train models and evaluate their accuracy on the test set

Some tips and heuristics for setting bootstrap:

  - Keep the default (True) unless you have a specific reason not to; sampling with replacement is what gives bagging its variance reduction.
  - Combine bootstrap=True with oob_score=True to get a free validation estimate from the out-of-bag samples.
  - If you set bootstrap=False, introduce diversity some other way, for example max_features < 1.0 or a randomized base estimator.

Issues to consider:

  - With bootstrap=False and the default max_samples=1.0, every base estimator trains on the identical dataset, which largely defeats the purpose of bagging.
  - oob_score=True raises an error when bootstrap=False, since no samples are left out of bag.
  - Each bootstrap sample omits roughly 37% of the distinct training instances, which can matter on very small datasets.

See Also