
Configure BaggingClassifier "bootstrap_features" Parameter

The bootstrap_features parameter in scikit-learn’s BaggingClassifier controls whether features are sampled with replacement when training base estimators.

Bagging (Bootstrap Aggregating) is an ensemble method that combines predictions from multiple models to reduce variance and improve generalization. The bootstrap_features parameter determines if and how features are randomly sampled for each base estimator.

When bootstrap_features is True, features are sampled with replacement, allowing some features to be selected multiple times while others may not be selected at all. This increases diversity among base estimators, potentially improving the ensemble’s performance.
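To make the difference concrete, here is a minimal NumPy sketch of the two sampling schemes. This only illustrates the idea; BaggingClassifier performs the sampling internally:

import numpy as np

rng = np.random.default_rng(42)
n_features = 10

# Without replacement, drawing all features: each index appears exactly once
no_replacement = rng.choice(n_features, size=n_features, replace=False)

# With replacement: some indices repeat, others are never drawn
with_replacement = rng.choice(n_features, size=n_features, replace=True)

print("without replacement:", sorted(no_replacement))
print("with replacement:   ", sorted(with_replacement))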

The default value for bootstrap_features is False; combined with the default max_features=1.0, this means each base estimator is trained on all features exactly once. Setting it to True draws the feature indices with replacement instead, which can be beneficial for high-dimensional datasets or when features are strongly correlated.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_repeated=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different bootstrap_features values
bootstrap_features_values = [False, True]
accuracies = []

for bootstrap_features in bootstrap_features_values:
    bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                                n_estimators=100,
                                bootstrap_features=bootstrap_features,
                                random_state=42)
    bagging.fit(X_train, y_train)
    y_pred = bagging.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"bootstrap_features={bootstrap_features}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

bootstrap_features=False, Accuracy: 0.900
bootstrap_features=True, Accuracy: 0.915
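You can confirm what was drawn by inspecting the fitted ensemble's estimators_features_ attribute, which holds the feature indices given to each base estimator. Continuing from the loop above (where the last fitted model used bootstrap_features=True):

# Feature indices drawn for the first three base estimators of the
# last fitted model (bootstrap_features=True)
for i, features in enumerate(bagging.estimators_features_[:3]):
    print(f"Estimator {i}: {len(features)} features drawn, "
          f"{len(set(features))} unique")

With bootstrap_features=True and the default max_features=1.0, each estimator still draws 20 features, but the number of unique features is typically smaller because of duplicates.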

The key steps in this example are:

  1. Generate a synthetic classification dataset with informative, redundant, and repeated features
  2. Split the data into train and test sets
  3. Train BaggingClassifier models with different bootstrap_features values
  4. Evaluate the accuracy of each model on the test set

Tips for setting bootstrap_features:

  1. Keep the default (False) as a baseline, and enable feature sampling only if it measurably improves validation performance
  2. bootstrap_features=True tends to help most on high-dimensional datasets or when many features are redundant or correlated
  3. Pair bootstrap_features with max_features to control how many features each base estimator draws (see the sketch after the next list)

Issues to consider:

  1. Feature sampling adds randomness, so results vary more between runs; set random_state for reproducibility
  2. If only a few features are informative, sampling with replacement can leave some base estimators without them, weakening individual estimators (though the ensemble often compensates)
  3. The accuracy gain in the example above comes from a single train/test split; confirm any improvement with cross-validation
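Building on the example above, here is a minimal sketch of such a cross-validated comparison, also varying max_features. The grid values (0.5 and 1.0) are illustrative, not recommendations:

from sklearn.model_selection import cross_val_score

# Compare feature-sampling settings with 5-fold cross-validation
# (reuses X_train, y_train and the imports from the example above)
for max_features in [0.5, 1.0]:
    for bootstrap_features in [False, True]:
        model = BaggingClassifier(estimator=DecisionTreeClassifier(),
                                  n_estimators=100,
                                  max_features=max_features,
                                  bootstrap_features=bootstrap_features,
                                  random_state=42)
        scores = cross_val_score(model, X_train, y_train, cv=5)
        print(f"max_features={max_features}, "
              f"bootstrap_features={bootstrap_features}: "
              f"mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")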


