
Configure RandomForestClassifier "bootstrap" Parameter

The bootstrap parameter in scikit-learn’s RandomForestClassifier determines whether bootstrap sampling is used when building trees.

Bootstrap sampling draws a separate training set for each tree by sampling rows with replacement from the original dataset, so some rows appear more than once and others are left out. This introduces randomness and diversity into the ensemble.
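To make the sampling step concrete, here is a minimal sketch that draws one bootstrap sample of row indices with NumPy. It is an illustration of the idea rather than scikit-learn's internal code, and the names rng, n_samples and bootstrap_indices are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Draw n_samples row indices with replacement: duplicates are expected
bootstrap_indices = rng.integers(0, n_samples, size=n_samples)

# Fraction of the original rows that appear at least once in the sample
unique_fraction = np.unique(bootstrap_indices).size / n_samples
print(f"Fraction of distinct rows in the sample: {unique_fraction:.3f}")
```

For large datasets this fraction tends toward 1 - 1/e, roughly 0.632, so each tree trained on such a sample never sees about a third of the rows.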

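By default, bootstrap is set to True, meaning each tree is trained on a bootstrapped dataset. Setting it to False means each tree is trained on the entire original dataset, so diversity among the trees comes mainly from the random subset of features considered at each split (max_features).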
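One practical consequence of the setting is that out-of-bag (OOB) scoring depends on it: the rows left out of each tree's bootstrap sample act as a built-in validation set, which only exists when bootstrap=True. The sketch below assumes the same kind of synthetic dataset used in the main example; the variable names rf_oob and oob_without_bootstrap_failed are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)

# With bootstrap=True, each tree can be scored on the rows it did not see
rf_oob = RandomForestClassifier(bootstrap=True, oob_score=True, random_state=42)
rf_oob.fit(X, y)
print(f"OOB accuracy estimate: {rf_oob.oob_score_:.3f}")

# With bootstrap=False there are no left-out rows, so fitting raises an error
try:
    RandomForestClassifier(bootstrap=False, oob_score=True, random_state=42).fit(X, y)
    oob_without_bootstrap_failed = False
except ValueError as exc:
    oob_without_bootstrap_failed = True
    print(f"bootstrap=False with oob_score=True raised: {exc}")
```

If you rely on oob_score_ for model selection, bootstrap has to stay enabled.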

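Bootstrap sampling (bootstrap=True) usually helps by decorrelating the trees, which reduces the variance of the averaged ensemble. It can slightly increase the bias of individual trees, because with the default sample size each one sees only about 63% of the distinct training rows on average. Disabling it (bootstrap=False) may be beneficial for some high-dimensional datasets, where random feature selection at each split already provides diversity, but the effect is data-dependent and worth checking with validation.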

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with bootstrap=True and bootstrap=False
rf_bootstrap = RandomForestClassifier(bootstrap=True, random_state=42)
rf_no_bootstrap = RandomForestClassifier(bootstrap=False, random_state=42)

rf_bootstrap.fit(X_train, y_train)
rf_no_bootstrap.fit(X_train, y_train)

# Evaluate accuracy
y_pred_bootstrap = rf_bootstrap.predict(X_test)
y_pred_no_bootstrap = rf_no_bootstrap.predict(X_test)

accuracy_bootstrap = accuracy_score(y_test, y_pred_bootstrap)
accuracy_no_bootstrap = accuracy_score(y_test, y_pred_no_bootstrap)

print(f"Accuracy with bootstrap: {accuracy_bootstrap:.3f}")
print(f"Accuracy without bootstrap: {accuracy_no_bootstrap:.3f}")

Running the example gives an output like:

Accuracy with bootstrap: 0.855
Accuracy without bootstrap: 0.850

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset
  2. Split the data into train and test sets
  3. Train RandomForestClassifier with bootstrap=True and bootstrap=False
  4. Evaluate the accuracy of each model on the test set
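A single train/test split can make a small gap between the two settings look more meaningful than it is. As a hedged follow-up, the sketch below compares both settings with 5-fold cross-validation on the same kind of synthetic data; the dictionary cv_scores and the loop variable use_bootstrap are names introduced here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)

# Collect per-fold accuracy for each bootstrap setting
cv_scores = {}
for use_bootstrap in (True, False):
    model = RandomForestClassifier(bootstrap=use_bootstrap, random_state=42)
    cv_scores[use_bootstrap] = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"bootstrap={use_bootstrap}: "
          f"mean={cv_scores[use_bootstrap].mean():.3f} "
          f"std={cv_scores[use_bootstrap].std():.3f}")
```

If the difference between the means is smaller than the spread across folds, the two settings are effectively tied on this data.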

Tips and heuristics for setting bootstrap:

Issues to consider:


