Configure ExtraTreesClassifier "bootstrap" Parameter

The bootstrap parameter in scikit-learn’s ExtraTreesClassifier determines whether bootstrap samples are used when building trees.

Extra Trees, short for Extremely Randomized Trees, is an ensemble learning method similar to Random Forests. It builds multiple decision trees and aggregates their predictions to improve overall performance and reduce overfitting.

The bootstrap parameter controls whether individual trees are trained on bootstrap samples, that is, random samples drawn with replacement from the training data. When True, each tree sees a resampled version of the data in which some rows appear more than once and others are left out, which adds diversity among the trees. When False, every tree is trained on the whole dataset, and the diversity comes only from the randomized splits.
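
To make "sampling with replacement" concrete, the sketch below draws one bootstrap sample of row indices with NumPy and measures how many distinct rows it contains. It illustrates the idea only and is not scikit-learn's internal code; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Draw n_samples row indices with replacement, as a bootstrap sample does
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Fraction of distinct original rows that made it into the sample
unique_fraction = np.unique(bootstrap_idx).size / n_samples
print(f"Distinct rows in the bootstrap sample: {unique_fraction:.3f}")
```

On average a bootstrap sample of the same size as the data contains about 63% of the distinct rows (1 - 1/e), so each tree trained this way misses roughly a third of the training set.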

The default value for bootstrap is False in ExtraTreesClassifier.

In practice, both True and False are used. Which one performs better depends on the problem and the dataset, so it is worth comparing the two settings empirically.
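
One practical difference between the settings is that scikit-learn's out-of-bag (OOB) estimate is only available when bootstrap=True, because it scores each training row using the trees whose bootstrap sample left that row out. The following is a minimal sketch on synthetic data; the dataset parameters and seed are illustrative assumptions, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic multi-class data (illustrative sizes)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# oob_score=True requires bootstrap=True; otherwise fit() raises a ValueError
etc = ExtraTreesClassifier(n_estimators=100, bootstrap=True, oob_score=True,
                           random_state=42)
etc.fit(X, y)

# Accuracy estimated from rows each tree did not see during training
print(f"OOB accuracy estimate: {etc.oob_score_:.3f}")
```

The OOB estimate gives a rough validation score without setting aside a separate hold-out set, which can be useful when data is limited.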

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different bootstrap values
bootstrap_values = [False, True]
accuracies = []

for bootstrap in bootstrap_values:
    etc = ExtraTreesClassifier(n_estimators=100, random_state=42, bootstrap=bootstrap)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"bootstrap={bootstrap}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

bootstrap=False, Accuracy: 0.845
bootstrap=True, Accuracy: 0.825

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train ExtraTreesClassifier models with different bootstrap values
  4. Evaluate the accuracy of each model on the test set
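
A single train/test split can be noisy, so a gap of a few hundredths between the two settings may not be meaningful. As a hedged variation on the steps above, the sketch below compares both values with 5-fold cross-validation instead; the fold count and the `cv_results` dictionary are choices made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Same style of synthetic dataset as in the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

cv_results = {}
for bootstrap in [False, True]:
    etc = ExtraTreesClassifier(n_estimators=100, random_state=42, bootstrap=bootstrap)
    # Accuracy on each of 5 folds
    scores = cross_val_score(etc, X, y, cv=5, scoring="accuracy")
    cv_results[bootstrap] = scores
    print(f"bootstrap={bootstrap}, mean accuracy: {scores.mean():.3f} "
          f"(std {scores.std():.3f})")
```

If the difference in mean accuracy is small relative to the spread across folds, the choice of bootstrap is unlikely to matter much for that dataset.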

Some tips and heuristics for setting the bootstrap parameter:

Issues to consider:


