The `oob_score` parameter in scikit-learn’s `BaggingClassifier` determines whether to use out-of-bag samples to estimate the generalization accuracy.
Bagging (Bootstrap Aggregating) is an ensemble method that creates multiple subsets of the training data, trains a base estimator on each subset, and combines their predictions. Out-of-bag (OOB) samples are those not used for training a particular base estimator.
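To make the OOB idea concrete, here is a minimal NumPy sketch (independent of the example below) showing that a single bootstrap sample of size n leaves roughly 37% of the rows out-of-bag, since each row is missed with probability (1 - 1/n)^n ≈ 1/e:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# One bootstrap sample: n indices drawn with replacement
boot_idx = rng.integers(0, n, size=n)

# Out-of-bag rows are those that were never drawn
oob_mask = ~np.isin(np.arange(n), boot_idx)
print(f"OOB fraction: {oob_mask.mean():.3f}")  # roughly 0.368, i.e. about 1/e
```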
When `oob_score` is set to `True`, the classifier uses OOB samples to estimate the generalization accuracy without the need for a separate validation set.
The default value for `oob_score` is `False`. Setting it to `True` is common when you want an estimate of the model’s generalization performance without setting aside a separate validation set.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train BaggingClassifier with oob_score=False
bc_without_oob = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100,
                                   random_state=42, oob_score=False)
bc_without_oob.fit(X_train, y_train)

# Create and train BaggingClassifier with oob_score=True
bc_with_oob = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100,
                                random_state=42, oob_score=True)
bc_with_oob.fit(X_train, y_train)

# Evaluate models
y_pred_without_oob = bc_without_oob.predict(X_test)
y_pred_with_oob = bc_with_oob.predict(X_test)

print("BaggingClassifier without OOB:")
print(f"Test accuracy: {accuracy_score(y_test, y_pred_without_oob):.4f}")
print("\nBaggingClassifier with OOB:")
print(f"Test accuracy: {accuracy_score(y_test, y_pred_with_oob):.4f}")
print(f"OOB score: {bc_with_oob.oob_score_:.4f}")
```
Running the example gives an output like:
```
BaggingClassifier without OOB:
Test accuracy: 0.8900

BaggingClassifier with OOB:
Test accuracy: 0.8900
OOB score: 0.8662
```
The key steps in this example are:
- Generate a synthetic classification dataset
- Split the data into train and test sets
- Create two `BaggingClassifier` instances, one with `oob_score=False` and another with `oob_score=True`
- Train both models and evaluate their performance on the test set
- For the model with `oob_score=True`, report the OOB score
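Beyond the scalar `oob_score_`, a `BaggingClassifier` fitted with `oob_score=True` also exposes `oob_decision_function_`, the class-probability estimates for each training sample computed only from the estimators that did not see that sample. Continuing from the example above (reusing `bc_with_oob`), you can inspect it like this:

```python
# Per-sample OOB class-probability estimates, shape (n_train_samples, n_classes);
# each row is averaged only over estimators that left that sample out-of-bag
oob_probs = bc_with_oob.oob_decision_function_
print(oob_probs.shape)  # (800, 2) for this 80/20 split of 1000 samples
print(oob_probs[:3])    # OOB probabilities for the first three training samples
```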
Some tips for using `oob_score`:
- Enable `oob_score` when you want an estimate of model performance without a separate validation set
- The OOB score can be used for model selection or hyperparameter tuning (see the sketch after this list)
- OOB estimation is generally less biased than using the training error
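As an example of OOB-based tuning, the sketch below compares a few `max_samples` values (the bootstrap size as a fraction of the training set) using only `oob_score_`, with no validation split. The choice of `max_samples` as the tuned hyperparameter is just illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=0, random_state=42)

# Rank candidate settings by their OOB score alone
for max_samples in (0.5, 0.7, 1.0):
    model = BaggingClassifier(estimator=DecisionTreeClassifier(),
                              n_estimators=100, max_samples=max_samples,
                              oob_score=True, random_state=42)
    model.fit(X, y)
    print(f"max_samples={max_samples}: OOB score = {model.oob_score_:.4f}")
```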
Issues to consider:
- Enabling `oob_score` increases computational cost and memory usage
- The OOB score may be less reliable for small datasets or with a small number of base estimators (see the sketch below)
- The OOB estimate tends to be pessimistic and may underestimate the true performance
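To see the small-ensemble caveat in practice, the following sketch fits a bagger with only two estimators; with so few bootstrap samples, some training rows are never out-of-bag, and scikit-learn emits a warning to that effect (the exact warning text may vary between versions):

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=42)

# With only 2 bootstrap samples, many rows appear in both,
# so they have no OOB prediction at all
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    small = BaggingClassifier(estimator=DecisionTreeClassifier(),
                              n_estimators=2, oob_score=True,
                              random_state=42).fit(X, y)

for w in caught:
    if issubclass(w.category, UserWarning):
        print(w.message)  # warns that some inputs have no OOB score
print(f"OOB score with 2 estimators: {small.oob_score_:.4f}")
```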