The oob_score parameter in scikit-learn’s BaggingRegressor enables out-of-bag (OOB) estimation of the generalization error.
Bagging (Bootstrap Aggregating) is an ensemble method that fits multiple base estimators on random subsets of the original dataset. The oob_score parameter allows for model evaluation using samples not used in training the individual estimators.
When oob_score is set to True, the model computes an additional score using only the samples that were not used to train each base estimator. This provides an estimate of the model’s performance on unseen data without the need for a separate validation set.
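To make this concrete, here is a minimal sketch (separate from the main example below) that inspects a fitted BaggingRegressor to see how many training samples each base estimator never drew. It uses the estimators_samples_, oob_prediction_ and oob_score_ attributes; the dataset size and n_estimators=50 are arbitrary choices for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)

# Fit a bagging ensemble with OOB scoring enabled
br = BaggingRegressor(n_estimators=50, oob_score=True, random_state=0)
br.fit(X, y)

# For each base estimator, the training samples it never drew are its OOB samples
n = len(X)
oob_fractions = [
    len(np.setdiff1d(np.arange(n), in_bag)) / n
    for in_bag in br.estimators_samples_
]

# With bootstrap sampling, roughly 1 - 1/e (about 37%) of samples are OOB per estimator
print(f"Average OOB fraction per estimator: {np.mean(oob_fractions):.2f}")

# oob_score_ is the R-squared of the OOB predictions (oob_prediction_),
# where each sample is predicted only by estimators that did not train on it
print(f"OOB R-squared estimate: {br.oob_score_:.3f}")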
The default value for oob_score is False. It’s commonly set to True when you want to get an estimate of the model’s performance without using a separate validation set or cross-validation.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with oob_score=False (default)
br_no_oob = BaggingRegressor(random_state=42)
br_no_oob.fit(X_train, y_train)
y_pred_no_oob = br_no_oob.predict(X_test)
r2_no_oob = r2_score(y_test, y_pred_no_oob)
# Train with oob_score=True
br_oob = BaggingRegressor(oob_score=True, random_state=42)
br_oob.fit(X_train, y_train)
y_pred_oob = br_oob.predict(X_test)
r2_oob = r2_score(y_test, y_pred_oob)
print(f"R-squared (oob_score=False): {r2_no_oob:.3f}")
print(f"R-squared (oob_score=True): {r2_oob:.3f}")
print(f"OOB Score: {br_oob.oob_score_:.3f}")
Running the example gives an output like:
R-squared (oob_score=False): 0.809
R-squared (oob_score=True): 0.809
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train two BaggingRegressor models, one with oob_score=False and one with oob_score=True
- Evaluate both models using the R-squared score on the test set
- Print the OOB score for the model with oob_score=True
Tips for using oob_score:
- Enable oob_score when you want to get a performance estimate without a separate validation set
- Use the OOB score as a quick way to tune hyperparameters without cross-validation (see the sketch after this list)
- Compare the OOB score with test set performance to check for overfitting
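As a sketch of how the OOB score can stand in for cross-validation when tuning, the snippet below compares a few candidate values of max_samples using only oob_score_; the candidate grid and n_estimators=50 are arbitrary choices, not values from the example above.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Pick max_samples using only the OOB estimate: no validation split or CV loop needed
for max_samples in [0.5, 0.7, 1.0]:
    br = BaggingRegressor(
        n_estimators=50,
        max_samples=max_samples,
        oob_score=True,
        random_state=42,
    )
    br.fit(X, y)
    print(f"max_samples={max_samples}: OOB R-squared = {br.oob_score_:.3f}")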
Issues to consider:
- Enabling oob_score increases computational cost and memory usage
- The OOB score may be less reliable for small datasets or with few base estimators, as the sketch below illustrates
- The OOB estimate tends to be pessimistic compared to cross-validation estimates
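The following sketch (again with arbitrary dataset sizes and n_estimators values) illustrates the last two points: with very few base estimators the OOB estimate is typically noisier and more pessimistic, and it tends to settle closer to the test-set score as the ensemble grows.
import warnings
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

for n_estimators in [5, 25, 200]:
    with warnings.catch_warnings():
        # With very few estimators, some training samples may receive no OOB
        # prediction at all, which scikit-learn reports with a warning
        warnings.simplefilter("ignore")
        br = BaggingRegressor(n_estimators=n_estimators, oob_score=True, random_state=7)
        br.fit(X_train, y_train)
    test_r2 = r2_score(y_test, br.predict(X_test))
    print(f"n_estimators={n_estimators:3d}: OOB R-squared = {br.oob_score_:.3f}, "
          f"test R-squared = {test_r2:.3f}")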