The oob_score parameter in scikit-learn’s BaggingRegressor enables out-of-bag (OOB) estimation of the generalization error.
Bagging (Bootstrap Aggregating) is an ensemble method that fits multiple base estimators on random subsets of the original dataset. The oob_score parameter allows for model evaluation using samples not used in training the individual estimators.
When oob_score is set to True, the model computes an additional score using only the samples that were not used to train each base estimator. This provides an estimate of the model’s performance on unseen data without the need for a separate validation set.
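To make this concrete, here is a minimal sketch (separate from the main example below) that inspects a fitted BaggingRegressor to see how many training samples each base estimator never drew. It uses the estimators_samples_, oob_prediction_ and oob_score_ attributes; the dataset size and n_estimators=50 are arbitrary choices for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)

# Fit a bagging ensemble with OOB scoring enabled
br = BaggingRegressor(n_estimators=50, oob_score=True, random_state=0)
br.fit(X, y)

# For each base estimator, the training samples it never drew are its OOB samples
n = len(X)
oob_fractions = [
    len(np.setdiff1d(np.arange(n), in_bag)) / n
    for in_bag in br.estimators_samples_
]

# With bootstrap sampling, roughly 1 - 1/e (about 37%) of samples are OOB per estimator
print(f"Average OOB fraction per estimator: {np.mean(oob_fractions):.2f}")

# oob_score_ is the R-squared of the OOB predictions (oob_prediction_),
# where each sample is predicted only by estimators that did not train on it
print(f"OOB R-squared estimate: {br.oob_score_:.3f}")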
The default value for oob_score is False. It’s commonly set to True when you want to get an estimate of the model’s performance without using a separate validation set or cross-validation.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with oob_score=False (default)
br_no_oob = BaggingRegressor(random_state=42)
br_no_oob.fit(X_train, y_train)
y_pred_no_oob = br_no_oob.predict(X_test)
r2_no_oob = r2_score(y_test, y_pred_no_oob)
# Train with oob_score=True
br_oob = BaggingRegressor(oob_score=True, random_state=42)
br_oob.fit(X_train, y_train)
y_pred_oob = br_oob.predict(X_test)
r2_oob = r2_score(y_test, y_pred_oob)
print(f"R-squared (oob_score=False): {r2_no_oob:.3f}")
print(f"R-squared (oob_score=True): {r2_oob:.3f}")
print(f"OOB Score: {br_oob.oob_score_:.3f}")
Running the example gives an output like:
R-squared (oob_score=False): 0.809
R-squared (oob_score=True): 0.809
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train two BaggingRegressor models, one with oob_score=False and one with oob_score=True
- Evaluate both models using the R-squared score on the test set
- Print the OOB score for the model with oob_score=True
Tips for using oob_score:
- Enable oob_score when you want to get a performance estimate without a separate validation set
- Use the OOB score as a quick way to tune hyperparameters without cross-validation (see the sketch after this list)
- Compare the OOB score with test set performance to check for overfitting
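As a sketch of how the OOB score can stand in for cross-validation when tuning, the snippet below compares a few candidate values of max_samples using only oob_score_; the candidate grid and n_estimators=50 are arbitrary choices, not values from the example above.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Pick max_samples using only the OOB estimate: no validation split or CV loop needed
for max_samples in [0.5, 0.7, 1.0]:
    br = BaggingRegressor(
        n_estimators=50,
        max_samples=max_samples,
        oob_score=True,
        random_state=42,
    )
    br.fit(X, y)
    print(f"max_samples={max_samples}: OOB R-squared = {br.oob_score_:.3f}")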
Issues to consider:
- Enabling oob_score increases computational cost and memory usage
- The OOB score may be less reliable for small datasets or with few base estimators, as the sketch below illustrates
- The OOB estimate tends to be pessimistic compared to cross-validation estimates
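The following sketch (again with arbitrary dataset sizes and n_estimators values) illustrates the last two points: with very few base estimators the OOB estimate is typically noisier and more pessimistic, and it tends to settle closer to the test-set score as the ensemble grows.
import warnings
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

for n_estimators in [5, 25, 200]:
    with warnings.catch_warnings():
        # With very few estimators, some training samples may receive no OOB
        # prediction at all, which scikit-learn reports with a warning
        warnings.simplefilter("ignore")
        br = BaggingRegressor(n_estimators=n_estimators, oob_score=True, random_state=7)
        br.fit(X_train, y_train)
    test_r2 = r2_score(y_test, br.predict(X_test))
    print(f"n_estimators={n_estimators:3d}: OOB R-squared = {br.oob_score_:.3f}, "
          f"test R-squared = {test_r2:.3f}")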