The oob_score parameter in scikit-learn's ExtraTreesRegressor enables out-of-bag (OOB) error estimation during training.
Extra Trees (Extremely Randomized Trees) is an ensemble method similar to Random Forests, but with additional randomization in the tree-building process. It creates multiple decision trees and aggregates their predictions.
The oob_score parameter, when set to True, uses the samples left out of each tree's bootstrap sample to estimate the model's generalization performance. This provides an approximately unbiased estimate without needing a separate validation set. Note that ExtraTreesRegressor defaults to bootstrap=False, so bootstrap=True must also be set for OOB scoring to work.
By default, oob_score is set to False. It's commonly enabled when you want to monitor model performance during training without using a separate validation set.
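Because ExtraTreesRegressor does not bootstrap by default, enabling oob_score on its own fails at fit time. A minimal sketch of that failure mode (on a small synthetic dataset, for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

# ExtraTreesRegressor defaults to bootstrap=False, so requesting an
# OOB score without also setting bootstrap=True raises a ValueError
try:
    ExtraTreesRegressor(oob_score=True, random_state=0).fit(X, y)
    error_message = None
except ValueError as err:
    error_message = str(err)
    print(f"ValueError: {error_message}")
```

Passing bootstrap=True alongside oob_score=True, as in the full example below, avoids this error.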
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit ExtraTreesRegressor with oob_score=True
et_oob = ExtraTreesRegressor(n_estimators=100, bootstrap=True, oob_score=True, random_state=42)
et_oob.fit(X_train, y_train)
# Print OOB score
print(f"OOB Score: {et_oob.oob_score_:.3f}")
# Evaluate on test set
y_pred = et_oob.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.3f}")
# Compare with model without OOB scoring
et_no_oob = ExtraTreesRegressor(n_estimators=100, bootstrap=True, oob_score=False, random_state=42)
et_no_oob.fit(X_train, y_train)
y_pred_no_oob = et_no_oob.predict(X_test)
mse_no_oob = mean_squared_error(y_test, y_pred_no_oob)
print(f"Test MSE (without OOB): {mse_no_oob:.3f}")
Running the example gives an output like:
OOB Score: 0.859
Test MSE: 2122.805
Test MSE (without OOB): 2122.805
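Note that the OOB Score (0.859) and the Test MSE are not in the same units: for regressors, oob_score_ is the R² coefficient of determination, while the test metric here is mean squared error. To compare like with like, the fitted model also exposes oob_prediction_, the raw out-of-bag predictions, from which any metric can be computed. A short sketch:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

et = ExtraTreesRegressor(n_estimators=100, bootstrap=True,
                         oob_score=True, random_state=42)
et.fit(X, y)

# oob_score_ is R^2 for regressors; oob_prediction_ holds the
# per-sample OOB predictions, so an OOB MSE can be computed directly
oob_mse = mean_squared_error(y, et.oob_prediction_)
print(f"OOB R^2: {et.oob_score_:.3f}")
print(f"OOB MSE: {oob_mse:.3f}")
```

The OOB MSE computed this way is directly comparable to a test-set MSE.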
Key steps in this example:
- Generate a synthetic regression dataset
- Split data into train and test sets
- Create ExtraTreesRegressor with bootstrap=True and oob_score=True
- Fit model and print OOB score
- Compare OOB score with test set performance
- Create and evaluate model with oob_score=False for comparison
Tips for using oob_score:
- Enable when you want to monitor model performance without a separate validation set
- OOB score is typically slightly pessimistic compared to the true test error
- Increases computation time and memory usage, especially for large datasets
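One practical way to use OOB scoring as a monitor is to grow the forest incrementally with warm_start=True and read oob_score_ after each fit; this is a sketch, with the n_estimators schedule chosen arbitrarily for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# warm_start=True reuses the trees already fitted, so each call to
# fit only adds new trees; oob_score_ is recomputed after each fit
et = ExtraTreesRegressor(bootstrap=True, oob_score=True,
                         warm_start=True, random_state=42)
for n in [25, 50, 100, 200]:
    et.set_params(n_estimators=n)
    et.fit(X, y)
    print(f"n_estimators={n:3d}  OOB score: {et.oob_score_:.3f}")
```

Watching how the OOB score levels off can help choose a reasonable number of trees without holding out a validation set.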
Issues to consider:
- OOB estimation may be less reliable for small datasets or models with few trees
- OOB score doesn’t replace proper cross-validation for final model evaluation
- Enabling oob_score slightly increases training time and memory usage
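To see how the OOB estimate lines up with cross-validation, you can compare oob_score_ against cross_val_score, which also reports R² for regressors by default. A minimal sketch:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

et = ExtraTreesRegressor(n_estimators=100, bootstrap=True,
                         oob_score=True, random_state=42)
et.fit(X, y)

# Default scoring for regressors is R^2, matching oob_score_
cv_scores = cross_val_score(
    ExtraTreesRegressor(n_estimators=100, bootstrap=True, random_state=42),
    X, y, cv=5)

print(f"OOB R^2: {et.oob_score_:.3f}")
print(f"5-fold CV R^2: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

The two estimates are usually close, but cross-validation remains the more thorough choice for final model evaluation.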