Get RandomForestClassifier "oob_decision_function_" Attribute

The RandomForestClassifier in scikit-learn is an ensemble learning algorithm that combines multiple decision trees to make robust predictions. Each tree is trained on a bootstrap sample of the training data, so some samples are left out of every tree's training set. These out-of-bag (OOB) samples can be used to estimate generalization performance without a separate validation set.

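The oob_decision_function_ attribute of a fitted RandomForestClassifier is a 2D array of shape (n_samples, n_classes) that stores the OOB class probability estimates for the training data. It exists only when the classifier is created with oob_score=True. Each row corresponds to a training sample, and each column represents a class, in the order given by classes_. A row is computed using only the trees that did not see that sample during training, so the values in a row sum to 1. If n_estimators is small, a sample may never have been left out of a bootstrap, and the scikit-learn documentation notes that its row might then contain NaN. The values in this array can be used to assess the confidence of the classifier’s predictions and to compute various performance metrics.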
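A quick way to confirm what the array holds is to inspect it directly. The following is a minimal sketch, assuming a small synthetic dataset and illustrative parameter values (n_samples=500, random_state=0) that are not part of the main example. It checks the shape, checks that each row sums to 1, and compares the accuracy of the arg-max OOB prediction with oob_score_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data and settings (assumptions for this sketch only)
X, y = make_classification(n_samples=500, n_classes=3, n_features=10,
                           n_informative=6, random_state=0)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)

oob = rf.oob_decision_function_
print(f"Shape: {oob.shape}")  # (n_samples, n_classes)

# With enough trees, every row is a probability vector that sums to 1
rows_sum_to_one = np.allclose(oob.sum(axis=1), 1.0)
print(f"Rows sum to 1: {rows_sum_to_one}")

# Columns follow rf.classes_, so arg-max gives the OOB predicted label
oob_pred = rf.classes_[np.argmax(oob, axis=1)]
oob_acc = np.mean(oob_pred == y)
print(f"Accuracy from arg-max: {oob_acc:.4f}, oob_score_: {rf.oob_score_:.4f}")
```

With the default scoring, the two accuracy values should agree, which shows that oob_score_ is a summary of the per-sample information in oob_decision_function_.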

Because each estimate comes only from trees that never saw the sample during training, the oob_decision_function_ attribute acts like a built-in validation set. You can use it to identify samples that are difficult to classify, inspect how confident the forest is in each prediction, and calculate custom evaluation metrics that operate on class probabilities, such as log loss.
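The uses described above can be sketched in a few lines. This is an illustrative example rather than a prescribed workflow: the dataset, the choice of log loss as the custom metric, and the use of the highest class probability as a confidence score are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss

# Illustrative data and settings (assumptions for this sketch only)
X, y = make_classification(n_samples=500, n_classes=3, n_features=10,
                           n_informative=6, random_state=0)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
oob = rf.oob_decision_function_

# Custom probability-based metric computed from the OOB estimates
oob_log_loss = log_loss(y, oob, labels=rf.classes_)
print(f"OOB log loss: {oob_log_loss:.4f}")

# Confidence of each OOB prediction: the largest class probability in the row
confidence = oob.max(axis=1)

# The five samples the forest is least sure about
hardest = np.argsort(confidence)[:5]
for idx in hardest:
    print(f"sample {idx}: true class {y[idx]}, OOB probabilities {np.round(oob[idx], 3)}")
```

Other confidence measures, such as the gap between the two largest probabilities in a row, can be computed from the same array.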

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10, n_informative=6,
                           n_redundant=2, random_state=42, shuffle=False)

# Initialize a RandomForestClassifier with OOB estimation enabled
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)

# Fit the classifier on the dataset
rf.fit(X, y)

# Access the oob_decision_function_ attribute
oob_decision_function = rf.oob_decision_function_

# Print the shape of the oob_decision_function_ array
print(f"Shape of oob_decision_function_: {oob_decision_function.shape}")

# Each row already holds class probabilities that sum to 1, so this
# normalization leaves the values unchanged; it is kept as an explicit step
oob_probabilities = oob_decision_function / oob_decision_function.sum(axis=1, keepdims=True)

# Print the OOB class probabilities for the first few samples
print("OOB class probabilities:")
print(oob_probabilities[:5])

Running the example gives an output like:

Shape of oob_decision_function_: (1000, 3)
OOB class probabilities:
[[0.55555556 0.11111111 0.33333333]
 [0.75862069 0.03448276 0.20689655]
 [0.74418605 0.13953488 0.11627907]
 [0.31578947 0.34210526 0.34210526]
 [0.7        0.13333333 0.16666667]]

The key steps in this example are:

Generate a synthetic multiclass classification dataset using make_classification.
Initialize a RandomForestClassifier with oob_score=True to enable OOB estimation and fit it on the dataset.
Access the oob_decision_function_ attribute, which contains the OOB class probability estimates for each training sample.
Print the shape of the oob_decision_function_ array to understand its structure.
Normalize each row so that it sums to 1 (this leaves the values unchanged here, because the rows are already class probabilities) and print the probabilities for the first few samples.
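With 100 trees, every sample is almost certainly out of bag for at least one tree. With very few trees that is no longer true, and some rows carry no usable estimate. The scikit-learn documentation says such rows might contain NaN; depending on the installed version they may instead appear as all zeros, which is an assumption worth checking on your own setup. The sketch below uses a deliberately tiny forest (n_estimators=3, an illustrative value) and keeps only rows that form a valid probability vector, which covers both cases.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data (assumption for this sketch only)
X, y = make_classification(n_samples=200, n_classes=3, n_features=10,
                           n_informative=6, random_state=0)

# A forest this small triggers a warning about missing OOB scores; silence it here
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    rf_small = RandomForestClassifier(n_estimators=3, oob_score=True, random_state=0)
    rf_small.fit(X, y)

oob = rf_small.oob_decision_function_

# A usable row sums to 1; NaN rows and all-zero rows both fail this check
valid = np.isclose(oob.sum(axis=1), 1.0)
print(f"{valid.sum()} of {len(oob)} samples have an OOB estimate")

# Evaluate only on samples that were out of bag for at least one tree
oob_pred = rf_small.classes_[np.argmax(oob[valid], axis=1)]
oob_acc_valid = np.mean(oob_pred == y[valid])
print(f"OOB accuracy on valid rows: {oob_acc_valid:.4f}")
```

Masking invalid rows before computing metrics avoids silently counting samples with no OOB estimate as predictions for the first class.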

See Also