
Get RandomForestClassifier "estimators_" Attribute

The RandomForestClassifier in scikit-learn is a powerful ensemble learning algorithm that combines multiple decision trees. It builds a forest of trees, each trained on a random subset of the samples and features, and aggregates their predictions to make the final classification.

The estimators_ attribute of a fitted RandomForestClassifier is a list that stores all the individual decision tree classifiers that make up the forest. Each element in this list is a fitted DecisionTreeClassifier object that has been trained on a bootstrap sample of the original training set.
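For instance, a fitted forest exposes estimators_ as an ordinary Python list. The short sketch below assumes a fitted classifier named rf (as in the full example that follows) and simply inspects that list:

# Minimal sketch: inspect the estimators_ list of an already-fitted forest named rf
print(type(rf.estimators_))     # <class 'list'>
print(len(rf.estimators_))      # equal to n_estimators
print(type(rf.estimators_[0]))  # a fitted DecisionTreeClassifier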

Accessing the estimators_ attribute can be useful for understanding and visualizing how the random forest makes its predictions. By inspecting individual trees, you can gain insights into which features are most important and how the trees partition the feature space. Additionally, you can evaluate the performance of individual trees to identify strong and weak learners in the ensemble.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           random_state=42, shuffle=False)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a RandomForestClassifier with a small number of trees
rf = RandomForestClassifier(n_estimators=10, random_state=42)

# Fit the classifier on the training data
rf.fit(X_train, y_train)

# Access the estimators_ attribute and print some information about the first decision tree
first_tree = rf.estimators_[0]
print(f"Depth of the first tree: {first_tree.get_depth()}")
print(f"Number of leaves in the first tree: {first_tree.get_n_leaves()}")

# Calculate and print the accuracy of the first decision tree on the test set
first_tree_accuracy = first_tree.score(X_test, y_test)
print(f"Accuracy of the first tree on the test set: {first_tree_accuracy:.3f}")

Running the example gives an output like:

Depth of the first tree: 15
Number of leaves in the first tree: 66
Accuracy of the first tree on the test set: 0.890

The key steps in this example are:

  1. Generate a synthetic binary classification dataset using make_classification and split it into train and test sets.
  2. Initialize a RandomForestClassifier with a small number of trees for demonstration purposes and fit it on the training data.
  3. Access the estimators_ attribute to get the list of fitted decision trees and store the first tree in the first_tree variable.
  4. Print the depth and number of leaves of the first decision tree using the get_depth() and get_n_leaves() methods.
  5. Calculate the accuracy of the first decision tree on the test set using the score() method and print the result.
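One possible extension is to loop over every tree in estimators_ and compare individual trees against the full ensemble, which helps identify strong and weak learners. The sketch below is a minimal illustration that reuses the rf, X_test, and y_test objects defined above:

# Accuracy of the full forest for reference
print(f"Forest accuracy on the test set: {rf.score(X_test, y_test):.3f}")

# Score every individual tree on the same test set
tree_accuracies = [tree.score(X_test, y_test) for tree in rf.estimators_]
print(f"Weakest tree accuracy: {min(tree_accuracies):.3f}")
print(f"Strongest tree accuracy: {max(tree_accuracies):.3f}")

# Each tree also exposes its own feature importances
print(f"Feature importances of the first tree: {rf.estimators_[0].feature_importances_}")

Because the forest averages the probabilistic predictions of all its trees, its test accuracy is usually higher than that of most individual trees.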


See Also