Get RandomForestClassifier "classes_" Attribute

The RandomForestClassifier in scikit-learn is an ensemble learning algorithm that combines multiple decision trees to make predictions. With the default settings, each tree is trained on a bootstrap sample of the training data and considers a random subset of the features at each split. The forest then aggregates the trees' predictions to make the final classification.

The classes_ attribute of a fitted RandomForestClassifier is a NumPy array that contains the unique class labels seen in the training targets, in sorted order. For a single-output problem, column i of the array returned by predict_proba() holds the predicted probability of classes_[i], and predict() returns, for each sample, the label from classes_ with the highest predicted probability.
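The alignment between classes_ and the columns of predict_proba() is easiest to see with string labels. The following is a minimal sketch, not part of the original example: the label names ("dog", "bird", "cat") are hypothetical and are deliberately assigned out of alphabetical order so that the sorting in classes_ is visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic 3-class dataset (parameters chosen only for illustration)
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# Hypothetical string labels: integer 0 -> "dog", 1 -> "bird", 2 -> "cat"
names = np.array(["dog", "bird", "cat"])
y_named = names[y]

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X, y_named)

# classes_ holds the unique labels in sorted order, not in order of appearance
print(rf.classes_)

# Column j of proba is the predicted probability of rf.classes_[j]
proba = rf.predict_proba(X[:5])
top = rf.classes_[np.argmax(proba, axis=1)]

# Looking up the most probable column in classes_ reproduces predict()
print(top)
print(rf.predict(X[:5]))
```

Indexing classes_ with the argmax of each probability row gives the same labels as predict(), which is why classes_ is the right lookup table whenever you work with raw probabilities.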

Accessing the classes_ attribute is useful for interpreting predicted probabilities and for building evaluation outputs such as a confusion matrix. Knowing which label each position in classes_ (and therefore each predict_proba() column) refers to lets you attach the original class labels to rows, columns, and plot axes, which makes outputs and visualizations easier to read.
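As a sketch of the confusion matrix use case, the snippet below passes classes_ to the labels parameter of confusion_matrix so that the row and column order of the matrix is explicit. The dataset parameters mirror the main example but are otherwise arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic 3-class dataset, split into train and test sets
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Fix the row/column order of the matrix to the order of classes_
cm = confusion_matrix(y_test, rf.predict(X_test), labels=rf.classes_)

# Row i counts test samples whose true label is rf.classes_[i];
# column j counts predictions of rf.classes_[j]
for label, row in zip(rf.classes_, cm):
    print(f"true label {label}: {row}")
```

Passing labels=rf.classes_ keeps the matrix layout tied to the fitted model even if some class happens to be absent from the test set.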

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=3, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42, shuffle=False)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the classifier on the training data
rf.fit(X_train, y_train)

# Access the classes_ attribute and print its value
print(f"Unique class labels: {rf.classes_}")

# Make predictions on the test set
y_pred = rf.predict(X_test)

# Print the predicted class labels
print(f"Predicted class labels: {y_pred}")

Running the example gives an output like:

Unique class labels: [0 1 2]
Predicted class labels: [1 2 2 1 1 2 1 1 2 0 0 0 1 2 2 2 0 2 2 0 1 0 1 1 0 0 0 2 2 2 1 0 1 1 1 0 2
 1 2 0 0 0 0 2 0 0 2 1 1 2 1 0 0 1 0 0 1 0 0 2 0 0 1 0 1 2 1 2 1 1 0 1 0 2
 0 0 1 0 1 0 2 0 0 0 0 1 2 2 2 0 1 1 1 0 2 1 2 0 0 2 2 1 2 0 1 1 1 2 0 2 1
 0 2 2 0 1 0 2 1 0 2 1 2 0 0 1 1 1 2 2 1 2 0 2 0 0 0 0 0 0 0 0 2 0 2 2 2 1
 2 0 1 0 1 1 2 1 0 2 2 2 1 2 1 1 2 2 0 2 0 0 1 1 0 2 1 1 0 1 2 2 1 2 1 1 2
 0 1 1 2 2 2 2 1 0 2 1 1 0 1 1]

The key steps in this example are:

Generate a synthetic multiclass classification dataset using make_classification and split it into train and test sets.
Initialize a RandomForestClassifier and fit it on the training data.
Access the classes_ attribute to get the unique class labels from the training dataset and print its value.
Make predictions on the test set using the fitted classifier.
Print the predicted class labels and observe that every predicted value is one of the labels listed in classes_.
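The steps above read classes_ only after calling fit(), because the attribute is created during fitting. The following sketch shows one way to guard against reading it too early, using hasattr and scikit-learn's check_is_fitted helper. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

rf = RandomForestClassifier(n_estimators=10, random_state=42)

# Before fit(), the classes_ attribute has not been created yet
has_classes_before = hasattr(rf, "classes_")

# check_is_fitted raises NotFittedError if the named attribute is missing
try:
    check_is_fitted(rf, "classes_")
    fitted_before = True
except NotFittedError:
    fitted_before = False

X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=42)
rf.fit(X, y)

# After fit(), classes_ is available
has_classes_after = hasattr(rf, "classes_")

print(has_classes_before, fitted_before, has_classes_after)
```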

See Also