The RandomForestClassifier in scikit-learn is an ensemble learning algorithm that combines multiple decision trees to make predictions. It builds a forest of decision trees, each trained on a bootstrap sample of the data with a random subset of features considered at each split, and aggregates their predictions to make the final classification.
The classes_ attribute of a fitted RandomForestClassifier is a NumPy array that contains the unique class labels from the training dataset, in sorted order. The order of the labels in classes_ matches the order of the columns returned by predict_proba(): column j holds the predicted probability for classes_[j]. The predict() method returns the labels themselves, each one a value taken from classes_.
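The relationship between classes_ and the probability columns can be checked directly. The following is a minimal sketch on a small synthetic dataset; the dataset parameters and the variable names (clf, proba, labels_from_proba) are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic 3-class dataset (parameters are illustrative assumptions)
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One row per sample, one column per entry of clf.classes_
proba = clf.predict_proba(X[:5])

# Column j holds the probability of clf.classes_[j], so the most probable
# label is found by indexing classes_ with the argmax column
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(clf.classes_)
print(proba.shape)
print(np.array_equal(labels_from_proba, clf.predict(X[:5])))
```

Indexing classes_ with the argmax column is the safe way to turn probabilities into labels, since the column index is only equal to the label itself when the labels happen to be 0, 1, 2, and so on.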
Accessing the classes_ attribute is useful for interpreting predicted probabilities and for building evaluation outputs such as a confusion matrix, where rows and columns must follow a known label order. Because the classifier encodes labels as integer indices internally, classes_ provides the mapping from those indices back to the original label values, which lets you produce more meaningful outputs and visualizations.
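As a rough illustration of the confusion matrix use case, the sketch below passes classes_ as the labels argument of confusion_matrix so that the row and column order is explicit. The string labels ("cat", "dog", "bird") are hypothetical and exist only to make the mapping visible; the dataset parameters are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y_int = make_classification(n_samples=300, n_features=4, n_informative=2,
                               n_redundant=0, n_classes=3, n_clusters_per_class=1,
                               random_state=0)

# Hypothetical string labels standing in for real class names
names = np.array(["cat", "dog", "bird"])
y = names[y_int]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# classes_ is sorted, so it does not follow the order of `names`
print(clf.classes_)

# Fix the row/column order of the matrix to the order of classes_
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)
for label, row in zip(clf.classes_, cm):
    print(f"{label:>5} {row}")
```

Passing labels explicitly keeps the matrix aligned with classes_ even if some class happens to be absent from the test set or the predictions.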
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=3, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42, shuffle=False)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
# Fit the classifier on the training data
rf.fit(X_train, y_train)
# Access the classes_ attribute and print its value
print(f"Unique class labels: {rf.classes_}")
# Make predictions on the test set
y_pred = rf.predict(X_test)
# Print the predicted class labels
print(f"Predicted class labels: {y_pred}")
Running the example gives an output like:
Unique class labels: [0 1 2]
Predicted class labels: [1 2 2 1 1 2 1 1 2 0 0 0 1 2 2 2 0 2 2 0 1 0 1 1 0 0 0 2 2 2 1 0 1 1 1 0 2
1 2 0 0 0 0 2 0 0 2 1 1 2 1 0 0 1 0 0 1 0 0 2 0 0 1 0 1 2 1 2 1 1 0 1 0 2
0 0 1 0 1 0 2 0 0 0 0 1 2 2 2 0 1 1 1 0 2 1 2 0 0 2 2 1 2 0 1 1 1 2 0 2 1
0 2 2 0 1 0 2 1 0 2 1 2 0 0 1 1 1 2 2 1 2 0 2 0 0 0 0 0 0 0 0 2 0 2 2 2 1
2 0 1 0 1 1 2 1 0 2 2 2 1 2 1 1 2 2 0 2 0 0 1 1 0 2 1 1 0 1 2 2 1 2 1 1 2
0 1 1 2 2 2 2 1 0 2 1 1 0 1 1]
The key steps in this example are:
- Generate a synthetic multiclass classification dataset using make_classification and split it into train and test sets.
- Initialize a RandomForestClassifier and fit it on the training data.
- Access the classes_ attribute to get the unique class labels from the training dataset and print its value.
- Make predictions on the test set using the fitted classifier.
- Print the predicted class labels and observe how they correspond to the class labels in classes_.
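The correspondence between the predictions and classes_ can also be checked programmatically instead of by eye. The sketch below repeats the same setup as the example and then verifies that every predicted label is a member of classes_; the per-class count dictionary (counts) is an illustrative addition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same dataset, split, and model settings as the example above
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=3,
                           n_features=4, n_informative=2, n_redundant=0,
                           random_state=42, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Every predicted label is one of the values stored in classes_
all_known = bool(np.isin(y_pred, rf.classes_).all())
print(f"All predictions are in classes_: {all_known}")

# Number of test samples assigned to each class, keyed by label
counts = {int(label): int((y_pred == label).sum()) for label in rf.classes_}
print(f"Predictions per class: {counts}")
```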