The RandomForestClassifier in scikit-learn is an ensemble learning algorithm that combines multiple decision trees to make predictions. By default, each tree is trained on a bootstrap sample of the data and considers a random subset of the features at each split; the forest then aggregates the trees' predictions to make the final classification.
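The n_classes_ attribute of a fitted RandomForestClassifier holds the number of unique class labels in the target variable. For a single-output problem it is an integer; for a multi-output problem it is a list with one count per output. It is set during fitting from the labels in the training data, so a class that never appears in the training set is not counted.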
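As a quick sanity check, the value can be compared with the length of the classes_ attribute and with the number of distinct labels in the training target. The following is a minimal sketch for the single-output case; the three-class dataset, its parameters, and the variable names are illustrative assumptions and not part of the main example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative three-class dataset (parameters chosen only for this sketch)
X, y = make_classification(n_samples=300, n_classes=3, n_features=8,
                           n_informative=5, random_state=0)

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# For a single-output problem, n_classes_ should match both the length of
# classes_ and the number of distinct labels passed to fit
n_unique = len(np.unique(y))
print(clf.n_classes_, len(clf.classes_), n_unique)

# predict_proba returns one column per class, ordered as in classes_
proba = clf.predict_proba(X[:5])
print(proba.shape)
```

The three printed counts should agree, and the second dimension of the probability array should equal n_classes_.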
Accessing the n_classes_ attribute is useful when you need the number of classes in your classification problem. Downstream tasks such as creating confusion matrices, performing multi-class evaluation, or setting up other classifiers that take the number of classes as a parameter often require this information.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_classes=4, n_features=10, n_informative=6,
                           n_redundant=2, random_state=42, shuffle=False)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
# Fit the classifier on the training data
rf.fit(X_train, y_train)
# Access the n_classes_ attribute and print its value
print(f"Number of classes: {rf.n_classes_}")
Running the example gives an output like:
Number of classes: 4
The key steps in this example are:
- Generate a synthetic multiclass classification dataset using make_classification and split it into train and test sets.
- Initialize a RandomForestClassifier and fit it on the training data.
- Access the n_classes_ attribute of the fitted classifier and print its value.
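One of the downstream uses mentioned earlier is building a confusion matrix. The sketch below, which assumes the same synthetic four-class setup as the example, shows how n_classes_ and classes_ could be used to fix and check the matrix dimensions. The choice to pass labels=rf.classes_ is a suggestion for this illustration, not a requirement of the API.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same synthetic four-class setup as the example above
X, y = make_classification(n_samples=1000, n_classes=4, n_features=10,
                           n_informative=6, n_redundant=2,
                           random_state=42, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Passing labels=rf.classes_ keeps the matrix at n_classes_ x n_classes_
# even if some class happens to be missing from the test split
y_pred = rf.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=rf.classes_)

print(f"Confusion matrix shape: {cm.shape}")
print(cm)
```

Each row of the matrix corresponds to a true class and each column to a predicted class, in the order given by rf.classes_.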