The RandomForestClassifier in scikit-learn is an ensemble learning algorithm that combines multiple decision trees to make predictions. It builds a forest of decision trees, each trained on a bootstrap sample of the data with a random subset of features considered at each split, and aggregates their predictions to make the final classification.
The estimator_ attribute of a fitted RandomForestClassifier (formerly base_estimator_) is the base estimator, the template from which each tree in the ensemble is created. It should not be confused with estimators_ (plural), which holds the list of fitted trees.
Accessing the estimator_ attribute can be useful for checking which base estimator was used in the construction of the model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           random_state=42, shuffle=False)
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a RandomForestClassifier with a small number of trees
rf = RandomForestClassifier(n_estimators=10, random_state=42)
# Fit the classifier on the training data
rf.fit(X_train, y_train)
# Access the estimator_ attribute (the unfitted template used to create each tree)
base_estimator = rf.estimator_
# Report base estimator parameters
print(base_estimator.get_params())
Running the example gives an output like:
{'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': None, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'monotonic_cst': None, 'random_state': None, 'splitter': 'best'}
The key steps in this example are:
- Generate a synthetic binary classification dataset using make_classification and split it into train and test sets.
- Initialize a RandomForestClassifier with a small number of trees for demonstration purposes and fit it on the training data.
- Access the estimator_ attribute to get the base estimator used in the ensemble.
- Access and print the configuration details of the base estimator.
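Because the attribute was formerly named base_estimator_, code that must also run on older scikit-learn releases may need to try both names. The sketch below shows one way to do that; get_base_estimator is a hypothetical helper written for this example, not part of scikit-learn, and it assumes that exactly one of the two names is usable on a fitted model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def get_base_estimator(model):
    """Hypothetical helper: return the base estimator under either attribute name."""
    # Prefer the current name; fall back to the older one if it is missing
    for name in ("estimator_", "base_estimator_"):
        if hasattr(model, name):
            return getattr(model, name)
    raise AttributeError("No base estimator attribute found; is the model fitted?")


# Small synthetic dataset, used only for illustration
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42)
rf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

base = get_base_estimator(rf)
print(type(base).__name__)
```

Checking the current name first means the fallback is only touched on releases where estimator_ does not exist.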