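RandomForestClassifier is an ensemble classification algorithm that constructs multiple decision trees during training and combines their predictions. In the classic formulation the ensemble outputs the mode of the classes predicted by the individual trees; scikit-learn's implementation instead averages the trees' predicted class probabilities and returns the class with the highest average.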
Key hyperparameters include n_estimators (number of trees), max_depth (maximum depth of each tree), and min_samples_split (minimum number of samples required to split a node).
This algorithm is suitable for both binary and multi-class classification problems.
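As a minimal sketch of how the three hyperparameters named above are passed to the constructor, the snippet below sets them explicitly and then inspects the fitted forest. The specific values (50 trees, depth 3, split threshold 4) and the random_state setting are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# small synthetic dataset, used only for illustration
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# set the hyperparameters explicitly; these values are illustrative, not tuned
model = RandomForestClassifier(
    n_estimators=50,       # number of trees
    max_depth=3,           # maximum depth of each tree
    min_samples_split=4,   # minimum samples required to split a node
    random_state=1,        # assumption: fixed seed so repeated runs match
)
model.fit(X, y)

# the fitted forest holds one decision tree per estimator
n_trees = len(model.estimators_)
deepest = max(tree.get_depth() for tree in model.estimators_)
print('Trees: %d' % n_trees)
print('Deepest tree: %d' % deepest)
```

Shallower trees and larger split thresholds generally make each tree simpler, which is one common way to limit overfitting on small datasets.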
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = RandomForestClassifier()
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.950
Predicted: 0
1. Generate a synthetic binary classification dataset using the make_classification() function, specifying the number of samples, features, and classes. Use a fixed random seed for reproducibility. Split the dataset into training and testing sets using train_test_split().
2. Instantiate a RandomForestClassifier with default hyperparameters. Fit the model on the training data using the fit() method.
3. Evaluate the model by making predictions on the test set and calculating the accuracy score using accuracy_score.
4. Make a prediction on a new data sample using the predict() method.
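
The same steps carry over to a multi-class problem. The sketch below is an assumption-laden variation on the example above: it uses three classes, sets n_informative=3 (which make_classification needs so that three classes fit its default cluster settings), and calls predict_proba() to show the per-class probabilities behind each prediction. The sample count and seeds are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# generate a three-class dataset (n_informative=3 is required for 3 classes here)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# fit a forest with a fixed seed (an assumption, for repeatable runs)
model = RandomForestClassifier(random_state=1)
model.fit(X_train, y_train)

# one probability per class for every test row
proba = model.predict_proba(X_test)
# predict() returns the class with the highest probability in each row
yhat = model.predict(X_test)

print('Classes:', model.classes_)
print('Probability matrix shape:', proba.shape)
```

Each row of the probability matrix sums to 1, and the columns follow the order of model.classes_.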