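ExtraTreeClassifier is an extremely randomized tree algorithm used for classification problems. It builds a single decision tree in which candidate splits are drawn at random for the features considered at each node, and the best of those random splits is used. This keeps the model simple and fast to train, but can result in high variance.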
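To make the variance point concrete, the sketch below fits the same model ten times on identical data, changing only the random seed. The seed range and the use of score() to measure accuracy are illustrative choices, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier

# same synthetic dataset and split for every run
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# fit one tree per seed (seeds 0..9 are an arbitrary illustrative choice)
scores = []
for seed in range(10):
    model = ExtraTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    # score() returns the mean accuracy on the given data
    scores.append(model.score(X_test, y_test))

print('Min: %.3f, Max: %.3f' % (min(scores), max(scores)))
```

Any spread between the minimum and maximum comes only from the randomized splits, since the data and the train/test split are held fixed.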
The key hyperparameters of ExtraTreeClassifier include criterion (the function that measures the quality of a split), splitter (the strategy used to choose the split at each node), and max_depth (the maximum depth of the tree).
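As a minimal sketch, these hyperparameters can be set when the model is constructed. The specific values here (entropy criterion, depth limit of 3, seed of 1) are assumptions chosen for illustration rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.tree import ExtraTreeClassifier

# small synthetic dataset, used only to have something to fit
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# illustrative values, not tuned settings
model = ExtraTreeClassifier(
    criterion='entropy',   # split quality measure ('gini' is the default)
    splitter='random',     # draw random candidate splits (the default for this class)
    max_depth=3,           # cap the depth of the tree
    random_state=1,        # fix the seed so the tree is reproducible
)
model.fit(X, y)

# confirm the fitted tree respects the depth limit
print('Tree depth: %d' % model.get_depth())
```

Limiting max_depth is one simple way to restrain an otherwise fully grown tree.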
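The algorithm can be applied to both binary and multi-class classification tasks, although in practice it is most often used as a building block of ensembles such as ExtraTreesClassifier rather than on its own.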
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier
from sklearn.metrics import accuracy_score
# generate a synthetic binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create the ExtraTreeClassifier model
model = ExtraTreeClassifier()
# fit the model on the training set
model.fit(X_train, y_train)
# make predictions on the test set
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction on a new sample
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
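Running the example gives output like the following. Because the model is created without a fixed random_state, your results may differ between runs: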
Accuracy: 0.900
Predicted: 0
The steps are as follows:
1. Generate a synthetic binary classification dataset using make_classification() with a fixed random_state for reproducibility. The dataset is split into training and test sets using train_test_split().
2. Instantiate an ExtraTreeClassifier model with default hyperparameters. The model is then fit on the training data using the fit() method.
3. Evaluate the model's performance by predicting the test set and calculating the accuracy score.
4. Make a single prediction on a new data sample using the predict() method.
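The evaluation step above uses a single split with only 20 test rows, so the accuracy estimate is coarse. One possible alternative, sketched here, is k-fold cross-validation with cross_val_score(). This is not part of the original example, and the choice of 5 folds and a fixed seed is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import ExtraTreeClassifier

# same synthetic dataset as in the main example
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# fixed seed so repeated runs build the same trees (illustrative choice)
model = ExtraTreeClassifier(random_state=1)

# 5-fold cross-validation; each fold is held out once for scoring
scores = cross_val_score(model, X, y, scoring='accuracy', cv=5)

print('Mean accuracy: %.3f (std %.3f)' % (scores.mean(), scores.std()))
```

Averaging over folds uses every row for testing once, which tends to give a steadier estimate than one small hold-out set.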
This example demonstrates the basic steps to set up and use ExtraTreeClassifier for a simple binary classification task, and shows that fitting, evaluating, and predicting with the model in scikit-learn takes only a few lines of code.