Decision Tree is a versatile classification algorithm that can handle both binary and multi-class classification problems. It constructs a tree-like model of decisions based on the input features.
The key hyperparameters of DecisionTreeClassifier include the criterion (which measures the quality of a split), max_depth (the maximum depth of the tree), and min_samples_split (the minimum number of samples required to split an internal node).
The algorithm is appropriate for binary and multi-class classification problems.
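As a minimal sketch of how the hyperparameters above can be set explicitly, the snippet below passes them to the constructor. The specific values (entropy, a depth of 3, a minimum of 10 samples per split) and the random_state are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# small synthetic dataset, for illustration only
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# hyperparameter values are illustrative assumptions, not recommendations
model = DecisionTreeClassifier(
    criterion="entropy",     # split quality measure (the default is "gini")
    max_depth=3,             # cap the depth of the tree
    min_samples_split=10,    # need at least 10 samples to split an internal node
    random_state=1,          # arbitrary seed so the fitted tree is repeatable
)
model.fit(X, y)

# the fitted tree cannot be deeper than max_depth
print("Depth:", model.get_depth())
print("Leaves:", model.get_n_leaves())
```

Limiting max_depth or raising min_samples_split generally produces a smaller tree that is less prone to overfitting. The complete example below uses the default hyperparameters instead.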
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create the DecisionTreeClassifier model
model = DecisionTreeClassifier()
# fit the model on the training data
model.fit(X_train, y_train)
# evaluate the model on the test data
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction on a new sample
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.900
Predicted: 0
The steps are as follows:
Generate synthetic dataset:
- Create a dataset using make_classification(), with specified samples and classes.
- Split the dataset into training and testing sets using train_test_split().
Create and fit model:
- Instantiate DecisionTreeClassifier with default hyperparameters.
- Fit the model on the training data using fit().
Evaluate model:
- Predict on the test data using predict().
- Calculate accuracy with accuracy_score().
Make a single prediction:
- Pass a new sample to predict() and print the result.
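The model in the example is created without a random_state, so the reported accuracy can differ slightly between runs: scikit-learn permutes the features at each split, and the order can decide which of several equally good splits is chosen. The sketch below, which assumes an arbitrary seed of 1, shows one way to make the fit repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# two trees built with the same seed (the value 1 is arbitrary)
model_a = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
model_b = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

# with identical seeds and training data, the predictions match
same = bool((model_a.predict(X_test) == model_b.predict(X_test)).all())
print("Identical predictions:", same)
```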
This example demonstrates the straightforward application of DecisionTreeClassifier for classification tasks, highlighting its ease of use and interpretability in scikit-learn.
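To make the interpretability point concrete, a fitted tree can be printed as plain-text rules with export_text() from sklearn.tree. This is a minimal sketch: the shallow max_depth, the seed, and the feature names are assumptions chosen only to keep the output short and readable.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# a shallow tree keeps the printed rules short (max_depth=2 is illustrative)
model = DecisionTreeClassifier(max_depth=2, random_state=1)
model.fit(X, y)

# hypothetical feature names, used only to label the printed rules
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# render the learned decision rules as indented text
rules = export_text(model, feature_names=feature_names)
print(rules)
```

Each printed line is either a threshold test on one feature or a leaf with its predicted class, so the path behind any single prediction can be read off directly.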