
Configure DecisionTreeClassifier "criterion" Parameter

The DecisionTreeClassifier is a non-parametric supervised learning algorithm used for classification tasks. It learns decision rules from features to predict the target class.

The criterion parameter determines the function used to measure the quality of a split at each node of the tree. It influences how the tree is built and can impact the model’s performance.

The default value for criterion is "gini", which measures split quality with Gini impurity. Alternatives are "entropy" and "log_loss", both of which use Shannon information gain as the splitting criterion.
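Both impurity measures are simple to compute by hand. The sketch below shows the standard formulas for a single node's class counts; the helper names gini_impurity and entropy are illustrative, not part of scikit-learn:

```python
import numpy as np

def gini_impurity(class_counts):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    p = np.asarray(class_counts, dtype=float) / np.sum(class_counts)
    return 1.0 - np.sum(p ** 2)

def entropy(class_counts):
    """Shannon entropy: -sum(p_k * log2(p_k)), skipping empty classes."""
    p = np.asarray(class_counts, dtype=float) / np.sum(class_counts)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A node holding 8 samples of class 0 and 2 samples of class 1
print(f"Gini:    {gini_impurity([8, 2]):.3f}")  # 0.320
print(f"Entropy: {entropy([8, 2]):.3f}")        # 0.722
```

A pure node scores 0 under both measures; the tree chooses splits that most reduce the chosen measure, weighted by child node sizes.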

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different criterion values
criteria = ['gini', 'entropy']
accuracies = []

for criterion in criteria:
    dt = DecisionTreeClassifier(criterion=criterion, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"criterion={criterion}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

criterion=gini, Accuracy: 0.785
criterion=entropy, Accuracy: 0.775

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset
  2. Split the data into train and test sets
  3. Train DecisionTreeClassifier models with "gini" and "entropy" criteria
  4. Evaluate the accuracy of each model on the test set
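A single train/test split can make one criterion look better by chance. A more reliable comparison averages accuracy over several folds with cross_val_score; this sketch reuses the same synthetic dataset as the example above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same synthetic multiclass dataset as in the main example
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)

# Compare criteria with 5-fold cross-validation instead of one split
for criterion in ['gini', 'entropy']:
    dt = DecisionTreeClassifier(criterion=criterion, random_state=42)
    scores = cross_val_score(dt, X, y, cv=5, scoring='accuracy')
    print(f"criterion={criterion}, mean accuracy: {scores.mean():.3f} "
          f"(std {scores.std():.3f})")
```

If the mean accuracies differ by less than their standard deviations, the criterion choice is unlikely to matter for that dataset.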

Tips and heuristics for choosing between "gini" and "entropy":

  1. "gini" is slightly cheaper to compute because it avoids the logarithm, which is why it is the default
  2. The two criteria usually produce very similar trees, so accuracy differences are typically small
  3. When in doubt, treat criterion as a hyperparameter and compare both options with cross-validation

Issues to consider:

  1. Neither criterion is universally better; the best choice depends on the dataset
  2. Comparing criteria on a single train/test split can be misleading due to split variance
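One way to settle the choice on your own data is a small grid search over criterion. This is a hedged sketch using GridSearchCV on the same synthetic dataset as the example above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)

# Search over the criterion values, scoring by 5-fold CV accuracy
param_grid = {'criterion': ['gini', 'entropy']}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print(f"Best criterion: {grid.best_params_['criterion']}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
```

The same grid can be extended with other tree hyperparameters such as max_depth, which usually has a larger effect on performance than criterion.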
