
Configure ExtraTreesClassifier "criterion" Parameter

The criterion parameter in scikit-learn’s ExtraTreesClassifier determines the function used to measure the quality of a split.
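Impurity-based trees generally judge the "quality of a split" by its impurity decrease. That is the parent node's impurity minus the size-weighted impurity of its two children. The minimal sketch below scores two candidate thresholds on a toy node using Gini impurity. The helpers `gini` and `impurity_decrease` are hypothetical illustrations, not scikit-learn API, and scikit-learn's internal computation is implemented differently.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_decrease(y, goes_left, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children.

    goes_left is a boolean mask: True sends a sample to the left child.
    """
    left, right = y[goes_left], y[~goes_left]
    n = len(y)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(y) - weighted

# Toy node: one feature x and two balanced classes
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

perfect = impurity_decrease(y, x <= 3.5)  # separates the classes completely
weak = impurity_decrease(y, x <= 1.5)     # isolates a single sample
print(f"threshold 3.5: {perfect:.3f}, threshold 1.5: {weak:.3f}")
# threshold 3.5: 0.500, threshold 1.5: 0.100
```

The split with the larger decrease is the better one under this criterion. Swapping `impurity=gini` for an entropy function changes the scores but follows the same pattern.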

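ExtraTreesClassifier is an ensemble method that fits a number of randomized decision trees (extra-trees) and averages their predictions to improve predictive accuracy and control over-fitting. By default it does not bootstrap (bootstrap=False), so each tree is trained on the full training set. The extra randomness comes from drawing split thresholds at random rather than searching for the optimal threshold.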
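The randomized split search that distinguishes extra-trees can be sketched in a few lines. This is a simplified illustration, not scikit-learn's implementation. At a single node it first samples a few candidate features, playing the role of max_features. It then draws one uniform random threshold per feature between that feature's minimum and maximum in the node. Finally, it keeps the candidate with the largest Gini impurity decrease, which is the step where criterion comes in. The function name `random_split` is hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def random_split(X, y, n_candidate_features, rng):
    """Hypothetical helper: choose one split at a node the extra-trees way.

    Returns (impurity_decrease, feature_index, threshold), or None if no
    candidate feature could be split.
    """
    n_samples, n_features = X.shape
    parent = gini(y)
    features = rng.choice(n_features, size=n_candidate_features, replace=False)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature within this node
        threshold = rng.uniform(lo, hi)  # random cut point, no exhaustive search
        mask = X[:, f] <= threshold
        if mask.all() or not mask.any():
            continue  # guard against an empty child
        # Criterion step: compare weighted child impurity with parent impurity
        weighted = (mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])) / n_samples
        decrease = parent - weighted
        if best is None or decrease > best[0]:
            best = (decrease, int(f), float(threshold))
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = (X[:, 0] > 0).astype(int)
result = random_split(X, y, n_candidate_features=2, rng=rng)
print(result)
```

A full extra-tree would apply this recursively to each child until a stopping rule is met. Because thresholds are random, many trees are averaged to recover accuracy.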

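The criterion parameter controls how candidate splits are scored at each node. In extra-trees the split thresholds are drawn at random, and criterion decides which of those random candidates is kept. This shapes the tree structure and, consequently, the model’s performance and generalization ability.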
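One way to see the effect of criterion on tree structure is to fit the ensemble with each setting and summarize the individual trees. The sketch below reuses the synthetic dataset from the example further down and reads each fitted tree's depth and leaf count through `get_depth()` and `get_n_leaves()`. The numbers it prints depend on the data and random seed. They are not a general rule about either criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Same synthetic data as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

stats = {}
for criterion in ["gini", "entropy"]:
    etc = ExtraTreesClassifier(n_estimators=100, criterion=criterion, random_state=42)
    etc.fit(X, y)
    # Each fitted tree in estimators_ exposes get_depth() and get_n_leaves()
    depths = [tree.get_depth() for tree in etc.estimators_]
    leaves = [tree.get_n_leaves() for tree in etc.estimators_]
    stats[criterion] = (np.mean(depths), np.mean(leaves))
    print(f"criterion={criterion}: mean depth={stats[criterion][0]:.1f}, "
          f"mean leaves={stats[criterion][1]:.1f}")
```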

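The default value for criterion is “gini”, which uses Gini impurity. The alternative is “entropy”, which uses information gain (Shannon entropy) instead. scikit-learn 1.1 and later also accept “log_loss”, which is equivalent to “entropy”.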
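Both criteria score a node from its class proportions p_k. Gini impurity is 1 - Σ p_k², and entropy is -Σ p_k log p_k. The short sketch below evaluates both on a few class distributions. They agree at the extremes, giving zero for a pure node and their maximum for a uniform one, and differ in scale and curvature in between. It uses a base-2 logarithm. Changing the base rescales entropy but does not change which split ranks highest.

```python
import numpy as np

def gini(p):
    """Gini impurity from class proportions p."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy (base 2) from class proportions p; 0 * log(0) counts as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

distributions = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [1/3, 1/3, 1/3]]
for dist in distributions:
    print(np.round(dist, 3), f"gini={gini(dist):.3f}", f"entropy={entropy(dist):.3f}")
```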

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different criterion values
criterion_values = ['gini', 'entropy']
accuracies = []

for criterion in criterion_values:
    etc = ExtraTreesClassifier(n_estimators=100, criterion=criterion, random_state=42)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"criterion={criterion}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

criterion=gini, Accuracy: 0.845
criterion=entropy, Accuracy: 0.870

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train ExtraTreesClassifier models with different criterion values
  4. Evaluate the accuracy of each model on the test set
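The sample output above comes from a single 200-sample test set. The 0.025 gap between the two settings therefore amounts to only 5 test samples and may not be meaningful. A more robust comparison, sketched below on the same synthetic data, uses 5-fold stratified cross-validation and reports the mean and spread of accuracy for each criterion.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Fixed, shuffled folds so both criteria are scored on identical splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for criterion in ["gini", "entropy"]:
    etc = ExtraTreesClassifier(n_estimators=100, criterion=criterion, random_state=42)
    scores = cross_val_score(etc, X, y, cv=cv, scoring="accuracy")
    results[criterion] = scores
    print(f"criterion={criterion}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

If the mean scores differ by less than their fold-to-fold spread, these results give little reason to prefer one criterion over the other.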

Tips and heuristics for setting criterion:

Issues to consider:


