
Configure RandomForestClassifier "criterion" Parameter

The criterion parameter in scikit-learn’s RandomForestClassifier determines the impurity measure used to split nodes when building the decision trees in the forest.

The two main options for this parameter are "gini" for Gini impurity and "entropy" for information gain (recent scikit-learn versions also accept "log_loss", which is equivalent to "entropy"). Gini impurity measures the probability of misclassifying a randomly chosen element if it were labeled randomly according to the class distribution at the node. Information gain measures the decrease in entropy after splitting a node on an attribute.
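To make the two measures concrete, here is a small sketch that computes Gini impurity and entropy by hand for a node's class-probability vector (the helper names `gini_impurity` and `entropy` are our own, not scikit-learn functions):

```python
import numpy as np

def gini_impurity(p):
    """Gini impurity: 1 - sum(p_k^2) over class probabilities p_k."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy (base 2): -sum(p_k * log2(p_k))."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # skip zero probabilities to avoid log2(0)
    return -np.sum(p * np.log2(p))

# A pure node (all samples in one class) has zero impurity under both
print(gini_impurity([1.0, 0.0]))  # 0.0
# A maximally mixed binary node is the worst case for both
print(gini_impurity([0.5, 0.5]))  # 0.5
print(entropy([0.5, 0.5]))        # 1.0
```

Both measures are zero for pure nodes and largest for evenly mixed nodes, which is why they usually rank candidate splits similarly.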

The default value for criterion is “gini”.
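The default can be confirmed directly by inspecting an unconfigured estimator's parameters:

```python
from sklearn.ensemble import RandomForestClassifier

# get_params() returns the estimator's current hyperparameter settings
clf = RandomForestClassifier()
print(clf.get_params()['criterion'])  # gini
```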

In practice, there is often little difference between the two criteria in terms of model performance. Gini impurity is slightly faster to compute, while entropy may create trees that are slightly shorter and easier to interpret.
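The speed difference can be checked with a rough timing comparison; absolute times are machine-dependent, so the sketch below (on an assumed synthetic dataset) is illustrative only:

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset large enough for timing differences to show up
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=42)

fit_times = {}
for criterion in ['gini', 'entropy']:
    start = time.perf_counter()
    RandomForestClassifier(criterion=criterion, random_state=42).fit(X, y)
    fit_times[criterion] = time.perf_counter() - start
    print(f"criterion={criterion}: fit in {fit_times[criterion]:.2f}s")
```

Entropy requires a logarithm per candidate split where Gini needs only squares, which accounts for the typically small gap in training time.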

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different criterion values
criteria = ['gini', 'entropy']
accuracies = []

for criterion in criteria:
    rf = RandomForestClassifier(criterion=criterion, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"criterion={criterion}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

criterion=gini, Accuracy: 0.920
criterion=entropy, Accuracy: 0.920

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with both “gini” and “entropy” criteria
  4. Evaluate the accuracy of each model on the test set
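Rather than comparing criteria on a single train/test split, criterion can also be treated as a hyperparameter and compared with cross-validation; a minimal sketch using GridSearchCV on the same kind of synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# 5-fold cross-validated comparison of the two criteria
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={'criterion': ['gini', 'entropy']},
    cv=5,
    scoring='accuracy',
)
grid.fit(X, y)
print(grid.best_params_)
print(f"best CV accuracy: {grid.best_score_:.3f}")
```

Cross-validation averages out split-specific noise, which matters here because the gap between the two criteria is usually within that noise.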

Some tips and heuristics for choosing the criterion:

  - Start with the default "gini"; it is slightly cheaper to compute and usually matches entropy's accuracy.
  - If you want to compare the two, treat criterion as a hyperparameter and evaluate both with cross-validation rather than a single train/test split.
  - Differences between criteria tend to matter less in a forest than in a single decision tree, since averaging over many trees smooths out individual split choices.

Issues to consider:

  - Entropy involves a logarithm per candidate split, so training can be marginally slower on large datasets.
  - The criterion only affects how individual trees choose splits, not how the forest aggregates predictions, so its impact on overall performance is usually minor.
