SKLearner

Configure ExtraTreesClassifier "min_impurity_decrease" Parameter

The min_impurity_decrease parameter in scikit-learn’s ExtraTreesClassifier controls the threshold for node splitting based on the decrease in impurity.

ExtraTreesClassifier is an ensemble method that builds multiple decision trees, by default on the whole training set (bootstrap=False) with random feature subsets. It differs from Random Forest in how it selects split points: thresholds are drawn at random rather than optimized, which increases randomness and often improves generalization.
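The two ensembles can be compared side by side. This is an illustrative sketch on a synthetic dataset; the exact scores depend on the data and are not a general benchmark:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           random_state=0)

# Same settings for both ensembles; only the split strategy differs
for cls in (RandomForestClassifier, ExtraTreesClassifier):
    clf = cls(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{cls.__name__}: mean accuracy = {scores.mean():.3f}")
```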

The min_impurity_decrease parameter sets a minimum threshold on the weighted impurity decrease required to split a node: a node is split only if the split induces a decrease greater than or equal to this value, effectively pre-pruning the tree as it grows.
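Per the scikit-learn documentation, the quantity compared against the threshold is N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity), where N is the total number of samples and N_t, N_t_L, N_t_R are the sample counts at the node and its children. A minimal sketch of that formula (the helper names here are ours, not part of the library):

```python
import numpy as np

def gini(y):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_impurity_decrease(y_parent, y_left, y_right, n_total):
    # Mirrors the documented formula:
    #   N_t / N * (impurity - N_t_R / N_t * right_impurity
    #                       - N_t_L / N_t * left_impurity)
    n_t = len(y_parent)
    return (n_t / n_total) * (
        gini(y_parent)
        - len(y_right) / n_t * gini(y_right)
        - len(y_left) / n_t * gini(y_left)
    )

# A perfectly separating split of a balanced root node
y_parent = np.array([0, 0, 1, 1])
decrease = weighted_impurity_decrease(y_parent, y_parent[:2], y_parent[2:], n_total=4)
print(decrease)  # 0.5: Gini drops from 0.5 to 0 in both children
```

Any min_impurity_decrease value above 0.5 would block this split; anything at or below it lets the split happen.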

The default value for min_impurity_decrease is 0.0, which means no early stopping based on impurity decrease. In practice, small positive values (e.g., 1e-7 to 1e-3) are often used to control tree growth and prevent overfitting.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_impurity_decrease values
min_impurity_values = [0.0, 1e-5, 1e-4, 1e-3, 1e-2]
accuracies = []

for value in min_impurity_values:
    etc = ExtraTreesClassifier(n_estimators=100, min_impurity_decrease=value, random_state=42)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"min_impurity_decrease={value}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

min_impurity_decrease=0.0, Accuracy: 0.925
min_impurity_decrease=1e-05, Accuracy: 0.915
min_impurity_decrease=0.0001, Accuracy: 0.920
min_impurity_decrease=0.001, Accuracy: 0.905
min_impurity_decrease=0.01, Accuracy: 0.800
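Rather than looping manually, the same sweep can be run with cross-validation. A sketch using GridSearchCV; the grid values and fold count are our choices, not prescriptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Same synthetic dataset as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Cross-validated search over candidate thresholds
param_grid = {"min_impurity_decrease": [0.0, 1e-5, 1e-4, 1e-3, 1e-2]}
search = GridSearchCV(ExtraTreesClassifier(n_estimators=100, random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(f"Best min_impurity_decrease: {search.best_params_['min_impurity_decrease']}")
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```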

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative, redundant, and noise features
  2. Split the data into train and test sets
  3. Train ExtraTreesClassifier models with different min_impurity_decrease values
  4. Evaluate the accuracy of each model on the test set
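The pruning effect behind these accuracy numbers can be observed directly by comparing average tree sizes. A sketch reusing the same synthetic data (tree_.node_count is the fitted tree's total node count):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Larger thresholds should yield smaller trees
for value in [0.0, 1e-3, 1e-2]:
    etc = ExtraTreesClassifier(n_estimators=100, min_impurity_decrease=value,
                               random_state=42).fit(X, y)
    avg_nodes = np.mean([est.tree_.node_count for est in etc.estimators_])
    print(f"min_impurity_decrease={value}: average nodes per tree = {avg_nodes:.0f}")
```

The average node count shrinks as the threshold rises, which is the pruning that eventually costs accuracy at 0.01.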

Some tips and heuristics for setting min_impurity_decrease:

  - Start with the default of 0.0 and raise it only if the model overfits
  - Search candidate values on a log scale (e.g. 1e-5 to 1e-2) rather than linearly
  - Tune it with cross-validation, ideally alongside n_estimators and max_features
  - Prefer smaller values for large, clean datasets and larger values for small or noisy ones

Issues to consider:

  - Values that are too large cause underfitting, as the accuracy drop at 0.01 in the example shows
  - The decrease is weighted by the fraction of samples reaching a node, so the same value behaves differently on datasets of different sizes
  - Other growth controls such as max_depth, min_samples_split, and min_samples_leaf address the same problem and interact with this parameter


See Also