
Configure ExtraTreesClassifier "max_leaf_nodes" Parameter

The max_leaf_nodes parameter in scikit-learn’s ExtraTreesClassifier controls the maximum number of leaf nodes in each tree, effectively limiting the complexity of the model.

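ExtraTreesClassifier is an ensemble method that builds multiple randomized decision trees and combines their predictions. Like Random Forest, it considers a random subset of features at each split, but it introduces more randomness into tree building: split thresholds are drawn at random rather than optimized, and by default each tree is trained on the full training set instead of a bootstrap sample.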
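One difference from Random Forest can be checked directly from the constructor defaults. The following is a minimal sketch, assuming a reasonably recent scikit-learn release; the variable names are illustrative only.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Only constructor defaults are inspected here; no data or fitting is needed
et = ExtraTreesClassifier()
rf = RandomForestClassifier()

# ExtraTrees uses the whole training set per tree unless bootstrap=True is passed
print(f"ExtraTreesClassifier bootstrap default: {et.bootstrap}")
print(f"RandomForestClassifier bootstrap default: {rf.bootstrap}")
```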

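The max_leaf_nodes parameter sets an upper bound on the number of leaf nodes in each tree. When it is set, each tree is grown in best-first fashion, expanding the nodes with the largest relative reduction in impurity until the limit is reached. This can help prevent overfitting by limiting the complexity of individual trees in the ensemble; tree depth is constrained only indirectly.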
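To see the bound in action, the fitted trees can be inspected after training. This is a small sketch on a hypothetical synthetic dataset (the names X_demo, y_demo, and cap are chosen for illustration), using the get_n_leaves() and get_depth() methods of the individual trees in estimators_.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Small synthetic dataset, used only for this illustration
X_demo, y_demo = make_classification(n_samples=500, n_features=20,
                                     n_informative=10, random_state=0)

cap = 16  # illustrative limit on leaves per tree
capped = ExtraTreesClassifier(n_estimators=25, max_leaf_nodes=cap, random_state=0)
capped.fit(X_demo, y_demo)

# Each fitted tree reports its own leaf count and depth
leaf_counts = [tree.get_n_leaves() for tree in capped.estimators_]
depths = [tree.get_depth() for tree in capped.estimators_]

print(f"Largest leaf count across trees: {max(leaf_counts)} (cap={cap})")
print(f"Tree depths range from {min(depths)} to {max(depths)}")
```

No tree should report more leaves than the cap, while depths may vary from tree to tree because the limit applies to leaves rather than to depth.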

By default, max_leaf_nodes is set to None, which places no limit on the number of leaves: with the other default settings, trees grow until all leaves are pure or contain fewer than min_samples_split samples. Common values range from 10 to several hundred, depending on the dataset size and complexity.
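Because a suitable value depends on the dataset, one option is to choose it by cross-validation. The sketch below assumes a small candidate grid and a hypothetical synthetic dataset (X_cv, y_cv); the grid values and fold count are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for a real dataset
X_cv, y_cv = make_classification(n_samples=600, n_features=20, n_informative=10,
                                 n_redundant=5, n_classes=3, random_state=42)

# Candidate limits, including None for unconstrained trees
param_grid = {"max_leaf_nodes": [10, 50, 100, None]}

search = GridSearchCV(
    ExtraTreesClassifier(n_estimators=50, random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_cv, y_cv)

print("Best max_leaf_nodes:", search.best_params_["max_leaf_nodes"])
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```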

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_leaf_nodes values
max_leaf_nodes_values = [10, 50, 100, None]
accuracies = []

for max_nodes in max_leaf_nodes_values:
    etc = ExtraTreesClassifier(n_estimators=100, max_leaf_nodes=max_nodes, random_state=42)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"max_leaf_nodes={max_nodes}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

max_leaf_nodes=10, Accuracy: 0.765
max_leaf_nodes=50, Accuracy: 0.810
max_leaf_nodes=100, Accuracy: 0.830
max_leaf_nodes=None, Accuracy: 0.845

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train ExtraTreesClassifier models with different max_leaf_nodes values
  4. Evaluate the accuracy of each model on the test set
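The steps above report test accuracy only. As a possible extension, the sketch below repeats the same setup and also records training accuracy and the average number of leaves per tree, which may help show how the limit affects model complexity. The results dictionary and the model variable are names introduced for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for max_nodes in [10, 50, 100, None]:
    model = ExtraTreesClassifier(n_estimators=100, max_leaf_nodes=max_nodes, random_state=42)
    model.fit(X_train, y_train)

    # score() returns mean accuracy for classifiers
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)

    # Average leaf count across the trees in the ensemble
    mean_leaves = sum(t.get_n_leaves() for t in model.estimators_) / len(model.estimators_)

    results[max_nodes] = (train_acc, test_acc, mean_leaves)
    print(f"max_leaf_nodes={max_nodes}: train={train_acc:.3f}, "
          f"test={test_acc:.3f}, mean leaves={mean_leaves:.1f}")
```

A large gap between training and test accuracy suggests the trees are fitting noise, in which case a smaller max_leaf_nodes may be worth trying.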
