
Configure GradientBoostingClassifier "max_leaf_nodes" Parameter

The max_leaf_nodes parameter in scikit-learn’s GradientBoostingClassifier controls the maximum number of leaf nodes in each individual decision tree.

Gradient Boosting is an ensemble learning method that sequentially adds decision trees to correct the errors made by the previous trees. The max_leaf_nodes parameter limits the complexity of each tree.

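Smaller values of max_leaf_nodes produce simpler trees with fewer terminal regions, reducing the model’s capacity to fit complex relationships. Larger values allow more complex trees, which can improve performance but also increase the risk of overfitting. When the parameter is set, each tree is grown best-first: the split with the largest impurity reduction is made next, until the leaf limit is reached.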
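
To make the effect on tree complexity concrete, the following minimal sketch fits small ensembles with different leaf limits and reports the largest leaf count found in any individual tree. The dataset size, n_estimators=20, max_depth=16 and the helper name max_leaves_per_tree are assumptions chosen for illustration only; max_depth is raised from its default so that the leaf limit, rather than the depth, is the constraint that binds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic 3-class problem (sizes chosen only for illustration)
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           n_redundant=0, random_state=0)

def max_leaves_per_tree(leaf_limit):
    """Hypothetical helper: fit a small ensemble, return its largest leaf count."""
    gb = GradientBoostingClassifier(
        n_estimators=20,
        max_depth=16,               # assumption: deep enough that depth rarely binds
        max_leaf_nodes=leaf_limit,
        random_state=0,
    )
    gb.fit(X, y)
    # estimators_ is a 2-D array of DecisionTreeRegressor objects:
    # one row per boosting stage, one column per class for multiclass problems
    return max(tree.get_n_leaves() for tree in gb.estimators_.ravel())

leaf_counts = {limit: max_leaves_per_tree(limit) for limit in (4, 16, 64)}
for limit, leaves in leaf_counts.items():
    print(f"max_leaf_nodes={limit}: largest tree has {leaves} leaves")
```

No tree should exceed its configured limit, which is what "limits the complexity of each tree" means in practice.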

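The default value for max_leaf_nodes is None, which places no explicit limit on the number of leaf nodes. Tree size is still bounded by max_depth, which defaults to 3, so with default settings each tree has at most 2^3 = 8 leaves.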

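In practice, values between 10 and 100 are commonly tried, depending on the size and complexity of the dataset. Because max_depth is enforced alongside max_leaf_nodes, such values only change the model when max_depth is large enough to allow that many leaves.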

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_leaf_nodes values
max_leaf_nodes_values = [10, 50, 100, None]
accuracies = []

for max_leaf in max_leaf_nodes_values:
    gb = GradientBoostingClassifier(max_leaf_nodes=max_leaf, random_state=42)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"max_leaf_nodes={max_leaf}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

max_leaf_nodes=10, Accuracy: 0.785
max_leaf_nodes=50, Accuracy: 0.785
max_leaf_nodes=100, Accuracy: 0.785
max_leaf_nodes=None, Accuracy: 0.785
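
The four accuracies above are identical. A likely explanation is that the example leaves max_depth at its default of 3, which caps every tree at 8 leaves, so limits of 10, 50, 100 and None never come into play. The sketch below checks this by inspecting the fitted trees; n_estimators=20 is an assumption made only to keep the run short, and the variable name largest_tree is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Same synthetic dataset as the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           n_redundant=0, random_state=42)

largest_tree = {}
for max_leaf in [10, 50, 100, None]:
    # max_depth is deliberately left at its default of 3
    gb = GradientBoostingClassifier(max_leaf_nodes=max_leaf, n_estimators=20,
                                    random_state=42)
    gb.fit(X, y)
    # Largest leaf count over every tree in the ensemble
    largest_tree[max_leaf] = max(t.get_n_leaves() for t in gb.estimators_.ravel())
    print(f"max_leaf_nodes={max_leaf}, largest tree: {largest_tree[max_leaf]} leaves")
```

If every setting reports at most 8 leaves, the depth limit is the binding constraint. In that case, comparing max_leaf_nodes values requires raising max_depth as well.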

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset
  2. Split the data into train and test sets
  3. Train GradientBoostingClassifier models with different max_leaf_nodes values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting max_leaf_nodes:
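
One heuristic consistent with the discussion above is to treat max_leaf_nodes as a hyperparameter and choose it by cross-validation rather than by a fixed rule. The following is a minimal sketch under stated assumptions: the candidate grid, the dataset size, n_estimators=30, max_depth=16 and cv=3 are illustrative choices, not recommendations from the original article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset so the search runs quickly
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           n_redundant=0, random_state=42)

# Candidate leaf limits (assumed values for illustration)
param_grid = {"max_leaf_nodes": [4, 8, 16, 32]}

search = GridSearchCV(
    # max_depth is raised so the leaf limit can actually constrain the trees
    GradientBoostingClassifier(n_estimators=30, max_depth=16, random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best max_leaf_nodes:", search.best_params_["max_leaf_nodes"])
print(f"Mean CV accuracy: {search.best_score_:.3f}")
```

The selected value depends on the data and on the other settings, in particular n_estimators and learning_rate, so these are usually tuned together.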

Issues to consider:


