
Configure GradientBoostingClassifier "max_depth" Parameter

The max_depth parameter in scikit-learn’s GradientBoostingClassifier controls the maximum depth of the individual regression estimators (decision trees) in the ensemble.

Gradient Boosting is an ensemble learning method that trains decision trees sequentially, with each new tree fitting the residual errors of the trees before it. The max_depth parameter determines the complexity of each individual tree.

Smaller max_depth values (shallower trees) lead to a simpler model that may underfit, while larger values (deeper trees) increase the model’s complexity and risk of overfitting. The optimal value depends on the dataset.

The default max_depth value is 3.

In practice, values between 3 and 10 are commonly used, with 5 being a good starting point for many datasets.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_depth values
max_depth_values = [2, 3, 5, 10]
accuracies = []

for depth in max_depth_values:
    gb = GradientBoostingClassifier(max_depth=depth, random_state=42)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"max_depth={depth}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

max_depth=2, Accuracy: 0.795
max_depth=3, Accuracy: 0.835
max_depth=5, Accuracy: 0.850
max_depth=10, Accuracy: 0.780

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset with both informative and noise features
  2. Split the data into train and test sets
  3. Train GradientBoostingClassifier models with different max_depth values
  4. Evaluate the accuracy of each model on the test set
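
In this run, accuracy peaks around max_depth=5 and then drops at max_depth=10, which matches the trade-off described earlier: deeper trees overfit the training data. One way to confirm this is to compare training and test accuracy for each depth. The sketch below reuses the same synthetic dataset and split as the example above; a large gap between training and test accuracy indicates overfitting.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Same synthetic dataset and split as the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Compare training vs. test accuracy for each max_depth value
for depth in [2, 3, 5, 10]:
    gb = GradientBoostingClassifier(max_depth=depth, random_state=42)
    gb.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, gb.predict(X_train))
    test_acc = accuracy_score(y_test, gb.predict(X_test))
    print(f"max_depth={depth}, train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")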

Some tips and heuristics for setting max_depth:

  - Start with the default of 3 and adjust gradually; boosted ensembles usually work well with relatively shallow trees, because each tree only needs to correct the residual errors of the ones before it
  - Tune max_depth together with n_estimators and learning_rate, since deeper trees typically need fewer boosting iterations or a smaller learning rate
  - Prefer cross-validation over a single train/test split when comparing candidate values, as in the GridSearchCV sketch below

Issues to consider:

  - Deeper trees increase training time and memory use, and the cost is multiplied across every boosting iteration
  - Larger max_depth values make the ensemble more prone to overfitting, especially on small or noisy datasets
  - max_depth interacts with other tree-growth parameters such as min_samples_split, min_samples_leaf, and max_leaf_nodes, which also limit tree size
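
Rather than relying on a single train/test split, max_depth can be tuned with cross-validation. The following minimal sketch uses GridSearchCV on the same synthetic dataset as the example above; the candidate grid and the choice of 5 folds are illustrative assumptions, not recommended settings.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Same synthetic dataset as the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, n_redundant=0, random_state=42)

# Cross-validated search over candidate max_depth values
param_grid = {'max_depth': [2, 3, 4, 5, 7, 10]}
grid = GridSearchCV(GradientBoostingClassifier(random_state=42),
                    param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X, y)

print(f"Best max_depth: {grid.best_params_['max_depth']}")
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")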



See Also