SKLearner Home | About | Contact | Examples

Configure HistGradientBoostingRegressor "max_depth" Parameter

The max_depth parameter in scikit-learn’s HistGradientBoostingRegressor controls the maximum depth of the trees in the ensemble.

HistGradientBoostingRegressor is an efficient implementation of gradient boosting that uses histogram-based algorithms for faster training. It builds an ensemble of decision trees sequentially, with each tree correcting the errors of the previous ones.

The max_depth parameter limits how deep each tree can grow. Deeper trees can capture more complex patterns but may lead to overfitting, while shallower trees may generalize better but might underfit.

By default, max_depth is set to None, allowing trees to grow until all leaves are pure or contain less than min_samples_leaf samples. Common values range from 3 to 10, depending on the complexity of the data.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_depth values
max_depth_values = [None, 3, 5, 10]
mse_scores = []

for depth in max_depth_values:
    hgbr = HistGradientBoostingRegressor(max_depth=depth, random_state=42)
    hgbr.fit(X_train, y_train)
    y_pred = hgbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"max_depth={depth}, MSE: {mse:.3f}")

Running the example gives an output like:

max_depth=None, MSE: 1023.074
max_depth=3, MSE: 1122.012
max_depth=5, MSE: 984.290
max_depth=10, MSE: 1044.807

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different max_depth values
  4. Evaluate the mean squared error of each model on the test set

Some tips and heuristics for setting max_depth:

Issues to consider:



See Also