The max_features parameter in scikit-learn's GradientBoostingRegressor controls how many features are considered when searching for the best split at each node. GradientBoostingRegressor builds an ensemble of regression trees, fitting each new tree to the residual errors of the ensemble so far, and max_features applies to every split in every one of those trees.
Restricting max_features can help reduce overfitting by limiting the features examined at each split, but smaller values may increase bias, so a balance is needed.
The default value is None, meaning all features are considered. The parameter also accepts an integer (an absolute number of features), a float (a fraction of the features), and the strings "sqrt" and "log2".
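For reference, here is a minimal sketch of the accepted value types (the variable names are illustrative only):
from sklearn.ensemble import GradientBoostingRegressor
# Default: all features are considered at every split
gbr_all = GradientBoostingRegressor(max_features=None)
# An integer sets an absolute number of features per split
gbr_int = GradientBoostingRegressor(max_features=4)
# A float sets a fraction of the features per split
gbr_frac = GradientBoostingRegressor(max_features=0.5)
# "sqrt" and "log2" derive the count from the total number of features
gbr_sqrt = GradientBoostingRegressor(max_features="sqrt")
gbr_log2 = GradientBoostingRegressor(max_features="log2")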
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different max_features values
max_features_values = [None, "sqrt", "log2"]
errors = []
for mf in max_features_values:
    gbr = GradientBoostingRegressor(max_features=mf, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    error = mean_squared_error(y_test, y_pred)
    errors.append(error)
    print(f"max_features={mf}, Mean Squared Error: {error:.3f}")
Running the example gives an output like:
max_features=None, Mean Squared Error: 1234.753
max_features=sqrt, Mean Squared Error: 1026.690
max_features=log2, Mean Squared Error: 1026.690
The key steps in this example are:
- Generate a synthetic regression dataset with a defined number of features and noise.
- Split the data into training and testing sets.
- Train GradientBoostingRegressor models with various max_features values.
- Evaluate the mean squared error of each model on the test set.
Some tips and heuristics for setting max_features:
- The default value (None) considers all features and is generally a good starting point.
- Avoid the legacy "auto" option: it never chose the "best" number of features, and it has been deprecated and removed in recent scikit-learn releases.
- “sqrt” and “log2” often help in reducing overfitting by limiting the number of features considered.
- Smaller values reduce computation time but may increase bias (a rough timing sketch follows this list).
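To make the computation-time point concrete, here is a rough timing sketch; the dataset width (100 features) is chosen only so the difference is visible, and actual timings will vary by machine:
from time import perf_counter
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
# A wider dataset makes the effect of max_features on fit time easier to see
X, y = make_regression(n_samples=1000, n_features=100, noise=0.1, random_state=42)
for mf in [None, "sqrt"]:
    start = perf_counter()
    GradientBoostingRegressor(max_features=mf, random_state=42).fit(X, y)
    print(f"max_features={mf}: fit time {perf_counter() - start:.2f}s")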
Issues to consider:
- The optimal value of max_features is data-dependent and may require experimentation (see the grid-search sketch after this list).
- Using too few features can cause underfitting, while too many may lead to overfitting.
- Balancing bias, variance, and computational cost is key when choosing max_features.
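One straightforward way to run that experimentation is a cross-validated grid search. The sketch below assumes the same synthetic dataset as the earlier example; the candidate grid is only illustrative and should be adapted to your data:
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Candidate max_features values to compare via 5-fold cross-validation
param_grid = {"max_features": [None, "sqrt", "log2", 0.5]}
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print("Best max_features:", grid.best_params_["max_features"])
print("Best CV MSE:", -grid.best_score_)
In practice, max_features is usually tuned together with n_estimators, learning_rate, and max_depth rather than in isolation.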