Configure GradientBoostingRegressor "max_features" Parameter

The max_features parameter in scikit-learn’s GradientBoostingRegressor controls the number of features to consider when looking for the best split.

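GradientBoostingRegressor builds an additive ensemble of regression trees, fitting each new tree to the remaining errors of the ensemble built so far (with the default squared-error loss, these are the residuals). By default every split in every tree is chosen from all features. When max_features is set to fewer than the total number of features, each split is instead chosen from a randomly drawn subset of that size.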

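Considering only a random subset of features at each split injects randomness into tree building, which tends to lower variance and can reduce overfitting. It also makes each split cheaper to compute. The cost is higher bias: a split may not have access to the most informative feature, so very small values can underfit.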
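One way to observe this trade-off is to compare training and test error for different settings: a large gap between the two suggests overfitting, while high error on both suggests bias. The sketch below assumes a synthetic make_regression dataset with extra noise (noise=10.0 is an illustrative choice); the numbers it prints depend on the data and are not results from the example further down.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative noisy dataset so that overfitting is possible
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

gaps = {}
for mf in [None, "sqrt"]:
    gbr = GradientBoostingRegressor(max_features=mf, random_state=0)
    gbr.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, gbr.predict(X_train))
    test_mse = mean_squared_error(y_test, gbr.predict(X_test))
    # A large test-minus-train gap hints at overfitting; high error on both hints at bias
    gaps[mf] = test_mse - train_mse
    print(f"max_features={mf!r}: train MSE={train_mse:.1f}, test MSE={test_mse:.1f}")
```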

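The default value is None, meaning all features are considered at every split. Other accepted values are "sqrt" and "log2" (the square root or base-2 logarithm of the number of features), an int giving an exact number of features, or a float giving a fraction of the number of features. The string and float options are rounded down to a whole number, with a minimum of 1.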
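Because the string and float options are converted to a whole number of features when the model is fit, it can help to check what each setting actually resolves to. A minimal sketch, assuming a scikit-learn release that exposes the fitted max_features_ attribute; the int (4) and float (0.5) values are illustrative choices, not part of the original example:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic dataset with 10 features, like the example below
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# None, the two strings, an int (exact count) and a float (fraction of features)
settings = [None, "sqrt", "log2", 4, 0.5]
resolved = {}

for mf in settings:
    # n_estimators=1 keeps the fit fast; only the inferred value is needed
    gbr = GradientBoostingRegressor(max_features=mf, n_estimators=1, random_state=0)
    gbr.fit(X, y)
    # max_features_ holds the number of features considered at each split
    resolved[mf] = gbr.max_features_
    print(f"max_features={mf!r} -> {gbr.max_features_} features per split")
```

On a 10-feature dataset this reports 10, 3, 3, 4 and 5 features per split respectively.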

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_features values
max_features_values = [None, "sqrt", "log2"]
errors = []

for mf in max_features_values:
    gbr = GradientBoostingRegressor(max_features=mf, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    error = mean_squared_error(y_test, y_pred)
    errors.append(error)
    print(f"max_features={mf}, Mean Squared Error: {error:.3f}")

Running the example gives an output like:

max_features=None, Mean Squared Error: 1234.753
max_features=sqrt, Mean Squared Error: 1026.690
max_features=log2, Mean Squared Error: 1026.690

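Note that "sqrt" and "log2" give identical errors here. With 10 features, both options resolve to 3 features per split (int(sqrt(10)) = 3 and int(log2(10)) = 3), and with the same random_state the two fitted models are identical.

The key steps in this example are: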

  1. Generate a synthetic regression dataset with a defined number of features and noise.
  2. Split the data into training and testing sets.
  3. Train GradientBoostingRegressor models with various max_features values.
  4. Evaluate the mean squared error of each model on the test set.
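The steps above score each setting on a single train/test split, so the ranking can depend on how that split falls. A sketch of a more robust comparison, assuming 5-fold cross-validation with GridSearchCV is affordable for your data size; the fractional value 0.5 is an illustrative addition to the original candidates:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Same synthetic data as the example above
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

param_grid = {"max_features": [None, "sqrt", "log2", 0.5]}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # negated MSE: closer to 0 is better
    cv=5,
)
search.fit(X, y)

# Negate the scores to report ordinary (positive) mean squared error
for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    print(f"{params}: mean CV MSE = {-score:.3f}")
print("Best:", search.best_params_)
```

Averaging over folds gives each setting several test sets instead of one, which makes small differences between settings easier to judge.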

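Some tips and heuristics for setting max_features:

  - Start with the default (None) as a baseline, then try "sqrt", "log2" and a few fractional values, comparing validation error rather than training error.
  - Tune max_features together with learning_rate, n_estimators and max_depth, since all of them affect overfitting.

Issues to consider:

  - Very small values can underfit, especially when only a few features carry most of the signal.
  - When fewer than all features are considered, the chosen subsets depend on random_state; fix it for reproducible comparisons.
  - Training time drops as max_features decreases, because fewer candidate features are evaluated at each split.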


