
Configure GradientBoostingRegressor "n_estimators" Parameter

The n_estimators parameter in scikit-learn’s GradientBoostingRegressor controls the number of boosting stages (trees) in the ensemble.

Gradient Boosting is an ensemble technique that builds trees sequentially, with each new tree fit to the residual errors left by the trees before it. The n_estimators parameter determines how many of these boosting stages are run.
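
To make the sequential idea concrete, here is a minimal hand-rolled sketch of boosting for squared-error loss. It is not scikit-learn's internal implementation; the names n_stages and learning_rate, the tree depth, and the dataset settings are illustrative assumptions. n_stages plays the role that n_estimators plays in GradientBoostingRegressor.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

n_stages = 20        # hypothetical name; analogous to n_estimators
learning_rate = 0.1  # shrinkage applied to each tree's contribution

# Start from a constant prediction: the mean of the targets
prediction = np.full(y.shape, y.mean())
train_mse = []

for _ in range(n_stages):
    residuals = y - prediction                     # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)                         # the next tree models those residuals
    prediction += learning_rate * tree.predict(X)  # add a shrunken correction
    train_mse.append(np.mean((y - prediction) ** 2))

print(f"Training MSE after 1 stage: {train_mse[0]:.1f}")
print(f"Training MSE after {n_stages} stages: {train_mse[-1]:.1f}")
```

Each pass through the loop is one boosting stage, so running more stages means more corrections are added on top of the initial constant prediction.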

Generally, adding boosting stages reduces the model’s bias and can improve performance, but it also increases training and prediction time and, past a certain point, the risk of overfitting. The default value for n_estimators is 100.
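
One way to see this trade-off without retraining for every candidate value is to fit a single large model and score it after each stage with staged_predict. The sketch below assumes the same kind of synthetic data used elsewhere on this page; the choice of 500 stages is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit once with a generous number of stages
gbr = GradientBoostingRegressor(n_estimators=500, random_state=42)
gbr.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting stage
test_mse = [mean_squared_error(y_test, y_pred) for y_pred in gbr.staged_predict(X_test)]

best_n = int(np.argmin(test_mse)) + 1  # stages are counted from 1
print(f"Lowest held-out MSE {min(test_mse):.3f} at {best_n} stages")
```

If the held-out error flattens or starts rising well before the last stage, the extra trees are adding cost without adding accuracy. In real projects the stage count should be picked on a separate validation set rather than the final test set, so the test score stays an honest estimate.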

In practice, values between 100 and 1000 are commonly used depending on the dataset size and complexity.
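
Rather than guessing a value in that range, one option is to set n_estimators as an upper bound and let the built-in early stopping decide when to stop. The sketch below uses the n_iter_no_change, validation_fraction and tol parameters of GradientBoostingRegressor; the specific values (1000, 10, 0.1, 1e-4) are assumptions for illustration, not recommendations from this page.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbr = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound on the number of stages
    n_iter_no_change=10,      # stop if the validation score stalls for 10 stages
    validation_fraction=0.1,  # share of the training data held out internally
    tol=1e-4,                 # minimum improvement that counts as progress
    random_state=42,
)
gbr.fit(X_train, y_train)

# n_estimators_ holds the number of stages actually fitted
print(f"Stages actually fitted: {gbr.n_estimators_} (limit was {gbr.n_estimators})")
```

When early stopping triggers, n_estimators_ is smaller than the configured limit, which gives a data-driven starting point for the parameter.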

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_estimators values
n_estimators_values = [50, 100, 200, 500]
mse_values = []

for n in n_estimators_values:
    gbr = GradientBoostingRegressor(n_estimators=n, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_values.append(mse)
    print(f"n_estimators={n}, MSE: {mse:.3f}")

Running the example gives an output like:

n_estimators=50, MSE: 2394.832
n_estimators=100, MSE: 1234.753
n_estimators=200, MSE: 881.858
n_estimators=500, MSE: 776.403

The key steps in this example are:

  1. Generate a synthetic regression dataset with 10 features (all informative under make_regression's defaults) and a small amount of Gaussian noise added to the target.
  2. Split the data into train and test sets.
  3. Train GradientBoostingRegressor models with different n_estimators values.
  4. Evaluate the mean squared error of each model on the test set.

Some tips and heuristics for setting n_estimators:

Issues to consider:


