
Configure GradientBoostingRegressor "learning_rate" Parameter

The learning_rate parameter in scikit-learn’s GradientBoostingRegressor controls the contribution of each tree to the final model.

Gradient Boosting is an ensemble learning method that builds models sequentially, with each new model attempting to correct errors made by the previous models. The learning_rate parameter determines the weight of each individual tree’s prediction in the final ensemble.
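
To make that weighting concrete, here is a minimal sketch (assuming the default squared_error loss) that rebuilds the ensemble prediction by hand: the initial prediction plus learning_rate times the sum of the individual tree predictions.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Small dataset just to illustrate how each tree's contribution is scaled
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

model = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, random_state=0)
model.fit(X, y)

# Prediction = initial prediction + learning_rate * sum of tree predictions
initial = model.init_.predict(X).ravel()
tree_sum = np.sum([tree.predict(X) for tree in model.estimators_.ravel()], axis=0)
manual = initial + model.learning_rate * tree_sum

print(np.allclose(manual, model.predict(X)))  # True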

Generally, a lower learning_rate leads to a more robust model but requires more trees (n_estimators) to reach the same performance. A higher learning_rate needs fewer trees to fit the training data, which shortens training, but it increases the risk of overfitting if set too high.
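
As a quick illustration of that trade-off, the sketch below pairs a smaller learning_rate with a proportionally larger n_estimators; the pairings are arbitrary and chosen only for illustration.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A smaller learning_rate compensated with more trees, versus the defaults
for lr, n_est in [(0.1, 100), (0.01, 1000)]:
    gbr = GradientBoostingRegressor(learning_rate=lr, n_estimators=n_est, random_state=42)
    gbr.fit(X_train, y_train)
    mse = mean_squared_error(y_test, gbr.predict(X_test))
    print(f"learning_rate={lr}, n_estimators={n_est}, MSE: {mse:.3f}")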

The default value for learning_rate is 0.1.

In practice, values between 0.01 and 0.3 are commonly used depending on the size and complexity of the dataset.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different learning_rate values
learning_rate_values = [0.01, 0.1, 0.2, 0.3]
errors = []

for lr in learning_rate_values:
    gbr = GradientBoostingRegressor(learning_rate=lr, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    error = mean_squared_error(y_test, y_pred)
    errors.append(error)
    print(f"learning_rate={lr}, Mean Squared Error: {error:.3f}")

Running the example gives an output like:

learning_rate=0.01, Mean Squared Error: 8323.362
learning_rate=0.1, Mean Squared Error: 1234.753
learning_rate=0.2, Mean Squared Error: 1002.691
learning_rate=0.3, Mean Squared Error: 1130.817

In this run, learning_rate=0.01 underfits badly because the default 100 trees is not enough at that rate, while 0.2 achieves the lowest error and 0.3 starts to degrade again.

The key steps in this example are:

  1. Generate a synthetic regression dataset with relevant features
  2. Split the data into train and test sets
  3. Train GradientBoostingRegressor models with different learning_rate values
  4. Evaluate the mean squared error of each model on the test set

Some tips and heuristics for setting learning_rate:

  1. Start with the default of 0.1 and adjust based on held-out or cross-validated performance
  2. Lower values (around 0.01 to 0.05) usually need a larger n_estimators to reach the same accuracy, so tune the two parameters together, for example with a grid search (see the sketch after this list)
  3. A smaller learning_rate combined with more trees tends to generalize better, at the cost of longer training
  4. Consider early stopping (n_iter_no_change with validation_fraction) so you do not have to hand-pick n_estimators for every learning_rate you try
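
One way to handle that interaction is to search learning_rate and n_estimators together; the sketch below is a minimal example with GridSearchCV, where the grid values are arbitrary starting points rather than recommendations.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Search the two interacting parameters jointly
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 300, 500],
}
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)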

Issues to consider:

  1. A learning_rate that is too high can cause the model to overfit or the training loss to decrease erratically
  2. A learning_rate that is too low underfits unless n_estimators is increased accordingly, as the learning_rate=0.01 result above shows, and the extra trees add training time (the early-stopping sketch below is one way to keep that cost in check)
  3. learning_rate interacts with other regularization settings such as subsample and max_depth, so changing one often shifts the best value of the others
  4. The best value is dataset-dependent, so confirm any choice with cross-validation rather than relying on the default
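
One practical way to limit the cost of a small learning_rate is the estimator's built-in early stopping: set a generous n_estimators cap and let n_iter_no_change stop training once the internal validation score stops improving. A minimal sketch with illustrative values:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Training stops once the internal validation score stops improving
gbr = GradientBoostingRegressor(
    learning_rate=0.05,
    n_estimators=2000,
    n_iter_no_change=10,
    validation_fraction=0.1,
    random_state=42,
)
gbr.fit(X, y)
print(gbr.n_estimators_)  # number of boosting stages actually fitted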


