
Configure GradientBoostingRegressor "init" Parameter

The init parameter in scikit-learn’s GradientBoostingRegressor allows you to set an initial model for the boosting process.

Gradient boosting is an ensemble technique for regression and classification that builds models sequentially, with each new model fitted to correct the errors of the ensemble built so far. The init parameter sets the estimator that supplies the starting predictions for this process.
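To make the role of the starting prediction concrete, here is a minimal hand-rolled sketch of boosting for squared error. It is not scikit-learn's implementation: the learning rate, tree depth, and number of stages are illustrative assumptions, and the constant mean prediction plays the part that init plays in GradientBoostingRegressor.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

learning_rate = 0.1  # illustrative value, not taken from the article
n_stages = 20        # illustrative value, not taken from the article

# Stage 0: the "initial model" is a constant prediction, the mean of y.
prediction = np.full_like(y, y.mean(), dtype=float)
initial_mse = np.mean((y - prediction) ** 2)

# Each stage fits a small tree to the current residuals and adds a
# shrunken copy of its output to the running prediction.
for _ in range(n_stages):
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)

final_mse = np.mean((y - prediction) ** 2)
print(f"training MSE: start={initial_mse:.1f}, after {n_stages} stages={final_mse:.1f}")
```

A better starting prediction leaves smaller residuals for the trees to explain, which is why the choice of init can matter.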

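The default value for init is None, in which case a DummyRegressor supplies the initial prediction: the mean of the training targets for the default squared-error loss, or a quantile for losses such as absolute_error, huber, and quantile. init also accepts the string "zero", which starts from an all-zero prediction, or any estimator that implements fit and predict, such as DummyRegressor or LinearRegression.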
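One way to see what init=None resolves to is to inspect the fitted init_ attribute. The snippet below is a small sketch on an assumed toy dataset; it relies on the default squared-error loss, and the dataset sizes are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# With init=None, the fitted model exposes the fallback estimator as init_.
default_gbr = GradientBoostingRegressor(random_state=0).fit(X, y)
print(type(default_gbr.init_).__name__)

# For the default loss, that estimator predicts the mean of the training targets.
baseline = np.ravel(default_gbr.init_.predict(X[:1]))[0]
matches_mean = bool(np.isclose(baseline, y.mean()))
print(f"initial prediction equals mean of y: {matches_mean}")

# init="zero" skips the fitted baseline and starts boosting from 0.
zero_gbr = GradientBoostingRegressor(init="zero", random_state=0).fit(X, y)
zero_pred = zero_gbr.predict(X)
```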

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define initial models
init_models = [None, DummyRegressor(strategy="mean"), LinearRegression()]
init_model_names = ["None", "DummyRegressor", "LinearRegression"]

for init, name in zip(init_models, init_model_names):
    gbr = GradientBoostingRegressor(init=init, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"init={name}, Mean Squared Error: {mse:.3f}")

Running the example gives an output like:

init=None, Mean Squared Error: 1234.753
init=DummyRegressor, Mean Squared Error: 1234.753
init=LinearRegression, Mean Squared Error: 0.010
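The first two rows match because, with the default squared-error loss, init=None falls back to a mean-predicting DummyRegressor. The check below is a sketch that confirms this on a smaller assumed dataset by comparing predictions directly rather than error scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=42)

# Same seed, same data; only the way the mean baseline is specified differs.
gbr_default = GradientBoostingRegressor(init=None, random_state=42).fit(X, y)
gbr_dummy = GradientBoostingRegressor(
    init=DummyRegressor(strategy="mean"), random_state=42
).fit(X, y)

same = bool(np.allclose(gbr_default.predict(X), gbr_dummy.predict(X)))
print(f"identical predictions: {same}")
```

The LinearRegression row is far lower because make_regression produces a linear target plus noise, so a linear initial model already captures almost all of the signal before any trees are added. On data without a strong linear component, the gap would likely be much smaller.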

The key steps in this example are:

  1. Generate a synthetic regression dataset.
  2. Split the data into train and test sets.
  3. Train GradientBoostingRegressor models with different init values.
  4. Evaluate the mean squared error of each model on the test set.



