
Configure GradientBoostingRegressor "criterion" Parameter

The criterion parameter in scikit-learn’s GradientBoostingRegressor selects the function used to measure the quality of a split when each tree in the ensemble is grown.

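Gradient Boosting is an ensemble learning method that builds a predictive model by combining many weak learners in a stage-wise fashion. In GradientBoostingRegressor the weak learners are regression trees, and each new tree is fit to the negative gradient of the loss function (for the default squared-error loss, these are the residuals of the current ensemble). The criterion parameter specifies the metric used to evaluate candidate splits while each of these trees is grown.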
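To make the role of criterion concrete, here is a simplified, hand-written sketch of stage-wise boosting for squared-error loss. It is an illustration under stated assumptions, not scikit-learn's internal implementation: it uses DecisionTreeRegressor (which accepts the same “friedman_mse” and “squared_error” strings), and the helper name boost, the learning rate of 0.1, depth-3 trees, and 100 stages are choices made here to mirror GradientBoostingRegressor's defaults.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, criterion="friedman_mse", n_stages=100, learning_rate=0.1):
    """Hypothetical helper: hand-rolled gradient boosting for squared-error loss."""
    # Stage 0: start every prediction at the mean of the target.
    init = y.mean()
    pred = np.full(len(y), init, dtype=float)
    trees = []
    for _ in range(n_stages):
        # For squared-error loss, the negative gradient is simply the residual.
        residual = y - pred
        # criterion decides how candidate splits of this tree are scored.
        tree = DecisionTreeRegressor(criterion=criterion, max_depth=3, random_state=0)
        tree.fit(X, residual)
        # Shrink each tree's contribution by the learning rate before adding it.
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees, pred


X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
init, trees, train_pred = boost(X, y, criterion="friedman_mse")
print(f"baseline MSE (predict the mean): {np.mean((y - y.mean()) ** 2):.1f}")
print(f"boosted training MSE: {np.mean((y - train_pred) ** 2):.1f}")
```

Only the tree-growing step touches criterion; the loss, shrinkage, and number of stages are controlled by other parameters.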

The default value for criterion is “friedman_mse”.
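A quick way to confirm the default in your installed version is to read it back from an unfitted estimator; set_params changes it after construction.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Inspect the default without fitting anything.
print(GradientBoostingRegressor().get_params()["criterion"])  # friedman_mse

# Set it explicitly, either in the constructor or later via set_params.
gbr = GradientBoostingRegressor(criterion="squared_error")
gbr.set_params(criterion="friedman_mse")
print(gbr.criterion)  # friedman_mse
```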

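In current scikit-learn releases, criterion accepts only two values: “friedman_mse” and “squared_error” (older releases also accepted “mse” and “mae”, which have since been removed). “friedman_mse” uses mean squared error with Friedman's improvement score, while “squared_error” uses plain mean squared error. The scikit-learn documentation describes “friedman_mse” as generally the best choice because it can provide a better approximation in some cases.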
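The two criteria differ only in how a candidate split is scored. Following Friedman's 2001 gradient boosting paper, we take the improvement score to be n_l * n_r / (n_l + n_r) * (mean_l - mean_r)^2, where n_l and n_r are the sample counts and mean_l and mean_r the mean targets of the left and right children; the squared-error criterion scores a split by how much it reduces the sum of squared errors. The sketch below computes both scores with NumPy for every threshold on one synthetic feature (the helper names are hypothetical). For unweighted samples the two scores are algebraically equal, so they rank candidate splits the same way.

```python
import numpy as np


def sse(v):
    """Sum of squared deviations from the mean."""
    return float(np.sum((v - v.mean()) ** 2))


def squared_error_gain(y_left, y_right):
    """Reduction in the sum of squared errors achieved by the split."""
    parent = np.concatenate([y_left, y_right])
    return sse(parent) - sse(y_left) - sse(y_right)


def friedman_gain(y_left, y_right):
    """Friedman's improvement score: n_l * n_r / (n_l + n_r) * (mean_l - mean_r)^2."""
    n_l, n_r = len(y_left), len(y_right)
    return n_l * n_r / (n_l + n_r) * (y_left.mean() - y_right.mean()) ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3 * x + rng.normal(size=50)

# Sort by the feature so each split is "first i samples" vs "the rest".
y_sorted = y[np.argsort(x)]
gains = []
for i in range(1, len(x)):  # every split leaving at least one sample per side
    left, right = y_sorted[:i], y_sorted[i:]
    gains.append((squared_error_gain(left, right), friedman_gain(left, right)))

gains = np.array(gains)
print("scores agree:", np.allclose(gains[:, 0], gains[:, 1]))
print("same best split:", gains[:, 0].argmax() == gains[:, 1].argmax())
```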

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different criterion values
criterion_values = ['friedman_mse', 'squared_error']
errors = []

for criterion in criterion_values:
    gbr = GradientBoostingRegressor(criterion=criterion, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    error = mean_squared_error(y_test, y_pred)
    errors.append(error)
    print(f"criterion={criterion}, MSE: {error:.3f}")

Running the example gives an output like:

criterion=friedman_mse, MSE: 1234.753
criterion=squared_error, MSE: 1234.753

The key steps in this example are:

  1. Generate a synthetic regression dataset.
  2. Split the data into train and test sets.
  3. Train GradientBoostingRegressor models with different criterion values.
  4. Evaluate and compare the mean squared error for each model on the test set.

Some tips and heuristics for setting criterion:

Issues to consider:


