Configure HistGradientBoostingRegressor "loss" Parameter

The loss parameter in scikit-learn’s HistGradientBoostingRegressor determines the loss function used to measure the error between predicted and true values during training.

HistGradientBoostingRegressor is a gradient boosting algorithm that uses histogram-based decision trees for faster training. The loss parameter defines how the model penalizes prediction errors.

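The choice of loss changes what the model estimates, because each boosting iteration fits a tree to the gradient of the loss. For example, ‘squared_error’ targets the conditional mean, while ‘absolute_error’ targets the conditional median, so the best choice depends on the type of regression problem.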
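To make the idea of penalizing errors concrete, the sketch below scores the same set of predictions with four evaluation metrics from sklearn.metrics that are related to the loss options. The toy values are an assumption chosen for illustration, and the metrics are stand-ins: they are not the internal loss implementations the estimator optimizes, which may differ by constant factors.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_gamma_deviance,
    mean_poisson_deviance,
    mean_squared_error,
)

# Toy values (illustrative): two exact predictions and one large miss.
y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.0, 2.0, 8.0])

# Score the same predictions under each related metric.
penalties = {
    "squared_error": mean_squared_error(y_true, y_pred),
    "absolute_error": mean_absolute_error(y_true, y_pred),
    "poisson": mean_poisson_deviance(y_true, y_pred),
    "gamma": mean_gamma_deviance(y_true, y_pred),
}

for name, value in penalties.items():
    print(f"{name}: {value:.4f}")
```

The single miss of 4 contributes 16 to the squared error but only 4 to the absolute error, which is why squared error reacts more strongly to large residuals.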

The default value for loss is ‘squared_error’. Other options are ‘absolute_error’, ‘gamma’, ‘poisson’, and ‘quantile’, the last of which is used together with the quantile parameter.

In practice, ‘squared_error’ is commonly used for general regression tasks, while ‘absolute_error’ may be preferred for robustness to outliers.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

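# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# 'gamma' needs strictly positive targets and 'poisson' non-negative ones,
# so take absolute values of y (X is folded the same way in this example)
X = np.abs(X)
y = np.abs(y)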

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different loss functions
loss_functions = ['squared_error', 'absolute_error', 'gamma', 'poisson']
mse_scores = []

for loss in loss_functions:
    model = HistGradientBoostingRegressor(loss=loss, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"Loss function: {loss}, MSE: {mse:.4f}")

# Find best performing loss function
best_loss = loss_functions[np.argmin(mse_scores)]
print(f"\nBest performing loss function: {best_loss}")

Running the example gives an output like:

Loss function: squared_error, MSE: 6025.7505
Loss function: absolute_error, MSE: 5666.4799
Loss function: gamma, MSE: 6378.0855
Loss function: poisson, MSE: 5993.6996

Best performing loss function: absolute_error

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different loss functions
  4. Evaluate the mean squared error of each model on the test set
  5. Identify the best performing loss function

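Some tips for choosing the appropriate loss function:

  1. Start with ‘squared_error’ for general regression tasks; it penalizes large errors quadratically, so it is sensitive to outliers
  2. Prefer ‘absolute_error’ when the target contains outliers, since it estimates the conditional median
  3. Consider ‘poisson’ for count-like targets, which must be non-negative
  4. Consider ‘gamma’ for right-skewed targets, which must be strictly positive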

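Issues to consider:

  1. ‘poisson’ and ‘gamma’ reject targets that violate their constraints, which is why the example takes absolute values of y
  2. Taking absolute values changes the target distribution, so the scores above describe the transformed problem rather than the original one
  3. The ranking comes from one train/test split and one metric (MSE); a different split or metric may give a different order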
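One issue that can be checked before training is whether the target satisfies the constraints of ‘poisson’ (non-negative values with a positive sum) and ‘gamma’ (strictly positive values). The helper below is hypothetical, written for this illustration and not part of scikit-learn; it only filters the four loss values used in this example.

```python
import numpy as np

def compatible_losses(y):
    """Hypothetical helper: list the loss values whose target constraints y satisfies."""
    y = np.asarray(y, dtype=float)
    losses = ["squared_error", "absolute_error"]
    # 'poisson' needs non-negative targets with a positive sum.
    if np.all(y >= 0) and y.sum() > 0:
        losses.append("poisson")
    # 'gamma' needs strictly positive targets.
    if np.all(y > 0):
        losses.append("gamma")
    return losses

print(compatible_losses([0.0, 1.0, 3.0]))   # zeros allowed for poisson only
print(compatible_losses([-1.0, 2.0, 5.0]))  # negative value rules out both
print(compatible_losses([0.5, 2.0, 5.0]))   # strictly positive: all four
```

Running such a check first avoids a validation error from fit when the target contains zeros or negative values.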


