
Configure HistGradientBoostingRegressor "l2_regularization" Parameter

The l2_regularization parameter in scikit-learn’s HistGradientBoostingRegressor controls the strength of L2 regularization applied to the model’s leaf values.

HistGradientBoostingRegressor is a histogram-based gradient boosting algorithm that builds an ensemble of decision trees sequentially. It’s designed for efficiency and can handle large datasets.

The l2_regularization parameter adds a penalty term to the loss function, discouraging large leaf values. This helps prevent overfitting by reducing the model’s complexity.
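To build intuition for how this penalty works, here is an illustrative sketch (not scikit-learn's internal code). In gradient-boosting formulations, a leaf's value is computed from the gradients and hessians of the samples it contains; with squared error the hessians are all 1, so the L2 term simply shrinks the leaf's mean residual toward zero:

```python
import numpy as np

# Illustrative sketch, assuming the usual gradient-boosting leaf formula:
# with squared-error loss, each sample's hessian is 1, so a leaf's value is
# sum(residuals) / (n_samples_in_leaf + l2_regularization).
def leaf_value(residuals, l2_regularization=0.0):
    residuals = np.asarray(residuals, dtype=float)
    return residuals.sum() / (len(residuals) + l2_regularization)

residuals = [4.0, 5.0, 6.0]  # targets minus current predictions in one leaf
print(leaf_value(residuals, l2_regularization=0.0))  # plain mean: 5.0
print(leaf_value(residuals, l2_regularization=3.0))  # shrunk toward 0: 2.5
```

Larger l2_regularization values shrink leaf values more aggressively, which smooths the model's predictions.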

The default value of l2_regularization is 0, meaning no regularization is applied. Typical values range from 0 to 1, though larger values can be useful depending on the dataset and the amount of noise.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different l2_regularization values
l2_values = [0, 0.1, 1, 10]
mse_scores = []

for l2 in l2_values:
    hgbr = HistGradientBoostingRegressor(l2_regularization=l2, random_state=42)
    hgbr.fit(X_train, y_train)
    y_pred = hgbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"l2_regularization={l2}, MSE: {mse:.3f}")

# Find best l2_regularization value
best_l2 = l2_values[np.argmin(mse_scores)]
print(f"Best l2_regularization value: {best_l2}")

Running the example gives an output like:

l2_regularization=0, MSE: 3073.589
l2_regularization=0.1, MSE: 3053.149
l2_regularization=1, MSE: 3327.394
l2_regularization=10, MSE: 3430.270
Best l2_regularization value: 0.1

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different l2_regularization values
  4. Evaluate the mean squared error of each model on the test set
  5. Identify the best l2_regularization value based on the lowest MSE

Tips for setting l2_regularization:

  - Start with the default of 0 and increase it only if the model overfits (training error far below test error)
  - Search values on a logarithmic scale (e.g., 0.01, 0.1, 1, 10) rather than a linear one
  - Tune it together with other regularizing parameters such as max_depth, min_samples_leaf, and learning_rate, since their effects interact

Issues to consider:

  - The best value is dataset-dependent; noisier or smaller datasets generally benefit from stronger regularization
  - Too large a value underfits, as the rising MSE for l2_regularization=10 in the example illustrates
  - Cross-validation gives a more reliable comparison of values than a single train/test split
