The l2_regularization parameter in scikit-learn’s HistGradientBoostingClassifier controls the strength of L2 regularization applied to the model’s leaf values. L2 regularization, also known as ridge regularization, adds a penalty term to the loss function that is proportional to the square of the leaf values (a tree ensemble has no feature weights to penalize). This helps prevent overfitting by shrinking leaf values toward zero, so no single leaf can contribute an extreme prediction.
Increasing l2_regularization
makes the model more conservative, potentially reducing overfitting at the cost of underfitting if set too high. Decreasing it allows the model to fit the training data more closely, which may improve performance on simple datasets but risks overfitting on complex ones.
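To see why this happens, it helps to look at how an L2 penalty typically enters a gradient-boosted tree. The sketch below uses the XGBoost-style leaf-value formula, which scikit-learn’s histogram-based implementation broadly follows; the gradient and hessian numbers are made up purely for illustration:

import numpy as np

# Illustrative only (hypothetical numbers, not scikit-learn internals):
# in XGBoost-style gradient boosting, a leaf's raw value is roughly
# -sum(gradients) / (sum(hessians) + l2), so a larger l2 shrinks leaf
# values toward zero and makes each tree's contribution more conservative.
gradients = np.array([-0.4, -0.3, -0.2])  # hypothetical per-sample gradients in one leaf
hessians = np.array([0.24, 0.21, 0.16])   # hypothetical per-sample hessians in the same leaf

for l2 in [0.0, 0.1, 1.0, 10.0]:
    leaf_value = -gradients.sum() / (hessians.sum() + l2)
    print(f"l2={l2}: leaf value = {leaf_value:.3f}")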
The default value for l2_regularization
is 0.0, which means no regularization is applied.
In practice, values between 0.01 and 10 are commonly used, depending on the dataset’s characteristics and the desired trade-off between bias and variance.
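You can confirm the default in your installed scikit-learn version and lay out a log-spaced grid of candidate values over that commonly used range before any tuning; the grid below is just one reasonable choice:

from sklearn.ensemble import HistGradientBoostingClassifier
import numpy as np

# The default applies no regularization
print(HistGradientBoostingClassifier().l2_regularization)  # 0.0

# A log-spaced grid covering the commonly used 0.01-10 range
candidate_values = np.logspace(-2, 1, 7)
print(candidate_values)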
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, log_loss
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different l2_regularization values
l2_values = [0.0, 0.1, 1.0, 10.0]
results = []
for l2 in l2_values:
    hgbc = HistGradientBoostingClassifier(l2_regularization=l2, random_state=42)
    hgbc.fit(X_train, y_train)
    y_pred = hgbc.predict(X_test)
    y_pred_proba = hgbc.predict_proba(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    logloss = log_loss(y_test, y_pred_proba)
    results.append((l2, accuracy, logloss))
    print(f"l2_regularization={l2}, Accuracy: {accuracy:.3f}, Log Loss: {logloss:.3f}")
Running the example gives an output like:
l2_regularization=0.0, Accuracy: 0.925, Log Loss: 0.170
l2_regularization=0.1, Accuracy: 0.925, Log Loss: 0.180
l2_regularization=1.0, Accuracy: 0.920, Log Loss: 0.182
l2_regularization=10.0, Accuracy: 0.905, Log Loss: 0.229
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and noise features
- Split the data into train and test sets
- Train HistGradientBoostingClassifier models with different l2_regularization values
- Evaluate each model’s accuracy and log loss on the test set
Some tips and heuristics for setting l2_regularization:
- Start with the default value of 0.0 and gradually increase it if overfitting is observed
- Use cross-validation to find the optimal value for your specific dataset, as in the grid-search sketch after this list
- Consider the trade-off between model complexity and regularization strength
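A minimal way to apply the cross-validation tip is a small grid search. The sketch below uses GridSearchCV with neg_log_loss scoring over a handful of candidate values; the grid, scoring metric, and fold count are reasonable starting points rather than prescriptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Reuse the same kind of synthetic data as in the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Candidate values spanning the commonly used range
param_grid = {"l2_regularization": [0.0, 0.01, 0.1, 1.0, 10.0]}

grid = GridSearchCV(
    HistGradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="neg_log_loss",
    cv=5,
)
grid.fit(X, y)
print("Best l2_regularization:", grid.best_params_["l2_regularization"])
print("Best CV log loss:", -grid.best_score_)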
Issues to consider:
- The optimal regularization strength depends on the dataset’s size, complexity, and noise level
- Too little regularization may lead to overfitting, while too much can cause underfitting; comparing train and test log loss, as in the sketch after this list, helps diagnose which
- The effect of L2 regularization may be less pronounced for datasets with few features or large sample sizes
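One way to check which side of this trade-off a given setting falls on is to compare train and test log loss over a range of values, as in the following sketch (the value grid is arbitrary, and the pattern will vary by dataset):

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Same synthetic setup as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A large train/test gap suggests overfitting; both losses rising together suggests underfitting
for l2 in [0.0, 0.1, 1.0, 10.0, 100.0]:
    model = HistGradientBoostingClassifier(l2_regularization=l2, random_state=42)
    model.fit(X_train, y_train)
    train_loss = log_loss(y_train, model.predict_proba(X_train))
    test_loss = log_loss(y_test, model.predict_proba(X_test))
    print(f"l2={l2}: train log loss={train_loss:.3f}, test log loss={test_loss:.3f}")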