
Configure HistGradientBoostingClassifier "loss" Parameter

The loss parameter in scikit-learn’s HistGradientBoostingClassifier determines the loss function used to fit the model.

HistGradientBoostingClassifier is a gradient boosting algorithm that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets with high-dimensional features.

The loss parameter specifies the objective function minimized during training. In current scikit-learn versions, 'log_loss' is the only supported value for this estimator; it corresponds to binary cross-entropy for two classes and categorical cross-entropy for more than two. The legacy aliases 'auto', 'binary_crossentropy', and 'categorical_crossentropy' were deprecated and later removed.

The default value for loss is 'log_loss' for both binary and multi-class classification.

Because log loss is a proper scoring rule, the fitted model also produces usable probability estimates via predict_proba, not just class labels.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, log_loss

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with the supported loss function
loss_functions = ['log_loss']
results = []

for loss in loss_functions:
    clf = HistGradientBoostingClassifier(loss=loss, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_pred_proba = clf.predict_proba(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    logloss = log_loss(y_test, y_pred_proba)

    results.append((loss, accuracy, logloss))
    print(f"Loss: {loss}, Accuracy: {accuracy:.4f}, Log Loss: {logloss:.4f}")

Running the example gives an output like:

Loss: log_loss, Accuracy: 0.9425, Log Loss: 0.1669

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train a HistGradientBoostingClassifier with loss='log_loss'
  4. Evaluate each model’s accuracy and log loss on the test set

Some tips and heuristics for the loss parameter:

  - 'log_loss' is currently the only supported value, so there is no tuning decision to make here; it covers both binary and multi-class problems.
  - Evaluate with log_loss on predict_proba output when calibrated probabilities matter, and with accuracy or similar metrics when only class labels matter.
  - For imbalanced datasets, adjust class_weight (or pass sample_weight to fit) rather than looking for an alternative loss.


