The scoring parameter in scikit-learn's HistGradientBoostingClassifier determines the metric used to evaluate the model's performance on held-out validation data for early stopping during training.
HistGradientBoostingClassifier is an implementation of gradient boosting that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets.
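As a quick orientation (a minimal sketch with made-up toy data, separate from the example that follows), the estimator uses the standard scikit-learn fit/predict interface; max_iter sets the number of boosting iterations and max_bins controls how finely continuous features are binned into histograms:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

# Toy data purely to illustrate the basic interface
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# max_iter: number of boosting iterations; max_bins: histogram resolution (at most 255)
clf = HistGradientBoostingClassifier(max_iter=100, max_bins=255, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # mean accuracy on the training data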
The scoring parameter lets you specify which metric is monitored on validation data while the ensemble is being built. When early stopping is enabled, this metric decides when boosting stops, so different choices can lead to different final models.
By default, scoring is set to 'loss', which uses the model's built-in loss function. Common alternatives include 'accuracy', 'f1', 'roc_auc', and 'average_precision'.
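Besides 'loss' and named scorer strings, scoring also accepts a callable scorer, for example one built with make_scorer. A minimal sketch (these particular configurations are illustrative and not used in the example below):

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import make_scorer, f1_score

# Built-in loss (the default)
clf_loss = HistGradientBoostingClassifier(scoring='loss')

# A named scorer string, as used throughout this example
clf_auc = HistGradientBoostingClassifier(scoring='roc_auc')

# A callable scorer built with make_scorer
clf_custom = HistGradientBoostingClassifier(scoring=make_scorer(f1_score))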
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different scoring metrics
scoring_metrics = ['loss', 'accuracy', 'roc_auc']
results = {}

for metric in scoring_metrics:
    clf = HistGradientBoostingClassifier(scoring=metric, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_pred_proba = clf.predict_proba(X_test)[:, 1]
    results[metric] = {
        'accuracy': accuracy_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred_proba)
    }

for metric, scores in results.items():
    print(f"Scoring: {metric}")
    for score_name, score_value in scores.items():
        print(f" {score_name}: {score_value:.3f}")
Running the example gives an output like:
Scoring: loss
accuracy: 0.943
f1: 0.941
roc_auc: 0.983
Scoring: accuracy
accuracy: 0.943
f1: 0.941
roc_auc: 0.983
Scoring: roc_auc
accuracy: 0.943
f1: 0.941
roc_auc: 0.983
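A note on why the three results are identical: with the default early_stopping='auto', early stopping is only enabled when the training set has more than 10,000 samples, so the 8,000-sample training split here never consults the scoring metric at all. The sketch below (a variation on the example, with assumed settings rather than the defaults) enables early stopping explicitly so the scoring choice can actually influence when boosting stops; the n_iter_ attribute shows how many iterations each configuration ran:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

for metric in ['loss', 'accuracy', 'roc_auc']:
    clf = HistGradientBoostingClassifier(scoring=metric,
                                         early_stopping=True,
                                         validation_fraction=0.1,
                                         n_iter_no_change=10,
                                         random_state=42)
    clf.fit(X, y)
    # n_iter_ reports how many boosting iterations actually ran
    print(f"scoring={metric!r}: stopped after {clf.n_iter_} iterations")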
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train HistGradientBoostingClassifier models with different scoring metrics
- Evaluate each model using multiple performance metrics
- Compare the results to see how different scoring metrics affect overall performance
Tips and heuristics for choosing the scoring metric:
- Choose a metric that aligns with your problem's goals (e.g., 'roc_auc' for ranking, 'f1' for balanced precision and recall)
- Consider the class balance in your dataset when selecting a metric
- Use cross-validation to get a more robust estimate of performance across different scoring metrics, as sketched below
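As a rough illustration of the cross-validation tip (a sketch reusing the same synthetic data generator, with assumed settings), cross_val_score can report a single evaluation metric for each candidate configuration. Note that its scoring argument is the evaluation metric and is independent of the estimator's own scoring (early-stopping) parameter:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

for metric in ['loss', 'roc_auc']:
    clf = HistGradientBoostingClassifier(scoring=metric, early_stopping=True,
                                         random_state=42)
    # cross_val_score's scoring is the *evaluation* metric, applied to each fold
    scores = cross_val_score(clf, X, y, cv=5, scoring='roc_auc')
    print(f"scoring={metric!r}: mean CV ROC AUC = {scores.mean():.3f}")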
Issues to consider:
- When early stopping is enabled, different scoring metrics may stop boosting at different points, yielding models with varying trade-offs between precision and recall
- The scoring metric only influences the final model when early stopping is active; it does not change the loss that the trees are fit to minimize
- Some metrics may be more sensitive to class imbalance than others (see the sketch after this list)
- Evaluating a scorer other than 'loss' requires extra predictions on the validation data at each early-stopping check, so the computational cost can vary with the chosen metric
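To make the class-imbalance point concrete, here is a hedged sketch with a hypothetical 95/5 class split (not taken from the example above). It compares early stopping driven by 'accuracy' against 'average_precision', evaluating both on test-set average precision; the actual numbers will vary and the two settings may end up close:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

# Hypothetical imbalanced dataset: roughly 95% negatives, 5% positives
X, y = make_classification(n_samples=20000, n_features=20, n_informative=10,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=42)

for metric in ['accuracy', 'average_precision']:
    clf = HistGradientBoostingClassifier(scoring=metric, early_stopping=True,
                                         random_state=42)
    clf.fit(X_train, y_train)
    ap = average_precision_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"scoring={metric!r}: test average precision = {ap:.3f}")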