
Configure HistGradientBoostingClassifier "scoring" Parameter

The scoring parameter in scikit-learn's HistGradientBoostingClassifier sets the metric used to evaluate the model on held-out data during training when early stopping is enabled.

HistGradientBoostingClassifier is an implementation of gradient boosting that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets.

The scoring parameter does not change the loss that gradient boosting minimizes; it specifies the metric monitored for early stopping. When early stopping is active, this metric decides when training halts, so different choices can yield models with different numbers of boosting iterations.

By default, scoring is set to 'loss', which reuses the model's built-in loss function. Any scikit-learn scorer string is also accepted, such as 'accuracy', 'f1', 'roc_auc', and 'average_precision'. Note that scoring only takes effect when early stopping is active: either early_stopping=True, or early_stopping='auto' (the default) with more than 10,000 training samples.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different scoring metrics
scoring_metrics = ['loss', 'accuracy', 'roc_auc']
results = {}

for metric in scoring_metrics:
    clf = HistGradientBoostingClassifier(scoring=metric, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_pred_proba = clf.predict_proba(X_test)[:, 1]

    results[metric] = {
        'accuracy': accuracy_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred_proba)
    }

for metric, scores in results.items():
    print(f"Scoring: {metric}")
    for score_name, score_value in scores.items():
        print(f"  {score_name}: {score_value:.3f}")

Running the example gives an output like:

Scoring: loss
  accuracy: 0.943
  f1: 0.941
  roc_auc: 0.983
Scoring: accuracy
  accuracy: 0.943
  f1: 0.941
  roc_auc: 0.983
Scoring: roc_auc
  accuracy: 0.943
  f1: 0.941
  roc_auc: 0.983

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingClassifier models with different scoring metrics
  4. Evaluate each model using multiple performance metrics
  5. Compare the results; here all three models score identically because with only 8,000 training samples the default early_stopping='auto' leaves early stopping off, so scoring is never consulted

Tips and heuristics for choosing the scoring metric:

  - Keep scoring='loss' (the default) when speed matters: it reuses the loss already computed during training and skips extra scorer calls.
  - Prefer 'roc_auc' or 'average_precision' on imbalanced data, where accuracy is a weak early-stopping signal.
  - Monitor the metric you ultimately report; stopping on that metric tends to select a better iteration count for it.

Issues to consider:

  - scoring has no effect unless early stopping is active (early_stopping=True, or early_stopping='auto' with more than 10,000 samples).
  - Scorer-based metrics are evaluated on held-out data every iteration, which adds training overhead relative to 'loss'.
  - The trees themselves are always grown to minimize the built-in loss; the scoring metric influences the model only through the stopping point.
