The scoring parameter in scikit-learn's HistGradientBoostingClassifier determines the metric used to evaluate the model's performance on held-out validation data for early stopping during training.
HistGradientBoostingClassifier is an implementation of gradient boosting that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets.
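As a quick orientation (a minimal sketch with made-up toy data, separate from the example that follows), the estimator uses the standard scikit-learn fit/predict interface; max_iter sets the number of boosting iterations and max_bins controls how finely continuous features are binned into histograms:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

# Toy data purely to illustrate the basic interface
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# max_iter: number of boosting iterations; max_bins: histogram resolution (at most 255)
clf = HistGradientBoostingClassifier(max_iter=100, max_bins=255, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # mean accuracy on the training data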
The scoring parameter lets you specify which metric is monitored on validation data while the ensemble is being built. When early stopping is enabled, this metric decides when boosting stops, so different choices can lead to different final models.
By default, scoring is set to 'loss', which uses the model's built-in loss function. Common alternatives include 'accuracy', 'f1', 'roc_auc', and 'average_precision'.
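Besides 'loss' and named scorer strings, scoring also accepts a callable scorer, for example one built with make_scorer. A minimal sketch (these particular configurations are illustrative and not used in the example below):

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import make_scorer, f1_score

# Built-in loss (the default)
clf_loss = HistGradientBoostingClassifier(scoring='loss')

# A named scorer string, as used throughout this example
clf_auc = HistGradientBoostingClassifier(scoring='roc_auc')

# A callable scorer built with make_scorer
clf_custom = HistGradientBoostingClassifier(scoring=make_scorer(f1_score))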
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different scoring metrics
scoring_metrics = ['loss', 'accuracy', 'roc_auc']
results = {}

for metric in scoring_metrics:
    clf = HistGradientBoostingClassifier(scoring=metric, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    y_pred_proba = clf.predict_proba(X_test)[:, 1]
    results[metric] = {
        'accuracy': accuracy_score(y_test, y_pred),
        'f1': f1_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred_proba)
    }

for metric, scores in results.items():
    print(f"Scoring: {metric}")
    for score_name, score_value in scores.items():
        print(f" {score_name}: {score_value:.3f}")
Running the example gives an output like:
Scoring: loss
accuracy: 0.943
f1: 0.941
roc_auc: 0.983
Scoring: accuracy
accuracy: 0.943
f1: 0.941
roc_auc: 0.983
Scoring: roc_auc
accuracy: 0.943
f1: 0.941
roc_auc: 0.983
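A note on why the three results are identical: with the default early_stopping='auto', early stopping is only enabled when the training set has more than 10,000 samples, so the 8,000-sample training split here never consults the scoring metric at all. The sketch below (a variation on the example, with assumed settings rather than the defaults) enables early stopping explicitly so the scoring choice can actually influence when boosting stops; the n_iter_ attribute shows how many iterations each configuration ran:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

for metric in ['loss', 'accuracy', 'roc_auc']:
    clf = HistGradientBoostingClassifier(scoring=metric,
                                         early_stopping=True,
                                         validation_fraction=0.1,
                                         n_iter_no_change=10,
                                         random_state=42)
    clf.fit(X, y)
    # n_iter_ reports how many boosting iterations actually ran
    print(f"scoring={metric!r}: stopped after {clf.n_iter_} iterations")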
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train HistGradientBoostingClassifier models with different scoring metrics
- Evaluate each model using multiple performance metrics
- Compare the results to see how different scoring metrics affect overall performance
Tips and heuristics for choosing the scoring metric:
- Choose a metric that aligns with your problem's goals (e.g., 'roc_auc' for ranking, 'f1' for balanced precision and recall)
- Consider the class balance in your dataset when selecting a metric
- Use cross-validation to get a more robust estimate of performance across different scoring metrics, as sketched below
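As a rough illustration of the cross-validation tip (a sketch reusing the same synthetic data generator, with assumed settings), cross_val_score can report a single evaluation metric for each candidate configuration. Note that its scoring argument is the evaluation metric and is independent of the estimator's own scoring (early-stopping) parameter:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

for metric in ['loss', 'roc_auc']:
    clf = HistGradientBoostingClassifier(scoring=metric, early_stopping=True,
                                         random_state=42)
    # cross_val_score's scoring is the *evaluation* metric, applied to each fold
    scores = cross_val_score(clf, X, y, cv=5, scoring='roc_auc')
    print(f"scoring={metric!r}: mean CV ROC AUC = {scores.mean():.3f}")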
Issues to consider:
- When early stopping is enabled, different scoring metrics may stop boosting at different points, yielding models with varying trade-offs between precision and recall
- The scoring metric only influences the final model when early stopping is active; it does not change the loss that the trees are fit to minimize
- Some metrics may be more sensitive to class imbalance than others (see the sketch after this list)
- Evaluating a scorer other than 'loss' requires extra predictions on the validation data at each early-stopping check, so the computational cost can vary with the chosen metric
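To make the class-imbalance point concrete, here is a hedged sketch with a hypothetical 95/5 class split (not taken from the example above). It compares early stopping driven by 'accuracy' against 'average_precision', evaluating both on test-set average precision; the actual numbers will vary and the two settings may end up close:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

# Hypothetical imbalanced dataset: roughly 95% negatives, 5% positives
X, y = make_classification(n_samples=20000, n_features=20, n_informative=10,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=42)

for metric in ['accuracy', 'average_precision']:
    clf = HistGradientBoostingClassifier(scoring=metric, early_stopping=True,
                                         random_state=42)
    clf.fit(X_train, y_train)
    ap = average_precision_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"scoring={metric!r}: test average precision = {ap:.3f}")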