The class_weight parameter in scikit-learn’s HistGradientBoostingClassifier helps address class imbalance in classification tasks.
HistGradientBoostingClassifier is a fast, histogram-based implementation of gradient boosting. It builds an ensemble of decision trees sequentially, with each tree correcting the errors made by the previous ones.
The class_weight parameter adjusts the importance of each class during training. It can help the model pay more attention to minority classes, improving performance on imbalanced datasets.
By default, class_weight is set to None, treating all classes equally. Common options include ‘balanced’ (automatically adjusts weights inversely proportional to class frequencies) or a dictionary specifying a custom weight for each class.
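To make the ‘balanced’ option concrete, scikit-learn exposes the same weighting rule through compute_class_weight: each class gets n_samples / (n_classes * class_count), so rarer classes receive larger weights. A minimal sketch with a hypothetical 90/10 label vector:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 90% class 0, 10% class 1
y = np.array([0] * 900 + [1] * 100)

# 'balanced' assigns each class the weight n_samples / (n_classes * class_count)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(weights)  # class 0 -> 1000 / (2 * 900) ≈ 0.556, class 1 -> 1000 / (2 * 100) = 5.0
```

Passing a dictionary such as {0: 1, 1: 9} applies the same idea with weights you choose by hand.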
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import f1_score
# Generate imbalanced dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
n_informative=3, n_redundant=1, flip_y=0, random_state=42)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train models with different class_weight settings
class_weights = [None, 'balanced', {0: 1, 1: 9}]
for weights in class_weights:
    clf = HistGradientBoostingClassifier(class_weight=weights, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    print(f"class_weight={weights}, F1 Score: {f1:.3f}")
Running this example produces output similar to:
class_weight=None, F1 Score: 0.852
class_weight=balanced, F1 Score: 0.836
class_weight={0: 1, 1: 9}, F1 Score: 0.836
Key steps in this example:
- Generate an imbalanced binary classification dataset
- Split the data into train and test sets
- Train HistGradientBoostingClassifier models with different class_weight settings
- Evaluate each model’s performance using F1 score
Tips for setting class_weight:
- Use ‘balanced’ when you want automatic weight calculation
- Calculate custom weights as (1 - fraction_of_samples) for each class
- Always use cross-validation when tuning class_weight
Issues to consider:
- Adjusting class weights may increase training time
- Extreme weight adjustments can lead to overfitting
- There’s often a trade-off between precision and recall when modifying class weights