
Configure HistGradientBoostingClassifier "interaction_cst" Parameter

The interaction_cst parameter in scikit-learn’s HistGradientBoostingClassifier controls which features are allowed to interact in the tree-based model.

Histogram-based Gradient Boosting is an efficient implementation of gradient boosting that uses binning to speed up training. The interaction_cst parameter allows you to specify constraints on feature interactions, potentially improving model interpretability and performance.

By default, interaction_cst is set to None, allowing all features to interact freely. Common configurations include pairwise constraints or more complex groupings based on domain knowledge.

Specifying interaction constraints can lead to more interpretable models and may improve generalization by reducing overfitting, especially when there’s prior knowledge about feature relationships.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=4,
                           n_redundant=0, n_clusters_per_class=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different interaction_cst configurations
configs = [
    ("No constraints", None),
    ("Pairwise constraints", [[0, 1], [2, 3], [3, 4]]),
    ("Complex constraints", [[0, 1, 2], [2, 3, 4]])
]

for name, interaction_cst in configs:
    model = HistGradientBoostingClassifier(interaction_cst=interaction_cst, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"{name}: Accuracy = {accuracy:.3f}")

Running the example gives an output like:

No constraints: Accuracy = 0.905
Pairwise constraints: Accuracy = 0.790
Complex constraints: Accuracy = 0.855

The key steps in this example are:

  1. Generate a synthetic classification dataset with potentially interacting features
  2. Split the data into train and test sets
  3. Train HistGradientBoostingClassifier models with different interaction_cst configurations
  4. Evaluate the accuracy of each model on the test set

Some tips for effectively using interaction_cst:

  1. Derive groups from domain knowledge, e.g. keep features that describe the same entity or come from the same measurement process in one group
  2. Groups may overlap: a feature index can appear in several groups
  3. Features not listed in any group are treated as one additional group, so they can interact with each other but not with the listed features
  4. Always compare against an unconstrained baseline (interaction_cst=None) to measure what the constraints cost or gain

Issues to consider when using interaction_cst:

  1. The parameter was added in scikit-learn 1.2; earlier versions will reject it as an unknown argument
  2. Overly restrictive constraints can noticeably reduce accuracy, as the output above illustrates
  3. Constraints are enforced during tree growth: at each split, only features allowed to interact with those already used on the path from the root are considered



See Also