The `interaction_cst` parameter in scikit-learn's `HistGradientBoostingClassifier` controls which features are allowed to interact in the tree-based model.
Histogram-based gradient boosting is an efficient implementation of gradient boosting that bins feature values to speed up training. The `interaction_cst` parameter lets you specify constraints on feature interactions, potentially improving model interpretability and performance.
By default, `interaction_cst` is set to `None`, allowing all features to interact freely. Common configurations include pairwise constraints or more complex groupings based on domain knowledge.
Specifying interaction constraints can lead to more interpretable models and may improve generalization by reducing overfitting, especially when there’s prior knowledge about feature relationships.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=4,
                           n_redundant=0, n_clusters_per_class=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different interaction_cst configurations
configs = [
    ("No constraints", None),
    ("Pairwise constraints", [[0, 1], [2, 3], [3, 4]]),
    ("Complex constraints", [[0, 1, 2], [2, 3, 4]])
]

for name, interaction_cst in configs:
    model = HistGradientBoostingClassifier(interaction_cst=interaction_cst, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"{name}: Accuracy = {accuracy:.3f}")
```
Running the example gives an output like:

```
No constraints: Accuracy = 0.905
Pairwise constraints: Accuracy = 0.790
Complex constraints: Accuracy = 0.855
```
The key steps in this example are:
- Generate a synthetic classification dataset with potentially interacting features
- Split the data into train and test sets
- Train `HistGradientBoostingClassifier` models with different `interaction_cst` configurations
- Evaluate the accuracy of each model on the test set
Some tips for effectively using `interaction_cst`:
- Use domain knowledge to identify potential feature interactions
- Start with pairwise constraints and gradually increase complexity
- Monitor performance changes when adding or removing constraints
Issues to consider when using `interaction_cst`:
- Constraints can improve interpretability but may limit model flexibility
- Overly strict constraints might lead to underfitting
- The impact on training time depends on the number and complexity of constraints