The `interaction_cst` parameter in scikit-learn's `HistGradientBoostingClassifier` controls which features are allowed to interact in the tree-based model.
Histogram-based gradient boosting is an efficient implementation of gradient boosting that bins feature values to speed up training. The `interaction_cst` parameter lets you specify constraints on feature interactions, potentially improving model interpretability and performance.
By default, `interaction_cst` is set to `None`, allowing all features to interact freely. Common configurations include pairwise constraints or more complex groupings based on domain knowledge.
Specifying interaction constraints can lead to more interpretable models and may improve generalization by reducing overfitting, especially when there’s prior knowledge about feature relationships.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=4,
                           n_redundant=0, n_clusters_per_class=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different interaction_cst configurations
configs = [
    ("No constraints", None),
    ("Pairwise constraints", [[0, 1], [2, 3], [3, 4]]),
    ("Complex constraints", [[0, 1, 2], [2, 3, 4]])
]

for name, interaction_cst in configs:
    model = HistGradientBoostingClassifier(interaction_cst=interaction_cst, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"{name}: Accuracy = {accuracy:.3f}")
```
Running the example gives an output like:

```
No constraints: Accuracy = 0.905
Pairwise constraints: Accuracy = 0.790
Complex constraints: Accuracy = 0.855
```
The key steps in this example are:
- Generate a synthetic classification dataset with potentially interacting features
- Split the data into train and test sets
- Train `HistGradientBoostingClassifier` models with different `interaction_cst` configurations
- Evaluate the accuracy of each model on the test set
Some tips for effectively using `interaction_cst`:
- Use domain knowledge to identify potential feature interactions
- Start with pairwise constraints and gradually increase complexity
- Monitor performance changes when adding or removing constraints
Issues to consider when using `interaction_cst`:
- Constraints can improve interpretability but may limit model flexibility
- Overly strict constraints might lead to underfitting
- The impact on training time depends on the number and complexity of constraints