The interaction_cst parameter in scikit-learn’s HistGradientBoostingRegressor allows you to control which features are allowed to interact in the trees.
Histogram-based Gradient Boosting is an efficient implementation of gradient boosting that uses binning to reduce training time and memory usage. It builds an ensemble of decision trees in a sequential manner, with each tree correcting the errors of the previous ones.
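As a quick, minimal sketch of the binning idea (the dataset and the max_bins value below are illustrative assumptions, not part of the worked example that follows), you can cap the number of histogram bins used per feature:

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

# Illustrative only: max_bins controls how many bins each feature is
# discretized into before split points are searched (default 255)
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = HistGradientBoostingRegressor(max_bins=64, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data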
The interaction_cst parameter defines interaction constraints between features: it specifies which features may be used together when splitting nodes in the decision trees.
By default, interaction_cst is set to None, which means all features can interact. You can specify constraints as a list of lists, where each sublist contains feature indices that are allowed to interact.
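As a hedged illustration, these are the kinds of values the parameter accepts; the list-of-lists form matches the example below, while the string shortcuts ("pairwise" and "no_interactions") assume a recent scikit-learn release:

from sklearn.ensemble import HistGradientBoostingRegressor

# Default: all features may interact with each other
HistGradientBoostingRegressor(interaction_cst=None)

# Features 0 and 1 form one interaction group; features 2, 3 and 4 another
HistGradientBoostingRegressor(interaction_cst=[[0, 1], [2, 3, 4]])

# String shortcuts in recent scikit-learn releases
HistGradientBoostingRegressor(interaction_cst="pairwise")         # only pairwise interactions
HistGradientBoostingRegressor(interaction_cst="no_interactions")  # each feature splits alone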
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset with interacting features
X, y = make_regression(n_samples=1000, n_features=5, n_informative=3,
                       noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different interaction_cst values
interaction_cst_values = [None, [[0, 1], [2, 3, 4]]]
mse_scores = []
for cst in interaction_cst_values:
    model = HistGradientBoostingRegressor(interaction_cst=cst, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"interaction_cst={cst}, MSE: {mse:.3f}")
Running the example gives an output like:
interaction_cst=None, MSE: 50.119
interaction_cst=[[0, 1], [2, 3, 4]], MSE: 56.076
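The constrained model scores slightly worse here, which is not surprising: the grouping was chosen arbitrarily, so whether constraints help or hurt depends on how well they match the true relationships in the data.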
The key steps in this example are:
- Generate a synthetic regression dataset with potential feature interactions
- Split the data into train and test sets
- Train HistGradientBoostingRegressor models with different interaction_cst values
- Evaluate the mean squared error of each model on the test set
Tips for setting interaction_cst:
- Use domain knowledge to determine which features should interact
- Start with no constraints and gradually add them to see the impact on model performance (a cross-validation sketch follows this list)
- Consider the trade-off between model flexibility and interpretability
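One way to follow that incremental approach is to compare candidate constraint sets with cross-validation rather than a single train/test split. The candidate groupings below are illustrative assumptions, and the sketch regenerates the same synthetic data as the example above:

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=5, n_informative=3,
                       noise=0.1, random_state=42)

# Candidate constraint sets, from least to most restrictive (illustrative)
candidates = [None, [[0, 1], [2, 3, 4]], [[0], [1], [2], [3], [4]]]
for cst in candidates:
    model = HistGradientBoostingRegressor(interaction_cst=cst, random_state=42)
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"interaction_cst={cst}, CV MSE: {-scores.mean():.3f}")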
Issues to consider:
- Overly restrictive constraints may lead to underfitting (see the sketch after this list)
- Interaction constraints can impact model performance and training time
- The effectiveness of constraints depends on the true underlying relationships in the data
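To make the underfitting risk concrete, here is a small sketch on a target built from a pure interaction term (the data-generating function is an assumption chosen to force an interaction). Forbidding features 0 and 1 from interacting should leave the constrained model with a markedly higher error, since the target cannot be approximated additively:

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# The target is the product of the two features: a pure interaction
rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(1000, 2))
y = X[:, 0] * X[:, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Unconstrained vs. each feature confined to its own group
for cst in [None, [[0], [1]]]:
    model = HistGradientBoostingRegressor(interaction_cst=cst, random_state=42)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"interaction_cst={cst}, MSE: {mse:.4f}")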