Configure HistGradientBoostingRegressor "monotonic_cst" Parameter

The monotonic_cst parameter in scikit-learn’s HistGradientBoostingRegressor allows you to enforce monotonic constraints on the relationship between features and the target variable.

Monotonic constraints ensure that the predicted output never decreases (for an increasing constraint) or never increases (for a decreasing constraint) as a given feature grows, with all other features held fixed. This is useful when you have domain knowledge about the direction of the relationship between a feature and the target.

The monotonic_cst parameter accepts a list (or array) with one value per feature, or, in recent scikit-learn versions, a dictionary keyed by feature name. Use 1 for a monotonically increasing constraint, -1 for a monotonically decreasing constraint, and 0 for no constraint.
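
As a quick illustration of the two forms, the sketch below fits the same model using a plain list and then a dictionary keyed by column name. The column names (age, discount, region_code) are made up for this example, and the dictionary form assumes a scikit-learn version recent enough to accept feature names, with omitted features left unconstrained.

import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = pd.DataFrame({
    "age": rng.rand(200),
    "discount": rng.rand(200),
    "region_code": rng.rand(200),
})
y = 2 * X["age"] - 3 * X["discount"] + rng.normal(0, 0.1, 200)

# List form: one value per feature, in column order
model_list = HistGradientBoostingRegressor(monotonic_cst=[1, -1, 0])
model_list.fit(X, y)

# Dict form: keyed by feature name; "region_code" is omitted, so it stays unconstrained
model_dict = HistGradientBoostingRegressor(monotonic_cst={"age": 1, "discount": -1})
model_dict.fit(X, y)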

By default, monotonic_cst=None, which means no monotonicity constraints are applied.

Common configurations include applying constraints to a subset of features or to all features based on domain knowledge.

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
np.random.seed(42)
n_samples = 1000
X = np.random.rand(n_samples, 3)
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 2] + np.random.normal(0, 0.1, n_samples)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define monotonic constraint configurations to compare: None applies no
# constraints; [1, -1, 0] forces predictions to increase with feature 0,
# decrease with feature 1, and leaves feature 2 unconstrained, matching the
# data-generating process above
constraints = [
    None,
    [1, -1, 0]
]

for cst in constraints:
    model = HistGradientBoostingRegressor(monotonic_cst=cst, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Monotonic constraints: {cst}")
    print(f"Mean Squared Error: {mse:.4f}\n")

Running the example gives an output like:

Monotonic constraints: None
Mean Squared Error: 0.0168

Monotonic constraints: [1, -1, 0]
Mean Squared Error: 0.0175

The key steps in this example are:

  1. Generate a synthetic regression dataset with three features
  2. Split the data into train and test sets
  3. Create HistGradientBoostingRegressor models with different monotonic_cst configurations
  4. Train the models and evaluate their performance using mean squared error
  5. Compare the results to show the effect of monotonic constraints
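
Beyond comparing error metrics, you can check directly that a fitted model honors the constraint by sweeping one feature over a grid while holding the others fixed and inspecting the direction of the predictions. The sketch below refits the constrained model on the same synthetic data as the example above; the grid of 50 points and the fixed value of 0.5 for the other features are arbitrary choices for illustration.

import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

np.random.seed(42)
X = np.random.rand(1000, 3)
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 2] + np.random.normal(0, 0.1, 1000)

model = HistGradientBoostingRegressor(monotonic_cst=[1, -1, 0], random_state=42)
model.fit(X, y)

# Sweep feature 0 from 0 to 1 while holding features 1 and 2 at 0.5
grid = np.linspace(0, 1, 50)
X_sweep = np.column_stack([grid, np.full(50, 0.5), np.full(50, 0.5)])
preds = model.predict(X_sweep)

# With a +1 constraint on feature 0, the predictions should never decrease
print("Non-decreasing in feature 0:", np.all(np.diff(preds) >= 0))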

Some tips and heuristics for setting monotonic_cst:

  1. Only constrain a feature when domain knowledge clearly implies the direction of its effect; a wrong constraint forces the model away from the true relationship and hurts accuracy.
  2. Leave a feature at 0 (unconstrained) when you are unsure; constraints are opt-in per feature.
  3. Treat constraints as a mild form of regularization: even when the constraint matches the true relationship, the error can shift slightly (as in the example above, where test MSE moved from 0.0168 to 0.0175), in exchange for predictions that are guaranteed to behave in the expected direction.

Issues to consider:

  1. A constraint guarantees monotonic predictions with respect to that feature while the other features are held fixed; it does not change or validate the underlying data.
  2. Constraints restrict the splits the trees can use, so heavily constrained models may underfit when the true relationship is not monotonic.
  3. With the dictionary form, features that are not listed are left unconstrained.


