The monotonic_cst parameter in scikit-learn's HistGradientBoostingRegressor allows you to enforce monotonic constraints on the relationship between features and the target variable.
Monotonic constraints ensure that the predicted output either increases or decreases monotonically with respect to a given feature. This can be useful when you have domain knowledge about the expected relationship between features and the target.
The monotonic_cst parameter accepts a list with one entry per feature (in column order) or, in scikit-learn 1.2 and later, a dictionary keyed by feature name. Use 1 for a monotonically increasing constraint, -1 for a monotonically decreasing constraint, and 0 for no constraint.
By default, monotonic_cst=None, which means no monotonicity constraints are applied.
Common configurations include applying constraints to a subset of features or to all features based on domain knowledge.
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
np.random.seed(42)
n_samples = 1000
X = np.random.rand(n_samples, 3)
y = 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 2] + np.random.normal(0, 0.1, n_samples)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define different monotonic constraint configurations
constraints = [
None,
[1, -1, 0]
]
for cst in constraints:
    model = HistGradientBoostingRegressor(monotonic_cst=cst, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Monotonic constraints: {cst}")
    print(f"Mean Squared Error: {mse:.4f}\n")
Running the example gives an output like:
Monotonic constraints: None
Mean Squared Error: 0.0168
Monotonic constraints: [1, -1, 0]
Mean Squared Error: 0.0175
The key steps in this example are:
- Generate a synthetic regression dataset with three features
- Split the data into train and test sets
- Create HistGradientBoostingRegressor models with different monotonic_cst configurations
- Train the models and evaluate their performance using mean squared error
- Compare the results to show the effect of monotonic constraints
Some tips and heuristics for setting monotonic_cst:
- Use domain knowledge to determine which features should have monotonic relationships
- Consider using monotonic constraints when interpretability is important
- Experiment with different constraint combinations to find the best balance between model performance and desired monotonicity
Issues to consider:
- Applying constraints may reduce model flexibility and potentially decrease performance
- Monotonic constraints are most effective when there is a clear monotonic relationship between features and the target
- Ensure that the training data supports the imposed constraints to avoid conflicts during model fitting