The monotonic_cst parameter in scikit-learn's ExtraTreesRegressor allows you to enforce monotonic constraints on the relationship between individual features and the target variable.
Extra Trees Regressor is an ensemble method that builds multiple randomized decision trees and averages their predictions to improve generalization and reduce overfitting. The monotonic_cst parameter lets you specify whether each feature should have a positive, negative, or no monotonic relationship with the target.
Monotonic constraints ensure that the predicted output always increases (or decreases) as a specific feature increases, regardless of other feature values. This can be useful when you have domain knowledge about the expected relationship between features and the target.
The default value for monotonic_cst is None, meaning no monotonic constraints are applied. When specified, it should be a list or array with one value per feature: 1 (positive constraint), -1 (negative constraint), or 0 (no constraint). Note that this parameter was added to the tree ensembles, including ExtraTreesRegressor, in scikit-learn 1.4.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=3, noise=0.1, random_state=42)
# Ensure the first feature has a positive relationship with the target
X[:, 0] = np.abs(X[:, 0])
y += X[:, 0]
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define different monotonic constraints
constraints = [
    None,
    [1, 0, 0],   # Positive constraint on first feature
    [1, -1, 0],  # Positive on first, negative on second, no constraint on third
]
for cst in constraints:
    etr = ExtraTreesRegressor(n_estimators=100, random_state=42, monotonic_cst=cst)
    etr.fit(X_train, y_train)
    y_pred = etr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Monotonic constraints: {cst}, MSE: {mse:.4f}")
Running the example gives an output like:
Monotonic constraints: None, MSE: 11209.1497
Monotonic constraints: [1, 0, 0], MSE: 10275.3948
Monotonic constraints: [1, -1, 0], MSE: 17836.4720
The key steps in this example are:
- Generate a synthetic regression dataset with features suitable for monotonic constraints
- Split the data into train and test sets
- Create ExtraTreesRegressor models with different monotonic_cst configurations
- Train the models and evaluate their performance using mean squared error
- Compare the results of different constraint configurations
Some tips and heuristics for setting monotonic_cst:
- Use domain knowledge to determine which features should have monotonic relationships with the target
- Start with no constraints and gradually add them to see their impact on model performance
- Consider the trade-off between enforcing constraints and potential loss in predictive power
Issues to consider:
- Enforcing monotonic constraints may reduce model flexibility and potentially decrease overall performance
- Constraints should be based on strong domain knowledge or business requirements
- Monotonic constraints may not be suitable for all types of relationships in your data