The `monotonic_cst` parameter in scikit-learn's `RandomForestRegressor` allows you to specify a monotonic constraint for each feature. This is useful when you have prior knowledge that certain features have a monotonic relationship with the target variable.
`RandomForestRegressor` is an ensemble learning method that combines the predictions of multiple decision trees to improve regression performance. The `monotonic_cst` parameter is an array-like of integers, one per feature, where 1 indicates an increasing constraint, -1 indicates a decreasing constraint, and 0 indicates no constraint.
By default, `monotonic_cst` is set to `None`, which means no monotonic constraints are applied. In practice, the parameter is set based on domain knowledge about the relationship between each feature and the target variable.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Generate synthetic dataset with monotonic relationships
X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=42,
                       effective_rank=5, tail_strength=0.5)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different monotonic_cst values
monotonic_cst_values = [None, [1, 1, 1, 1, 1], [-1, -1, -1, -1, -1]]
results = []
for cst in monotonic_cst_values:
    rf = RandomForestRegressor(n_estimators=100, monotonic_cst=cst, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results.append((cst, mse, r2))

for cst, mse, r2 in results:
    print(f"monotonic_cst={cst}, MSE: {mse:.3f}, R-squared: {r2:.3f}")
Running the example gives an output like:
monotonic_cst=None, MSE: 0.755, R-squared: 0.926
monotonic_cst=[1, 1, 1, 1, 1], MSE: 2.512, R-squared: 0.754
monotonic_cst=[-1, -1, -1, -1, -1], MSE: 10.691, R-squared: -0.046
The key steps in this example are:
- Generate a synthetic regression dataset with features that have monotonic relationships with the target
- Split the data into train and test sets
- Train `RandomForestRegressor` models with different `monotonic_cst` values
- Evaluate and compare the performance of each model on the test set using MSE and R-squared
Some tips and heuristics for setting `monotonic_cst`:
- Use increasing constraints (1) for features that are known to have a positive monotonic relationship with the target
- Use decreasing constraints (-1) for features that are known to have a negative monotonic relationship with the target
- Only apply constraints to features where you have strong domain knowledge about their relationship with the target
Issues to consider:
- Applying monotonic constraints can improve model interpretability, but may reduce predictive performance when the true relationship is not strictly monotonic
- Incorrectly specifying constraints can lead to poor model performance
- Monotonic constraints should only be used when there is a clear monotonic relationship between features and the target based on domain knowledge