The monotonic_cst parameter in scikit-learn’s RandomForestRegressor allows you to specify monotonic constraints for each feature. This can be useful when you have prior knowledge that certain features have a monotonic relationship with the target variable.
RandomForestRegressor is an ensemble method that averages the predictions of many decision trees to improve regression performance. The monotonic_cst parameter takes an array-like with one entry per feature: 1 enforces an increasing constraint, -1 enforces a decreasing constraint, and 0 leaves the feature unconstrained.
By default, monotonic_cst is set to None, which means no monotonic constraints are applied. In practice, the parameter is set based on domain knowledge about the relationship between features and the target variable.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Generate synthetic dataset with monotonic relationships
X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=42,
                       effective_rank=5, tail_strength=0.5)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different monotonic_cst values
monotonic_cst_values = [None, [1, 1, 1, 1, 1], [-1, -1, -1, -1, -1]]
results = []
for cst in monotonic_cst_values:
    rf = RandomForestRegressor(n_estimators=100, monotonic_cst=cst, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results.append((cst, mse, r2))
for cst, mse, r2 in results:
    print(f"monotonic_cst={cst}, MSE: {mse:.3f}, R-squared: {r2:.3f}")
Running the example gives an output like:
monotonic_cst=None, MSE: 0.755, R-squared: 0.926
monotonic_cst=[1, 1, 1, 1, 1], MSE: 2.512, R-squared: 0.754
monotonic_cst=[-1, -1, -1, -1, -1], MSE: 10.691, R-squared: -0.046
The key steps in this example are:
- Generate a synthetic regression dataset with features that have monotonic relationships with the target
- Split the data into train and test sets
- Train RandomForestRegressor models with different monotonic_cst values
- Evaluate and compare the performance of each model on the test set using MSE and R-squared
Some tips and heuristics for setting monotonic_cst:
- Use increasing constraints (1) for features that are known to have a positive monotonic relationship with the target
- Use decreasing constraints (-1) for features that are known to have a negative monotonic relationship with the target
- Only apply constraints to features where you have strong domain knowledge about their relationship with the target
Issues to consider:
- Applying monotonic constraints can improve interpretability and guard against implausible predictions, but it restricts the model's flexibility and may reduce accuracy when the true relationship is not strictly monotonic
- Incorrectly specifying constraints can lead to poor model performance
- Monotonic constraints should only be used when there is a clear monotonic relationship between features and the target based on domain knowledge