The `monotonic_cst` parameter in scikit-learn's `RandomForestClassifier` lets you enforce monotonic constraints on the decision trees in the ensemble.

Monotonic constraints specify whether a feature has a monotonically increasing or decreasing relationship with the target variable. This is useful when you have prior domain knowledge about the relationships between features and the target.
The `monotonic_cst` parameter takes a list whose length equals the number of features. Each value is 1, 0, or -1, indicating a monotonically increasing relationship, no constraint, or a monotonically decreasing relationship, respectively. By default, `monotonic_cst` is set to `None`, which means no monotonic constraints are applied. Note that this parameter requires scikit-learn 1.4 or later for tree-based estimators.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different monotonic_cst configurations
configs = [
    None,              # No constraints
    [1, 0, 0, 0, 0],   # Monotonically increasing for feature 0
    [0, -1, 0, 0, 0],  # Monotonically decreasing for feature 1
    [1, -1, 0, 0, 0],  # Increasing for feature 0, decreasing for feature 1
]

for monotonic_cst in configs:
    rf = RandomForestClassifier(n_estimators=100, random_state=42,
                                monotonic_cst=monotonic_cst)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"monotonic_cst={monotonic_cst}, Accuracy: {acc:.3f}")
```
Running the example gives an output like:

```
monotonic_cst=None, Accuracy: 0.935
monotonic_cst=[1, 0, 0, 0, 0], Accuracy: 0.880
monotonic_cst=[0, -1, 0, 0, 0], Accuracy: 0.910
monotonic_cst=[1, -1, 0, 0, 0], Accuracy: 0.840
```
The key steps in this example are:

- Generate a synthetic classification dataset
- Split the data into train and test sets
- Train `RandomForestClassifier` models with different `monotonic_cst` configurations
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for using `monotonic_cst`:
- Consider using monotonic constraints when you have strong prior knowledge about the feature-target relationships
- Determine the direction of the constraint (increasing or decreasing) based on domain understanding
- Be aware that using constraints may slightly reduce accuracy but can lead to more interpretable models
Issues to consider:
- Monotonic constraints are a strong assumption and may not always hold perfectly in real data
- Applying incorrect constraints can lead to reduced model performance