The monotonic_cst parameter in scikit-learn's ExtraTreesClassifier allows you to enforce monotonic constraints on specific features during training.
ExtraTreesClassifier is an ensemble learning method that builds multiple decision trees using random subsets of features and randomly drawn split thresholds. It's similar to Random Forest but with increased randomness in the tree-building process.
The monotonic_cst parameter specifies which features should have monotonic relationships with the target variable. This can be useful when you have domain knowledge about feature-target relationships.
By default, monotonic_cst is set to None, meaning no monotonic constraints are applied. When used, it's set as a list or array with one value per feature: -1 (monotonically decreasing), 0 (no constraint), or 1 (monotonically increasing). For classifiers, the constraints apply to the predicted probability of the positive class and are supported only for binary classification.
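For example, with three features where the first should push the predicted probability up, the second should push it down, and the third is unconstrained, the constraint array looks like this (a minimal sketch; the order of entries must match the column order of the training data):
from sklearn.ensemble import ExtraTreesClassifier

# One entry per feature, in column order:
#   1 -> prediction is non-decreasing in this feature
#  -1 -> prediction is non-increasing in this feature
#   0 -> no constraint
constraints = [1, -1, 0]
clf = ExtraTreesClassifier(n_estimators=100, monotonic_cst=constraints, random_state=42)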
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
import numpy as np
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=3, n_informative=3,
                           n_redundant=0, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different monotonic_cst configurations
configs = [
    None,          # no constraints (default behavior)
    [1, -1, 0],    # feature 0 increasing, feature 1 decreasing, feature 2 unconstrained
    [0, 0, 0],     # explicitly no constraints
    [1, 0, -1]     # feature 0 increasing, feature 2 decreasing
]

for config in configs:
    etc = ExtraTreesClassifier(n_estimators=100, random_state=42, monotonic_cst=config)
    etc.fit(X_train, y_train)
    y_pred = etc.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"monotonic_cst={config}, Accuracy: {acc:.3f}")
Running the example gives an output like:
monotonic_cst=None, Accuracy: 0.955
monotonic_cst=[1, -1, 0], Accuracy: 0.845
monotonic_cst=[0, 0, 0], Accuracy: 0.955
monotonic_cst=[1, 0, -1], Accuracy: 0.910
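To confirm that a constraint is actually enforced, you can hold all but one feature fixed and sweep the constrained feature over a grid: with monotonic_cst=[1, -1, 0], the predicted probability of the positive class should never decrease as feature 0 increases. A rough sketch reusing the training data from the example above:
import numpy as np

etc = ExtraTreesClassifier(n_estimators=100, random_state=42, monotonic_cst=[1, -1, 0])
etc.fit(X_train, y_train)

# Sweep feature 0 over a grid while holding features 1 and 2 at their training means
grid = np.linspace(X_train[:, 0].min(), X_train[:, 0].max(), 50)
X_sweep = np.tile(X_train.mean(axis=0), (50, 1))
X_sweep[:, 0] = grid

proba = etc.predict_proba(X_sweep)[:, 1]
# Successive differences should be non-negative (up to floating point noise)
print(np.all(np.diff(proba) >= -1e-12))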
The key steps in this example are:
- Generate a synthetic binary classification dataset with three informative features
- Split the data into train and test sets
- Train ExtraTreesClassifier models with different monotonic_cst configurations
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting monotonic_cst:
- Use domain knowledge to determine which features should have monotonic relationships
- Set constraints only for features where you’re confident about the relationship
- Consider the trade-off between enforcing constraints and model flexibility
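A simple way to apply these tips in code is to start with all zeros (no constraint) and fill in values only for the features you are confident about. A minimal sketch using hypothetical feature names and domain knowledge:
import numpy as np

feature_names = ["age", "income", "num_defaults"]      # hypothetical column order
known_constraints = {"income": 1, "num_defaults": -1}  # only relationships you trust

monotonic_cst = np.zeros(len(feature_names), dtype=int)
for name, direction in known_constraints.items():
    monotonic_cst[feature_names.index(name)] = direction

print(monotonic_cst)  # [ 0  1 -1]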
Issues to consider:
- Enforcing monotonic constraints may reduce model performance if the true relationships are not monotonic
- The impact of constraints can vary depending on the dataset and problem complexity
- Monotonic constraints may increase training time and reduce the model’s ability to capture complex patterns
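If you are unsure whether a constraint helps or hurts on your data, a cross-validated comparison of constrained and unconstrained models is a reasonable check (a sketch reusing the dataset from the example above):
from sklearn.model_selection import cross_val_score

for config in [None, [1, -1, 0]]:
    etc = ExtraTreesClassifier(n_estimators=100, random_state=42, monotonic_cst=config)
    scores = cross_val_score(etc, X_train, y_train, cv=5)
    print(f"monotonic_cst={config}, CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")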