The passthrough parameter in scikit-learn’s StackingClassifier determines whether to include the original features in the input to the final meta-classifier.
StackingClassifier is an ensemble method that combines multiple base classifiers by training a meta-classifier on their cross-validated predictions. The passthrough parameter controls whether the original input features are also passed to the meta-classifier.
When passthrough=True, the meta-classifier receives both the predictions from the base classifiers and the original features. This can improve performance but increases the input dimensionality for the meta-classifier.
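To make the dimensionality change concrete, the following minimal sketch inspects the matrix handed to the meta-classifier through transform(). The dataset size, the choice of base classifiers, and the shapes dictionary are illustrative assumptions, not part of the example later in this article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem with 20 input features (illustrative sizes)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

estimators = [
    ('rf', RandomForestClassifier(n_estimators=25, random_state=0)),
    ('lr', LogisticRegression(max_iter=1000)),
]

shapes = {}
for passthrough in (False, True):
    stack = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,
        cv=3,
    )
    stack.fit(X, y)
    # transform() returns the feature matrix that the final estimator consumes
    shapes[passthrough] = stack.transform(X).shape

print(shapes)
```

For a binary problem each base classifier contributes a single probability column, so this setup should report 2 columns with passthrough=False and 22 columns (2 predictions plus the 20 original features) with passthrough=True.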
The default value for passthrough is False. Setting it to True can be beneficial when the original features contain information not fully captured by the base classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import roc_auc_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define base classifiers
base_classifiers = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('lr', LogisticRegression(random_state=42))
]
# Create StackingClassifier with passthrough=False
stacking_false = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    passthrough=False,
    cv=5
)
# Create StackingClassifier with passthrough=True
stacking_true = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    passthrough=True,
    cv=5
)
# Fit and evaluate models
for model in [stacking_false, stacking_true]:
    model.fit(X_train, y_train)
    # Probability of the positive class, as required by roc_auc_score
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_pred_proba)
    print(f"AUC score with passthrough={model.passthrough}: {auc:.3f}")
Running the example gives an output like:
AUC score with passthrough=False: 0.973
AUC score with passthrough=True: 0.979
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Create two StackingClassifier models, one with passthrough=False and one with passthrough=True
- Train both models and evaluate their performance using ROC AUC score
Tips for configuring the passthrough parameter:
- Use passthrough=True when you suspect the original features contain valuable information not captured by the base classifiers
- Consider the trade-off between potential performance gain and increased computational cost
- Experiment with both options and compare performance to determine the best configuration for your specific problem
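One way to run that comparison is to treat passthrough as an ordinary hyperparameter and let GridSearchCV score both settings under cross-validation. This is a sketch under assumed settings (a smaller dataset, 25 trees, and 3 folds, chosen only to keep it fast), not a recommended configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative dataset; sizes are assumptions chosen for a quick run
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

stack = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=25, random_state=42)),
        ('lr', LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Search over the two possible values of passthrough
search = GridSearchCV(stack, param_grid={'passthrough': [False, True]},
                      scoring='roc_auc', cv=3)
search.fit(X, y)

for params, score in zip(search.cv_results_['params'],
                         search.cv_results_['mean_test_score']):
    print(f"passthrough={params['passthrough']}: mean AUC {score:.3f}")
print("Selected:", search.best_params_)
```

Cross-validated scores are less sensitive to one lucky split than a single train/test comparison, although small differences between the two settings can still be noise.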
Issues to consider:
- Setting passthrough=True increases the input dimensionality for the meta-classifier, which may lead to overfitting on smaller datasets
- The effectiveness of passthrough=True depends on the choice of base classifiers and meta-classifier
- Using passthrough=True may increase training and prediction time, especially with high-dimensional datasets
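These concerns can be checked empirically rather than assumed. The sketch below uses a deliberately small, wide dataset (200 samples and 50 features, both illustrative assumptions) and records the train/test AUC and the fit time for each setting. The results dictionary and all sizes are hypothetical choices made for this illustration.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Small, wide dataset to stress the meta-classifier (illustrative sizes)
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}
for passthrough in (False, True):
    stack = StackingClassifier(
        estimators=[
            ('rf', RandomForestClassifier(n_estimators=25, random_state=0)),
            ('lr', LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,
        cv=3,
    )
    start = time.perf_counter()
    stack.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    # A large train/test gap is one sign of overfitting
    train_auc = roc_auc_score(y_train, stack.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
    results[passthrough] = {'train_auc': train_auc, 'test_auc': test_auc,
                            'fit_seconds': fit_seconds}
    print(f"passthrough={passthrough}: train AUC {train_auc:.3f}, "
          f"test AUC {test_auc:.3f}, fit time {fit_seconds:.2f}s")
```

Timings depend on the machine, and a single split is noisy, so repeating the run over several seeds gives a more reliable picture. The training AUC is also optimistic in both configurations because the base classifiers are refit on the full training set before predicting on it.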