The passthrough parameter in scikit-learn’s StackingClassifier determines whether the original features are included in the input to the final estimator (the meta-classifier).
StackingClassifier is an ensemble method that combines multiple base classifiers by training a meta-classifier on their cross-validated predictions. By default the meta-classifier sees only those predictions; passthrough controls whether it also sees the original input features.
When passthrough=True, the meta-classifier receives both the predictions from the base classifiers and the original features. This can improve performance, but it increases the input dimensionality of the meta-classifier by the number of original features.
The default value for passthrough is False. Setting it to True can be beneficial when the original features contain information not fully captured by the base classifiers.
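To make the dimensionality change concrete, the following minimal sketch fits two small stacks and compares the width of the matrix returned by transform(), which has the same column layout as the input to the final estimator. The dataset sizes, the helper name meta_input_width, and the reduced n_estimators and cv values are assumptions chosen only to keep the illustration fast. The exact number of prediction columns depends on stack_method and the number of classes, so the sketch focuses on the difference between the two settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem; the sizes are arbitrary choices for illustration
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_redundant=5, random_state=0)


def meta_input_width(passthrough):
    """Fit a small stack and return the number of meta-feature columns."""
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,
        cv=3,
    )
    stack.fit(X, y)
    # transform() returns the base-model predictions, plus X when
    # passthrough=True, in the column layout the final estimator sees
    return stack.transform(X).shape[1]


width_without = meta_input_width(False)
width_with = meta_input_width(True)
print(f"meta-features without passthrough: {width_without}")
print(f"meta-features with passthrough:    {width_with}")
print(f"extra columns from passthrough:    {width_with - width_without}")
```

The difference between the two widths should equal the number of original features (20 here), since passthrough appends X unchanged to the stacked predictions.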
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import roc_auc_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define base classifiers
base_classifiers = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('lr', LogisticRegression(random_state=42))
]
# Create StackingClassifier with passthrough=False
stacking_false = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    passthrough=False,
    cv=5
)
# Create StackingClassifier with passthrough=True
stacking_true = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    passthrough=True,
    cv=5
)
# Fit and evaluate models
for model in [stacking_false, stacking_true]:
    model.fit(X_train, y_train)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_pred_proba)
    print(f"AUC score with passthrough={model.passthrough}: {auc:.3f}")
Running the example gives an output like:
AUC score with passthrough=False: 0.973
AUC score with passthrough=True: 0.979
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Create two StackingClassifier models, one with passthrough=False and one with passthrough=True
- Train both models and evaluate their performance using ROC AUC score
Tips for configuring the passthrough parameter:
- Use passthrough=True when you suspect the original features contain valuable information not captured by base classifiers
- Consider the trade-off between potential performance gain and increased computational cost
- Experiment with both options and compare performance to determine the best configuration for your specific problem
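One way to run that comparison is to treat passthrough as a hyperparameter and let GridSearchCV score both settings with cross-validation instead of a single train/test split. The sketch below is only an illustration under assumed settings: the smaller dataset, n_estimators=50, and cv=3 are choices made to keep the runtime short, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data similar in shape to the main example, but smaller for speed
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# passthrough is a constructor parameter, so it can be searched like any other
search = GridSearchCV(stack, param_grid={"passthrough": [False, True]},
                      scoring="roc_auc", cv=3)
search.fit(X, y)

# Report the mean and spread of the cross-validated AUC for each setting
for params, mean, std in zip(search.cv_results_["params"],
                             search.cv_results_["mean_test_score"],
                             search.cv_results_["std_test_score"]):
    print(f"passthrough={params['passthrough']}: AUC {mean:.3f} +/- {std:.3f}")
print("selected:", search.best_params_)
```

Looking at the standard deviation alongside the mean helps judge whether a small difference between the two settings is meaningful or within fold-to-fold noise.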
Issues to consider:
- Setting passthrough=True increases the input dimensionality for the meta-classifier, which may lead to overfitting on smaller datasets
- The effectiveness of passthrough=True depends on the choice of base classifiers and meta-classifier
- Using passthrough=True may increase training and prediction time, especially with high-dimensional datasets
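A rough way to check these issues on your own data is to record the fit time and compare training AUC against held-out AUC for each setting; a large gap between the two can hint at overfitting. The sketch below assumes a deliberately small synthetic dataset (200 samples) and reduced model sizes, and the results dictionary is just an illustrative container. It makes no claim about which setting will come out ahead.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Deliberately small dataset, where extra meta-features are more likely to matter
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

results = {}
for passthrough in (False, True):
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,
        cv=3,
    )

    # Time only the fit call
    start = time.perf_counter()
    stack.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    # Train vs. held-out AUC; the gap is a rough overfitting indicator
    train_auc = roc_auc_score(y_train, stack.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
    results[passthrough] = {"fit_seconds": fit_seconds,
                            "train_auc": train_auc,
                            "test_auc": test_auc}
    print(f"passthrough={passthrough}: fit {fit_seconds:.2f}s, "
          f"train AUC {train_auc:.3f}, test AUC {test_auc:.3f}, "
          f"gap {train_auc - test_auc:.3f}")
```

Timings from a single run are noisy, so repeat the measurement or use a larger dataset before drawing conclusions about cost.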