Configure StackingClassifier "passthrough" Parameter

The passthrough parameter in scikit-learn’s StackingClassifier determines whether to include the original features in the final meta-classifier.

StackingClassifier is an ensemble method that combines multiple base classifiers by training a meta-classifier (the final_estimator) on their cross-validated predictions.

When passthrough=True, the meta-classifier is trained on the base classifiers' predictions concatenated with the original features. This may improve performance, but it widens the meta-classifier's input by one column per original feature.

The default value for passthrough is False. Setting it to True can be beneficial when the original features contain information not fully captured by the base classifiers.
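
The effect on the meta-classifier's input can be checked directly by looking at the width of the matrix returned by transform. The snippet below is a minimal sketch, not part of the original example: the helper name meta_input_width, the small dataset, and the reduced tree count are assumptions chosen only to keep it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Small illustrative dataset (sizes are assumptions, chosen for speed)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)


def meta_input_width(passthrough):
    """Hypothetical helper: fit a stack and return the number of
    columns that the meta-classifier is trained on."""
    model = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,
        cv=3,
    )
    model.fit(X, y)
    # transform() returns the stacked predictions, plus X when passthrough=True
    return model.transform(X).shape[1]


width_without = meta_input_width(False)
width_with = meta_input_width(True)
print(f"Meta-classifier input columns, passthrough=False: {width_without}")
print(f"Meta-classifier input columns, passthrough=True:  {width_with}")
```

The difference between the two widths should equal the number of original features (20 here), which is the extra dimensionality the meta-classifier has to handle.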

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import roc_auc_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base classifiers
base_classifiers = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('lr', LogisticRegression(random_state=42))
]

# Create StackingClassifier with passthrough=False
stacking_false = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    passthrough=False,
    cv=5
)

# Create StackingClassifier with passthrough=True
stacking_true = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=LogisticRegression(),
    passthrough=True,
    cv=5
)

# Fit and evaluate models
for model in [stacking_false, stacking_true]:
    model.fit(X_train, y_train)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_pred_proba)
    print(f"AUC score with passthrough={model.passthrough}: {auc:.3f}")

Running the example gives an output like:

AUC score with passthrough=False: 0.973
AUC score with passthrough=True: 0.979

The key steps in this example are:

1. Generate a synthetic classification dataset with informative and redundant features
2. Split the data into train and test sets
3. Define the base classifiers (a random forest and a logistic regression)
4. Create two StackingClassifier models, one with passthrough=False and one with passthrough=True
5. Train both models and evaluate their performance using ROC AUC score on the test set

Tips for configuring the passthrough parameter:

- Use passthrough=True when you suspect the original features contain valuable information not captured by base classifiers
- Consider the trade-off between potential performance gain and increased computational cost
- Experiment with both options and compare performance to determine the best configuration for your specific problem
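
One way to compare both options is to treat passthrough as a hyperparameter and let cross-validation score each setting, rather than relying on a single train/test split. The following is a sketch under assumptions: the dataset size, the reduced number of trees, the fold counts, and the max_iter value are illustrative choices, not settings from the example above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative dataset (smaller than the main example to keep the search quick)
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=20, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Search over the two passthrough settings, scored by cross-validated ROC AUC
search = GridSearchCV(stack, param_grid={"passthrough": [False, True]},
                      scoring="roc_auc", cv=3)
search.fit(X, y)

for value, score in zip(search.cv_results_["param_passthrough"],
                        search.cv_results_["mean_test_score"]):
    print(f"passthrough={value}: mean CV AUC = {score:.3f}")
print("Best setting:", search.best_params_)
```

Averaging over folds gives a steadier comparison than one split, which matters when the two settings differ by only a few thousandths of AUC.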

Issues to consider:

- Setting passthrough=True increases the input dimensionality for the meta-classifier, which may lead to overfitting on smaller datasets
- The effectiveness of passthrough=True depends on the choice of base classifiers and meta-classifier
- Using passthrough=True may increase training and prediction time, especially with high-dimensional datasets
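
If overfitting is a concern with passthrough=True, one possible mitigation is to give the meta-classifier its own preprocessing and stronger regularization, since the raw features arrive on a different scale than the base classifiers' probabilities. This is a sketch of that idea, not a recommendation from the example above: the StandardScaler step, the value C=0.1, and the dataset settings are assumptions to tune for your data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (sizes are assumptions)
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Meta-classifier as a pipeline: scale the stacked inputs, then fit a
# more strongly regularized logistic regression (C=0.1 is an assumed value)
final_estimator = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=0.1, max_iter=1000),
)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=20, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=final_estimator,
    passthrough=True,
    cv=3,
)

scores = cross_val_score(stack, X, y, cv=3, scoring="roc_auc")
print(f"Mean CV AUC with regularized meta-classifier: {scores.mean():.3f}")
```

Whether this helps depends on the dataset, so it is worth comparing against the unregularized meta-classifier using the same cross-validation folds.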

See Also