Configure StackingClassifier "stack_method" Parameter

The stack_method parameter in scikit-learn’s StackingClassifier determines how the predictions from base estimators are used as input to the final estimator.

StackingClassifier is an ensemble method that combines multiple classification models via a meta-classifier (the final estimator). During fitting, the base estimators’ cross-validated outputs become the input features of the final estimator.

This parameter can be set to ‘auto’, ‘predict_proba’, ‘decision_function’, or ‘predict’. The choice affects the type of prediction used from each base estimator.

The default value for stack_method is ‘auto’. For each base estimator, this tries ‘predict_proba’, ‘decision_function’, and ‘predict’ in that order and uses the first one the estimator implements.
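To check what ‘auto’ resolves to, one can inspect the fitted stack_method_ attribute, which lists the method used for each base estimator. The following is a minimal sketch: the small dataset and the pairing of RandomForestClassifier with LinearSVC (which has no predict_proba) are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Small illustrative dataset (sizes are arbitrary)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# One estimator with predict_proba, one with only decision_function
estimators = [
    ('rf', RandomForestClassifier(n_estimators=20, random_state=0)),
    ('svc', LinearSVC(dual=False, random_state=0)),
]

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    stack_method='auto',
    cv=3,
)
clf.fit(X, y)

# Map each estimator name to the method that 'auto' selected for it
resolved = dict(zip([name for name, _ in estimators], clf.stack_method_))
print(resolved)
```

Here the random forest should resolve to ‘predict_proba’ and the linear SVM to ‘decision_function’, since the latter is the first method in the search order that it implements.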

Common choices include ‘predict_proba’ for classifiers that can output probability estimates, and ‘predict’ for those that cannot.
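The practical difference between ‘predict_proba’ and ‘predict’ shows up in the features passed to the final estimator. As a rough sketch, the transform method of a fitted StackingClassifier returns those stacked features, so their shape can be compared directly. The three-class dataset and the two base estimators below are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Three-class problem so that probabilities span several columns
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('tree', DecisionTreeClassifier(max_depth=3, random_state=0)),
]

shapes = {}
for method in ['predict_proba', 'predict']:
    clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=method,
        cv=3,
    )
    clf.fit(X, y)
    # transform() returns the stacked base-estimator outputs
    shapes[method] = clf.transform(X).shape
    print(method, shapes[method])
```

With ‘predict_proba’, each base estimator contributes one column per class; with ‘predict’, it contributes a single column of class labels. In binary problems scikit-learn keeps only one probability column per estimator, so the two methods yield the same number of columns there, although the values still differ (probabilities versus hard labels).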

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base classifiers
estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('svm', SVC(probability=True, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]

# Define stack methods to test
stack_methods = ['auto', 'predict_proba', 'predict']

for method in stack_methods:
    clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(),
        stack_method=method,
        cv=5
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"stack_method={method}, Accuracy: {accuracy:.3f}, AUC: {auc:.3f}")

Running the example gives an output like:

stack_method=auto, Accuracy: 0.930, AUC: 0.982
stack_method=predict_proba, Accuracy: 0.930, AUC: 0.982
stack_method=predict, Accuracy: 0.930, AUC: 0.951

The key steps in this example are:

Generate a synthetic classification dataset
Split the data into train and test sets
Define base classifiers (RandomForest, SVM, KNN)
Create StackingClassifier models with different stack_method values
Train models and evaluate performance using accuracy and AUC scores

Tips for choosing the appropriate stack_method:

Use ‘auto’ when unsure, as it picks the first available method for each estimator, preferring ‘predict_proba’, then ‘decision_function’, then ‘predict’
‘predict_proba’ is preferable when all base estimators support probability estimates
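‘decision_function’ can be useful for SVM classifiers, since SVC only provides ‘predict_proba’ when fitted with probability=True, which slows down training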
‘predict’ is a fallback option when probability estimates are not available

Considerations when setting stack_method:

Ensure all base estimators support the chosen method
Different methods may lead to varying performance, so experiment to find the best option
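The choice can affect the interpretability of the final model, because the final estimator’s inputs are probabilities, decision scores, or class labels depending on the method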
Computational cost may vary between methods and estimators, so measure it on your own data rather than assuming one method is always the fastest
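The first consideration above can be checked quickly. The sketch below assumes that fitting raises a ValueError when a base estimator lacks the requested method; RandomForestClassifier has no decision_function, so forcing stack_method='decision_function' is expected to fail. The dataset and estimator settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=20, random_state=0)),
        ('svm', SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    stack_method='decision_function',  # not implemented by the random forest
    cv=3,
)

error_message = None
try:
    clf.fit(X, y)
except ValueError as exc:
    # Keep the message so the offending estimator can be identified
    error_message = str(exc)
    print(f"fit failed: {error_message}")
```

If a mix of estimators is needed, leaving stack_method at ‘auto’ avoids this error, because the method is then chosen per estimator.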
