The stack_method parameter in scikit-learn’s StackingClassifier determines how the predictions from base estimators are used as input to the final estimator.
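StackingClassifier is an ensemble method that combines multiple classification models via a meta-classifier (the final estimator), which is trained on the base estimators’ cross-validated predictions. The stack_method parameter controls which prediction method is called on each base estimator to produce those inputs.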
This parameter can be set to ‘auto’, ‘predict_proba’, ‘decision_function’, or ‘predict’. The choice affects the type of prediction used from each base estimator.
The default value for stack_method is ‘auto’. For each base estimator, this tries ‘predict_proba’, ‘decision_function’, and ‘predict’ in that order and uses the first one the estimator implements.
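To make the ‘auto’ behaviour concrete, the following minimal sketch fits a small stack and reads the fitted stack_method_ attribute, which records the method used for each base estimator. The dataset sizes, the choice of LinearSVC as an estimator without predict_proba, and the variable name resolved are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Small synthetic dataset, used only to fit the stack (illustrative sizes)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

estimators = [
    # RandomForestClassifier implements predict_proba
    ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    # LinearSVC has no predict_proba, but does implement decision_function
    ("lsvc", LinearSVC(random_state=0)),
]

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    stack_method="auto",
    cv=3,
)
clf.fit(X, y)

# stack_method_ lists the method actually used for each base estimator
resolved = dict(zip([name for name, _ in estimators], clf.stack_method_))
print(resolved)
```

With these two estimators, the mapping should show predict_proba for the random forest and decision_function for the linear SVM.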
Common choices include ‘predict_proba’ for classifiers that can output probability estimates, and ‘predict’ for those that cannot.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define base classifiers
estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('svm', SVC(probability=True, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]
# Define stack methods to test
stack_methods = ['auto', 'predict_proba', 'predict']
for method in stack_methods:
    clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(),
        stack_method=method,
        cv=5
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"stack_method={method}, Accuracy: {accuracy:.3f}, AUC: {auc:.3f}")
Running the example gives an output like:
stack_method=auto, Accuracy: 0.930, AUC: 0.982
stack_method=predict_proba, Accuracy: 0.930, AUC: 0.982
stack_method=predict, Accuracy: 0.930, AUC: 0.951
The key steps in this example are:
- Generate a synthetic classification dataset
- Split the data into train and test sets
- Define base classifiers (RandomForest, SVM, KNN)
- Create StackingClassifier models with different stack_method values
- Train models and evaluate performance using accuracy and AUC scores
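To see what each stack_method setting passes to the final estimator, the sketch below calls transform() on a fitted stack and compares the shape of the resulting meta-feature matrix. It uses a three-class dataset (an assumption chosen so the column counts differ visibly) and a hypothetical shapes dictionary to collect the results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Three-class problem so that probabilities and hard labels differ in width
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

estimators = [
    ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

shapes = {}
for method in ["predict_proba", "predict"]:
    clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=method,
        cv=3,
    )
    clf.fit(X, y)
    # transform() returns the stacked base-estimator outputs for X
    shapes[method] = clf.transform(X).shape
    print(method, shapes[method])
```

With ‘predict_proba’, each of the two estimators contributes one column per class (six columns in total); with ‘predict’, each contributes a single column of hard labels, so the final estimator has less information to work with.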
Tips for choosing the appropriate stack_method:
- Use ‘auto’ when unsure, as it picks the first available of ‘predict_proba’, ‘decision_function’, and ‘predict’ for each estimator
- ‘predict_proba’ is preferable when all base estimators support probability estimates
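- ‘decision_function’ can be useful for SVM classifiers, since SVC only exposes ‘predict_proba’ when probability=True, which adds an internal cross-validation step during fitting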
- ‘predict’ is a fallback option when probability estimates are not available
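As an illustration of the ‘decision_function’ tip, here is a minimal sketch of a stack whose base estimators all implement decision_function, so SVC can be left at its default probability=False. The estimator names, dataset sizes, and the variables methods_used and test_accuracy are assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Every base estimator here implements decision_function
estimators = [
    ("svc_rbf", SVC(kernel="rbf", random_state=0)),  # probability=False by default
    ("lsvc", LinearSVC(random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    stack_method="decision_function",
    cv=3,
)
clf.fit(X_train, y_train)

methods_used = list(clf.stack_method_)
test_accuracy = clf.score(X_test, y_test)
print(methods_used, f"accuracy={test_accuracy:.3f}")
```

The final estimator then receives signed distances to the decision boundary rather than probabilities, which still carry confidence information without requiring probability calibration in the SVM.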
Considerations when setting stack_method:
- Ensure all base estimators support the chosen method
- Different methods may lead to varying performance, so experiment to find the best option
- The choice can affect the interpretability of the final model
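- Computational cost may vary between methods; ‘predict’ passes only one column of hard labels per estimator to the final estimator, which keeps the meta-feature matrix small but discards confidence information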
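The first consideration can be checked before fitting. The sketch below tests each base estimator for the requested method with hasattr, then shows that fitting with an unsupported method fails. The exact exception type and message may differ between scikit-learn versions, so the code catches both ValueError and AttributeError as a precaution; the names unsupported and fit_failed are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

estimators = [
    # RandomForestClassifier does not implement decision_function
    ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# Check support up front instead of waiting for fit() to fail
unsupported = [
    name for name, est in estimators if not hasattr(est, "decision_function")
]
print("missing decision_function:", unsupported)

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    stack_method="decision_function",
    cv=3,
)

fit_failed = False
try:
    clf.fit(X, y)
except (ValueError, AttributeError) as exc:
    # Exception type and wording are version-dependent (assumption)
    fit_failed = True
    print(type(exc).__name__, exc)
```

If the pre-check reports any estimator, either switch that estimator, choose a method every estimator supports, or fall back to ‘auto’.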