
Configure StackingClassifier "stack_method" Parameter

The stack_method parameter in scikit-learn’s StackingClassifier determines which prediction method is called on each base estimator to produce the inputs for the final estimator.

StackingClassifier is an ensemble method that combines multiple classification models via a meta-classifier. The base estimators’ cross-validated predictions become the features on which the final estimator is trained, so stack_method decides whether those features are class probabilities, decision scores, or predicted labels.

This parameter can be set to ‘auto’, ‘predict_proba’, ‘decision_function’, or ‘predict’. The choice affects the type of prediction used from each base estimator.
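To make the effect on the final estimator's inputs concrete, the sketch below compares the shape of the stacked features returned by transform for two settings. The three-class dataset, the two base estimators, and the variable names (X_demo, base, shapes) are assumptions chosen for illustration and are not part of the main example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Illustrative three-class dataset (an assumption, not the tutorial's data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=4, n_classes=3,
                                     random_state=0)

# Two base estimators that both implement predict_proba and predict
base = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('tree', DecisionTreeClassifier(random_state=0)),
]

shapes = {}
for method in ['predict_proba', 'predict']:
    stack = StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=method,
        cv=3,
    )
    stack.fit(X_demo, y_demo)
    # transform returns the features that are passed to the final estimator
    shapes[method] = stack.transform(X_demo).shape
    print(f"stack_method={method}, stacked feature shape: {shapes[method]}")
```

With three classes, predict_proba contributes one column per class for each base estimator, while predict contributes a single column of labels per estimator. In binary problems scikit-learn drops the first probability column, so predict_proba also yields one column per estimator there.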

The default value for stack_method is ‘auto’. For each base estimator, ‘auto’ tries ‘predict_proba’, ‘decision_function’, and ‘predict’, in that order, and uses the first method the estimator implements.
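One way to check what ‘auto’ picked is to inspect the fitted stack_method_ attribute, which lists the method used for each base estimator. The following is a minimal sketch; the pairing of LogisticRegression with LinearSVC (which has no predict_proba) and the names X_demo and auto_stack are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Small illustrative binary dataset (an assumption, not the tutorial's data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     random_state=0)

auto_stack = StackingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),  # implements predict_proba
        ('lsvc', LinearSVC(max_iter=5000)),         # only decision_function
    ],
    final_estimator=LogisticRegression(),
    stack_method='auto',
    cv=3,
)
auto_stack.fit(X_demo, y_demo)

# One entry per base estimator, in the order they were passed
print(auto_stack.stack_method_)
```

Here the logistic regression is stacked on its probabilities and the linear SVM on its decision scores, so a single ‘auto’ setting can mix methods across base estimators.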

Common choices include ‘predict_proba’ for classifiers that can output probability estimates, and ‘predict’ for those that cannot. When a method is named explicitly, every base estimator must implement it.
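The sketch below shows what can go wrong when an explicit method is requested from an estimator that lacks it. It assumes LinearSVC as a base estimator, since it does not provide predict_proba; the names X_demo, strict_stack, and raised are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Small illustrative binary dataset (an assumption, not the tutorial's data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     random_state=0)

strict_stack = StackingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('lsvc', LinearSVC(max_iter=5000)),  # no predict_proba available
    ],
    final_estimator=LogisticRegression(),
    stack_method='predict_proba',  # requested from every base estimator
    cv=3,
)

raised = False
try:
    strict_stack.fit(X_demo, y_demo)
except ValueError as exc:
    # scikit-learn reports which estimator lacks the requested method
    raised = True
    print(f"Fit failed: {exc}")
```

Switching to stack_method='auto', or replacing the estimator with one that exposes probabilities (for example SVC(probability=True)), avoids the error.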

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base classifiers (SVC needs probability=True to expose predict_proba)
estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('svm', SVC(probability=True, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]

# Define stack methods to test
stack_methods = ['auto', 'predict_proba', 'predict']

for method in stack_methods:
    clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(),
        stack_method=method,
        cv=5
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"stack_method={method}, Accuracy: {accuracy:.3f}, AUC: {auc:.3f}")

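Running the example gives an output like the following. All three base estimators implement predict_proba, so ‘auto’ resolves to ‘predict_proba’ for each of them and the first two lines match: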

stack_method=auto, Accuracy: 0.930, AUC: 0.982
stack_method=predict_proba, Accuracy: 0.930, AUC: 0.982
stack_method=predict, Accuracy: 0.930, AUC: 0.951

The key steps in this example are:

  1. Generate a synthetic classification dataset
  2. Split the data into train and test sets
  3. Define base classifiers (RandomForest, SVM, KNN)
  4. Create StackingClassifier models with different stack_method values
  5. Train models and evaluate performance using accuracy and AUC scores

Tips for choosing the appropriate stack_method:

Considerations when setting stack_method:


