
Configure StackingClassifier "final_estimator" Parameter

The final_estimator parameter in scikit-learn’s StackingClassifier determines the model used to combine predictions from base estimators.

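StackingClassifier is an ensemble method that fits several base classifiers on the training data and uses their predictions as input features for a final classifier. The final_estimator is this last layer: it learns how best to combine the base predictions. So that it does not learn from predictions the base models made on samples they were trained on, the final_estimator is fit on out-of-fold predictions produced by cross-validation (controlled by the cv parameter). The base estimators themselves are refit on the full training set and used at prediction time.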
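The following minimal sketch illustrates what the final estimator is trained on. It builds the out-of-fold features by hand with cross_val_predict, which roughly mirrors what StackingClassifier does internally when cv=5 and the base estimators expose predict_proba. The small dataset and the DecisionTreeClassifier/KNeighborsClassifier base estimators are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small illustrative 3-class dataset (assumed, not from the article)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

base_estimators = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Manual stacking: out-of-fold class probabilities from each base estimator
# are placed side by side and become the training features of the final model.
oof_features = np.hstack([
    cross_val_predict(est, X, y, cv=5, method="predict_proba")
    for _, est in base_estimators
])
manual_final = LogisticRegression(max_iter=1000).fit(oof_features, y)

# Built-in version; transform() returns the stacked base predictions
# (from base estimators refit on all of X) that the final estimator consumes.
stack = StackingClassifier(estimators=base_estimators,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5).fit(X, y)

print(oof_features.shape)        # (300, 6): 2 estimators x 3 class probabilities
print(stack.transform(X).shape)  # (300, 6)
```

For multi-class problems each base estimator contributes one column per class, so the final estimator sees only 6 features here regardless of how many features the original data has.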

By default, final_estimator is None, in which case a LogisticRegression is used. However, any classifier can be passed instead, including another StackingClassifier, which adds a further stacking layer.
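A short sketch of both cases, assuming a small synthetic dataset and illustrative base estimators: leaving final_estimator unset yields a fitted LogisticRegression in the final_estimator_ attribute, and passing a StackingClassifier as final_estimator gives a two-layer ensemble. The second-layer estimators (GaussianNB, LogisticRegression) are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

base_estimators = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# final_estimator left at its default (None): a LogisticRegression is fitted.
default_stack = StackingClassifier(estimators=base_estimators).fit(X, y)
print(type(default_stack.final_estimator_).__name__)  # LogisticRegression

# Second stacking layer: this StackingClassifier is trained on the first
# layer's stacked predictions rather than on the original features.
second_layer = StackingClassifier(
    estimators=[("nb", GaussianNB()), ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
)
two_layer_stack = StackingClassifier(estimators=base_estimators,
                                     final_estimator=second_layer).fit(X, y)
print(two_layer_stack.predict(X[:5]).shape)  # (5,)
```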

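Common choices for final_estimator include LogisticRegression, RandomForestClassifier, and GradientBoostingClassifier. A linear model's coefficients show directly how much weight each base prediction receives, while tree ensembles can model non-linear interactions between base predictions at the cost of interpretability.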
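To make the interpretability difference concrete, the sketch below (using an assumed small dataset and two illustrative base estimators) fits one stack with a linear final estimator and one with a random forest. The linear model exposes a coefficient for every (class, stacked column) pair, while the forest only reports one aggregate importance per stacked column. It relies on the stacked columns being grouped by base estimator in the order they are listed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
base_estimators = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Linear final estimator: one coefficient per (class, stacked column)
linear_stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X, y)
coef = linear_stack.final_estimator_.coef_
print(coef.shape)  # (3, 6)

# Tree-ensemble final estimator: one aggregate importance per stacked column
forest_stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
).fit(X, y)
importances = forest_stack.final_estimator_.feature_importances_
print(importances.shape)  # (6,)

# Stacked columns are grouped per base estimator: 3 class probabilities each
for i, (name, _) in enumerate(base_estimators):
    cols = slice(3 * i, 3 * i + 3)
    print(f"{name}: |coef| sum = {np.abs(coef[:, cols]).sum():.3f}, "
          f"importance = {importances[cols].sum():.3f}")
```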

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=10, n_redundant=0, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base estimators
base_estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('svc', SVC(kernel='rbf', probability=True, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]

# Define final estimators to compare
final_estimators = [
    ('default', None),  # Uses default LogisticRegression
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
]

# Train and evaluate models with different final estimators
for name, final_estimator in final_estimators:
    clf = StackingClassifier(estimators=base_estimators, final_estimator=final_estimator, cv=5)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')
    print(f"Final Estimator: {name}")
    print(f"Accuracy: {accuracy:.3f}")
    print(f"F1 Score: {f1:.3f}\n")

Running the example gives an output like:

Final Estimator: default
Accuracy: 0.910
F1 Score: 0.911

Final Estimator: rf
Accuracy: 0.895
F1 Score: 0.896

Final Estimator: gb
Accuracy: 0.885
F1 Score: 0.886

Key steps in this example:

  1. Generate a synthetic multi-class classification dataset
  2. Split data into train and test sets
  3. Define base estimators (RandomForest, SVC, KNN)
  4. Create a StackingClassifier for each final estimator (default LogisticRegression, RandomForest, GradientBoosting)
  5. Train models and evaluate performance using accuracy and F1 score
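The loop above compares final estimators on a single train/test split, so gaps of about one percentage point, like those in the sample output, may be within split-to-split noise. Because final_estimator is an ordinary parameter, GridSearchCV can select it by cross-validated score, and nested parameters such as final_estimator__C can be tuned in the same search. This is a sketch under assumptions: lighter base estimators (DecisionTreeClassifier, KNeighborsClassifier) are swapped in to keep runtime short, and the grid values are illustrative rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, n_informative=10,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    cv=5,
)

# Each dict is a separate sub-grid: the final estimator is swapped in,
# then its own hyperparameters are set via the "final_estimator__" prefix.
param_grid = [
    {"final_estimator": [LogisticRegression(max_iter=1000)],
     "final_estimator__C": [0.1, 1.0, 10.0]},
    {"final_estimator": [RandomForestClassifier(n_estimators=50, random_state=42)],
     "final_estimator__max_depth": [3, None]},
]

search = GridSearchCV(stack, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Mean and spread of cross-validated accuracy for every candidate
for params, mean, std in zip(search.cv_results_["params"],
                             search.cv_results_["mean_test_score"],
                             search.cv_results_["std_test_score"]):
    print(f"{mean:.3f} +/- {std:.3f}  {params}")

print("Best:", search.best_params_)
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

Reporting the standard deviation alongside the mean makes it easier to judge whether one final estimator is genuinely better or merely luckier on a particular split.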

Tips for choosing an appropriate final estimator:

Issues to consider:



See Also