Configure StackingClassifier "estimators" Parameter

The estimators parameter in scikit-learn’s StackingClassifier defines the set of base estimators used in the ensemble.

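Stacking is an ensemble learning technique that combines the predictions of several classification models through a meta-classifier, which StackingClassifier calls the final_estimator. The estimators parameter takes a list of (name, estimator) tuples, one for each base classifier to be stacked.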

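The choice of base estimators largely determines how well the stacked model performs. A diverse set of base models often generalizes better, because models that make different kinds of errors give the meta-classifier more complementary information to combine.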
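As a minimal sketch of the expected format, the snippet below builds a small estimators list and shows that each name doubles as a prefix for nested hyperparameters. The names "rf" and "nb" and the tree counts are arbitrary choices for illustration, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Each entry is a (name, estimator) pair; the names are illustrative labels.
estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", GaussianNB()),
]

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
)

# The name of a base estimator acts as a prefix for its hyperparameters.
clf.set_params(rf__n_estimators=25)

# Read the value back through the same prefixed key.
rf_trees = clf.get_params()["rf__n_estimators"]
print(rf_trees)
```

The same prefixed keys can be used in a hyperparameter search, which is one reason to pick short, descriptive names.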

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define different sets of estimators
estimators_sets = [
    [('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
     ('svm', SVC(kernel='rbf', probability=True, random_state=42))],
    [('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
     ('lr', LogisticRegression(random_state=42)),
     ('nb', GaussianNB())]
]

# Train and evaluate StackingClassifier with different estimator sets
for i, estimators in enumerate(estimators_sets, 1):
    stacking_clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(random_state=42),
        cv=5
    )
    stacking_clf.fit(X_train, y_train)
    y_pred = stacking_clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Estimator Set {i}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

Estimator Set 1, Accuracy: 0.860
Estimator Set 2, Accuracy: 0.845

The key steps in this example are:

1. Generate a synthetic multi-class classification dataset
2. Split the data into train and test sets
3. Define different sets of base estimators
4. Create StackingClassifier instances with different estimator sets
5. Train each stacking classifier and evaluate its accuracy on the test set
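To see what each base estimator contributes, it can help to score the fitted base models individually next to the stack. The sketch below does this through the named_estimators_ attribute. It assumes a smaller dataset than the main example and max_iter=1000 for LogisticRegression, both chosen here only to keep the run short and free of convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Smaller synthetic dataset (an assumption made for speed)
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000, random_state=42)),
        ("nb", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000, random_state=42),
    cv=5,
)
stack.fit(X_train, y_train)

# named_estimators_ maps each name to the base model fitted on the training set.
# The labels here are already 0, 1, 2, so the base models' predictions can be
# compared with y_test directly.
scores = {
    name: accuracy_score(y_test, model.predict(X_test))
    for name, model in stack.named_estimators_.items()
}
scores["stack"] = accuracy_score(y_test, stack.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

If the stack does not beat its best base model, that is a hint that the estimator set may be adding cost without adding information.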

Some tips for configuring the estimators parameter:

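- Choose a diverse set of base models (for example, tree-based, linear, and probabilistic) so that they capture different aspects of the data
- Consider the computational cost of each estimator, especially for large datasets, since every base estimator is refit once per cross-validation fold
- Mix simple models (e.g., decision stumps) with more flexible ones (e.g., random forests) rather than stacking only one kind
- Experiment with different combinations to find a set that works well for your specific problem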
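One way to experiment with combinations is to score every pair from a pool of candidates with cross-validation. The sketch below is an illustration under stated assumptions: the candidate pool, the pair size of 2, the small dataset, and cv=3 are all arbitrary choices made to keep the run short.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Hypothetical candidate pool; "stump" is a depth-1 decision tree.
candidates = {
    "rf": RandomForestClassifier(n_estimators=25, random_state=42),
    "lr": LogisticRegression(max_iter=1000),
    "nb": GaussianNB(),
    "stump": DecisionTreeClassifier(max_depth=1, random_state=42),
}

results = {}
for names in combinations(candidates, 2):
    stack = StackingClassifier(
        estimators=[(name, candidates[name]) for name in names],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Mean accuracy over the outer folds; the stack is cloned for each fold.
    results[names] = cross_val_score(stack, X, y, cv=3).mean()

best = max(results, key=results.get)
for names, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(names, f"{score:.3f}")
```

An exhaustive search like this grows quickly with the pool size, so for larger pools it is usually more practical to test a handful of hand-picked sets.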

Issues to consider:

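- Using too many complex estimators may lead to overfitting, particularly on small datasets
- Each additional base model makes the ensemble harder to interpret
- StackingClassifier trains the final estimator on cross-validated (out-of-fold) predictions of the base estimators, controlled by the cv parameter, so evaluate the complete stack on data kept out of that process
- The performance of the stacked model depends heavily on the choice of base estimators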
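A rough way to check whether candidate base estimators are actually diverse is to compare their out-of-fold predictions. The sketch below computes the pairwise agreement rate with cross_val_predict. The agreement measure and the dataset size are assumptions made for illustration, and a low agreement rate is only a hint of diversity, not a guarantee of a better stack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

base_models = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=42),
    "lr": LogisticRegression(max_iter=1000),
    "nb": GaussianNB(),
}

# Out-of-fold predictions: each sample is predicted by a model
# that was trained without that sample.
oof = {
    name: cross_val_predict(model, X, y, cv=5)
    for name, model in base_models.items()
}

# Fraction of samples on which each pair of models predicts the same class.
names = list(oof)
agreement = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        agreement[(a, b)] = float(np.mean(oof[a] == oof[b]))

for pair, rate in agreement.items():
    print(pair, f"{rate:.3f}")
```

Two models that agree on nearly every sample give the final estimator little extra to work with, so one of them may be a candidate for removal.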
