
Configure StackingClassifier "estimators" Parameter

The estimators parameter in scikit-learn’s StackingClassifier defines the set of base estimators used in the ensemble.
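A minimal sketch of the shape the parameter expects follows. The models and the names rf, lr and nb are illustrative choices made for this sketch. The estimators parameter takes a list of (name, estimator) tuples. The names double as prefixes for set_params, and setting a name to "drop" switches that base estimator off without rebuilding the list.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Small illustrative dataset (assumed for this sketch)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Each base estimator is a (name, estimator) tuple; names must be unique
estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
]
clf = StackingClassifier(estimators=estimators,
                         final_estimator=LogisticRegression(max_iter=1000))

# Names act as parameter prefixes, e.g. "rf__n_estimators"
clf.set_params(rf__n_estimators=100)

# Setting a name to "drop" removes that base estimator from the stack
clf.set_params(nb="drop")
clf.fit(X, y)

# Fitted base estimators are reachable by name; dropped ones are recorded as "drop"
print(list(clf.named_estimators_))
print(len(clf.estimators_))
print(clf.named_estimators_["nb"])
```

Dropping by name like this can be handy for quick ablations: refit with one member set to "drop" and compare the score.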

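Stacking is an ensemble learning technique that combines multiple classification models via a meta-classifier. The predictions of the base classifiers become the input features for a final model, which is set with the final_estimator parameter. The estimators parameter specifies the list of base classifiers to be stacked, given as (name, estimator) tuples.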
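To make the mechanism concrete, the following sketch re-creates stacking by hand and compares the result with StackingClassifier. It assumes the default stack_method="auto", which picks predict_proba for these models. It also assumes that cv=5 resolves to the same stratified 5-fold split in both places. Under those assumptions, the meta-classifier trains on the out-of-fold probabilities of each base model. At prediction time, it uses probabilities from base models refit on the full training set. For multi-class problems every class-probability column is kept, so two base models and three classes give six meta-features. For binary problems scikit-learn drops one redundant probability column per model.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)), ("nb", GaussianNB())]
meta = LogisticRegression(max_iter=1000)

# Library version
stack = StackingClassifier(estimators=base, final_estimator=meta, cv=5)
stack.fit(X_train, y_train)

# Manual version: out-of-fold class probabilities become the meta-features
oof = np.hstack([
    cross_val_predict(clone(est), X_train, y_train, cv=5, method="predict_proba")
    for _, est in base
])
meta_manual = clone(meta).fit(oof, y_train)

# At predict time, base models refit on all training data feed the meta-model
test_features = np.hstack([
    clone(est).fit(X_train, y_train).predict_proba(X_test) for _, est in base
])
manual_pred = meta_manual.predict(test_features)

print(oof.shape)                      # 2 models x 3 class probabilities per row
print(stack.transform(X_test).shape)  # meta-features StackingClassifier builds for X_test
print("agreement:", np.mean(manual_pred == stack.predict(X_test)))
```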

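The choice of base estimators largely determines how well the stacked model performs. Some base models make different kinds of errors, for example a tree ensemble, a linear model and a kernel method. These give the meta-classifier more complementary information to combine than several near-identical models would, which often leads to better generalization.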
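One rough way to check diversity is to measure how often pairs of fitted base models disagree on held-out data. The sketch below reuses the synthetic data settings from the example and reads the fitted base models from named_estimators_. Pairwise disagreement is an assumed proxy chosen for illustration, not the only diversity measure. High disagreement only helps if each model is also reasonably accurate on its own.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)

# Test-set predictions of each fitted base model
preds = {name: est.predict(X_test) for name, est in stack.named_estimators_.items()}

# Pairwise disagreement rate: fraction of test samples where two models differ
disagreement = {
    (a, b): float(np.mean(preds[a] != preds[b]))
    for a, b in combinations(preds, 2)
}
for pair, rate in disagreement.items():
    print(pair, f"{rate:.3f}")
```

A pair with near-zero disagreement suggests one of the two models adds little new information to the stack.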

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define different sets of estimators
estimators_sets = [
    [('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
     ('svm', SVC(kernel='rbf', probability=True, random_state=42))],
    [('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
     ('lr', LogisticRegression(random_state=42)),
     ('nb', GaussianNB())]
]

# Train and evaluate StackingClassifier with different estimator sets
for i, estimators in enumerate(estimators_sets, 1):
    stacking_clf = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(random_state=42),
        cv=5
    )
    stacking_clf.fit(X_train, y_train)
    y_pred = stacking_clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Estimator Set {i}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

Estimator Set 1, Accuracy: 0.860
Estimator Set 2, Accuracy: 0.845

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Define different sets of base estimators
  4. Create StackingClassifier instances with different estimator sets
  5. Train each stacking classifier and evaluate its accuracy on the test set
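When choosing estimators, it can also help to score each base model on its own next to the stack, to see whether every member pulls its weight. This sketch uses estimator set 2 from the example. It scores with 5-fold cross-validation on the full synthetic dataset, an illustrative choice since the example above uses a single train/test split. LogisticRegression is given max_iter=1000 here to avoid convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

estimators = [("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
              ("lr", LogisticRegression(max_iter=1000)),
              ("nb", GaussianNB())]

scores = {}
# Mean cross-validated accuracy of each base estimator on its own
for name, est in estimators:
    scores[name] = cross_val_score(est, X, y, cv=5).mean()

# Same metric for the stack built from those base estimators
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(max_iter=1000), cv=5)
scores["stack"] = cross_val_score(stack, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name:>5}: {score:.3f}")
```

The stack may fail to beat its strongest member. If so, one option is to replace the weakest or most redundant base model rather than add more models of the same kind.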

Some tips for configuring the estimators parameter:

Issues to consider:


