Scikit-Learn StackingClassifier Model

Stacking is an ensemble learning technique that combines multiple classification models to improve predictive performance.

The key hyperparameters of StackingClassifier include estimators (list of base models), final_estimator (meta-model), and cv (cross-validation strategy).

This algorithm is appropriate for various classification problems where combining models can enhance performance compared to individual models.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# generate a synthetic classification dataset
X, y = make_classification(n_samples=100, n_features=20, n_classes=2, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# define base models
base_models = [
    ('lr', LogisticRegression()),
    ('dt', DecisionTreeClassifier()),
    ('svm', SVC())
]

# define stacking classifier
stacking_model = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())

# fit stacking model
stacking_model.fit(X_train, y_train)

# evaluate model
yhat = stacking_model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

# make a prediction
row = [[-0.509, 1.494, -0.644, -1.198, -0.566, -0.595, -0.955, -1.173, -1.053, 0.722, -0.692, -1.659, 0.995, -0.334, 0.357, -1.135, 0.629, 0.491, -1.687, 0.896]]
yhat = stacking_model.predict(row)
print('Predicted: %d' % yhat[0])

Running the example gives an output like:

Accuracy: 0.950
Predicted: 1

The steps are as follows:

First, a synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), classes (n_classes), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().
Next, base models are defined for the StackingClassifier. In this example, three base models are used: LogisticRegression, DecisionTreeClassifier, and SVC.
A StackingClassifier is instantiated with the defined base models (estimators) and a LogisticRegression model as the final estimator (final_estimator).
The StackingClassifier is then fit on the training data using the fit() method.
The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.
A single prediction can be made by passing a new data sample to the predict() method.

This example demonstrates how to use the StackingClassifier to combine multiple models and enhance predictive performance for classification tasks. The ensemble approach leverages the strengths of different base models and a meta-model to achieve better accuracy compared to individual models.

See Also