Stacking is an ensemble learning technique that combines multiple classification models to improve predictive performance.
The key hyperparameters of StackingClassifier
include estimators
(list of base models), final_estimator
(meta-model), and cv
(cross-validation strategy).
This algorithm is appropriate for various classification problems where combining models can enhance performance compared to individual models.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# generate a synthetic classification dataset
X, y = make_classification(n_samples=100, n_features=20, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# define base models
base_models = [
('lr', LogisticRegression()),
('dt', DecisionTreeClassifier()),
('svm', SVC())
]
# define stacking classifier
stacking_model = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
# fit stacking model
stacking_model.fit(X_train, y_train)
# evaluate model
yhat = stacking_model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[-0.509, 1.494, -0.644, -1.198, -0.566, -0.595, -0.955, -1.173, -1.053, 0.722, -0.692, -1.659, 0.995, -0.334, 0.357, -1.135, 0.629, 0.491, -1.687, 0.896]]
yhat = stacking_model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.950
Predicted: 1
The steps are as follows:
First, a synthetic binary classification dataset is generated using the
make_classification()
function. This creates a dataset with a specified number of samples (n_samples
), features (n_features
), classes (n_classes
), and a fixed random seed (random_state
) for reproducibility. The dataset is split into training and test sets usingtrain_test_split()
.Next, base models are defined for the
StackingClassifier
. In this example, three base models are used:LogisticRegression
,DecisionTreeClassifier
, andSVC
.A
StackingClassifier
is instantiated with the defined base models (estimators
) and aLogisticRegression
model as the final estimator (final_estimator
).The
StackingClassifier
is then fit on the training data using thefit()
method.The performance of the model is evaluated by comparing the predictions (
yhat
) to the actual values (y_test
) using the accuracy score metric.A single prediction can be made by passing a new data sample to the
predict()
method.
This example demonstrates how to use the StackingClassifier
to combine multiple models and enhance predictive performance for classification tasks. The ensemble approach leverages the strengths of different base models and a meta-model to achieve better accuracy compared to individual models.