BaggingClassifier is an ensemble algorithm that improves the accuracy of machine learning models by training multiple instances of a base estimator on different subsets of the training data and combining their predictions.
The key hyperparameters of BaggingClassifier include n_estimators (number of base estimators), estimator (the model to be bagged), and max_samples (number of samples to draw from X to train each base estimator).
The algorithm is appropriate for classification problems where model stability and accuracy are critical.
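The idea of training on different subsets and combining predictions can be sketched by hand. The snippet is an illustrative approximation, not how scikit-learn implements BaggingClassifier internally: it draws bootstrap samples with NumPy, fits one decision tree per sample, and combines them by a simple majority vote. The names rng, trees, all_preds and manual_pred are made up for this sketch. BaggingClassifier itself averages predicted class probabilities when the base estimator provides predict_proba, and only falls back to voting otherwise, so the results may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# same synthetic data and split as the main example
X, y = make_classification(n_samples=100, n_features=20, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

rng = np.random.default_rng(1)
n_estimators = 10
trees = []
for _ in range(n_estimators):
    # bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=1)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# collect each tree's predictions, shape (n_estimators, n_test_rows)
all_preds = np.array([t.predict(X_test) for t in trees])

# majority vote over the 0/1 labels; an exact tie goes to class 0
votes_for_one = all_preds.sum(axis=0)
manual_pred = (votes_for_one > n_estimators / 2).astype(int)

print('Manual bagging accuracy: %.3f' % np.mean(manual_pred == y_test))
```

Because each tree sees a different resampled version of the training data, the individual trees disagree on some rows, and the vote smooths out those disagreements.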
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=20, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create base model
base_model = DecisionTreeClassifier()
# create bagging classifier
# (the 'estimator' argument needs scikit-learn >= 1.2; older releases used 'base_estimator')
model = BaggingClassifier(estimator=base_model, n_estimators=10, random_state=1)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[-0.233, -0.123, 0.231, -0.321, 0.123, -0.456, 0.789, -0.432, 0.123, -0.234,
        -0.345, 0.456, -0.567, 0.678, -0.789, 0.890, -0.901, 0.012, -0.123, 0.234]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.900
Predicted: 0
The steps are as follows:
1. First, a synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().
2. Next, a DecisionTreeClassifier is instantiated as the base model. The BaggingClassifier is then created using this base model, specifying n_estimators=10 to use 10 base estimators.
3. The BaggingClassifier model is fit on the training data using the fit() method.
4. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.
5. A single prediction can be made by passing a new data sample to the predict() method.
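This example demonstrates how to quickly set up and use a BaggingClassifier with a DecisionTreeClassifier for binary classification tasks. Bagging is typically used to improve accuracy and stability over a single base model, although the example reports only the bagged model's accuracy and does not compare it against a single tree.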