Scikit-Learn BaggingClassifier Model

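BaggingClassifier is an ensemble algorithm that trains multiple instances of a base estimator on different random subsets of the training data (bootstrap samples, by default) and combines their predictions. Aggregating many models in this way mainly reduces variance, which can improve accuracy over a single instance of the base estimator.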
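
To make the mechanism concrete, the following is a minimal hand-rolled sketch of bagging, not scikit-learn's implementation. It assumes binary labels 0/1 and combines trees by a simple majority vote, whereas BaggingClassifier averages predicted class probabilities when the base estimator provides them. The dataset size, the seed, and the names trees, all_preds, and ensemble_pred are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# small synthetic dataset, for illustration only
X, y = make_classification(n_samples=200, n_features=10, n_classes=2, random_state=1)

rng = np.random.default_rng(1)
n_estimators = 10
trees = []
for _ in range(n_estimators):
    # bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# collect each tree's predictions; shape is (n_estimators, n_samples)
all_preds = np.array([tree.predict(X) for tree in trees])

# majority vote over the 0/1 labels (assumption: ties go to class 1)
ensemble_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
print(ensemble_pred[:10])
```

Each tree sees a slightly different version of the training data, so their individual errors differ, and the vote smooths them out.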

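The key hyperparameters of BaggingClassifier include n_estimators (the number of base estimators, 10 by default), estimator (the model to be bagged; a decision tree is used if none is given), and max_samples (the number of samples, or the fraction of the training set when given as a float, to draw from X to train each base estimator). In scikit-learn versions before 1.2 the estimator parameter was named base_estimator.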

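The algorithm is appropriate for classification problems where a single model is unstable, for example an unpruned decision tree whose predictions change noticeably with small changes in the training data.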

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=20, n_classes=2, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create base model
base_model = DecisionTreeClassifier()

# create bagging classifier
model = BaggingClassifier(estimator=base_model, n_estimators=10, random_state=1)

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

# make a prediction
row = [[-0.233, -0.123, 0.231, -0.321, 0.123, -0.456, 0.789, -0.432, 0.123, -0.234,
        -0.345, 0.456, -0.567, 0.678, -0.789, 0.890, -0.901, 0.012, -0.123, 0.234]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])

Running the example gives an output like:

Accuracy: 0.900
Predicted: 0

The steps are as follows:

1. A synthetic binary classification dataset is generated with make_classification(), specifying the number of samples (n_samples), the number of features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is then split into training and test sets with train_test_split(), holding out 20% of the rows for testing.
2. A DecisionTreeClassifier is instantiated as the base model. The BaggingClassifier is then created from this base model with n_estimators=10, so the ensemble contains 10 trees.
3. The BaggingClassifier is fit on the training data with the fit() method.
4. The model is evaluated by comparing its predictions on the test set (yhat) to the actual labels (y_test) using accuracy_score().
5. A single prediction is made by passing a new data sample, as a 2D list with one row, to the predict() method.

This example demonstrates how to quickly set up and use a BaggingClassifier with a DecisionTreeClassifier for binary classification. Bagging typically improves the accuracy and stability of such high-variance base models, although the example does not compare against a single tree to show this.

See Also