BernoulliNB is a Naive Bayes classifier designed for binary/boolean features. It models the probability of an input belonging to a particular class based on the presence or absence of specific features.
The key hyperparameters of BernoulliNB include alpha (smoothing parameter) and binarize (threshold for binarizing features).
The algorithm is appropriate for classification problems with binary features, including binary classification as shown in the example below.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# binarize the dataset (convert features to binary values)
X = (X > 0).astype(int)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = BernoulliNB()
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[1, 0, 1, 0, 1]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.950
Predicted: 1
The steps are as follows:
1. A synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is then binarized by converting feature values to binary (0 or 1).
2. The dataset is split into training and test sets using train_test_split().
3. A BernoulliNB model is instantiated with default hyperparameters and fit on the training data using the fit() method.
4. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.
5. A single prediction can be made by passing a new binary data sample to the predict() method.
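The manual binarization in the first step can also be left to the model through its binarize hyperparameter. The sketch below reuses the same synthetic data and assumes that the default threshold, binarize=0.0, maps values greater than 0 to 1 and all other values to 0, so it should agree with the manual (X > 0) conversion. The variable names (manual, builtin, pred_manual, pred_builtin) are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# same synthetic dataset as in the example, left as real-valued features
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# option 1: binarize by hand and tell the model the input is already binary
manual = BernoulliNB(binarize=None)
manual.fit((X_train > 0).astype(int), y_train)
pred_manual = manual.predict((X_test > 0).astype(int))

# option 2: pass the raw features and let the model threshold them at 0.0
builtin = BernoulliNB(binarize=0.0)
builtin.fit(X_train, y_train)
pred_builtin = builtin.predict(X_test)

# compare the two sets of predictions element by element
print('Same predictions:', np.array_equal(pred_manual, pred_builtin))
```

If the two routes are equivalent as assumed, the comparison reports that the predictions match. Letting the model apply the threshold keeps the same rule in force at prediction time, which avoids forgetting to binarize new rows.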
This example demonstrates how to use the BernoulliNB model for binary classification tasks, highlighting its effectiveness in handling binary/boolean features.
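To make the underlying probability model and the role of alpha more concrete, the following sketch recomputes the class probabilities by hand from a fitted model and compares them with predict_proba(). It relies on the fitted attributes feature_log_prob_ (the log of the smoothed probability that a feature is 1 given the class) and class_log_prior_. The tiny dataset and the names p_manual, log_joint, and proba_manual are made up for illustration and are not part of the example above.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import BernoulliNB

# tiny hand-made binary dataset (illustrative only)
X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

alpha = 1.0
model = BernoulliNB(alpha=alpha, binarize=None)
model.fit(X, y)

# smoothed estimate of P(feature = 1 | class):
# (count of ones in class + alpha) / (samples in class + 2 * alpha)
p_manual = np.array([
    (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
    for c in model.classes_
])
p = np.exp(model.feature_log_prob_)

# log joint per class: log prior + sum of x*log(p) + (1 - x)*log(1 - p),
# so an absent feature contributes evidence just as a present one does
row = np.array([[1, 0, 1]])
log_joint = model.class_log_prior_ + row @ np.log(p).T + (1 - row) @ np.log(1 - p).T

# normalize across classes to obtain posterior probabilities
proba_manual = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

print('Smoothed feature probabilities match:', np.allclose(p_manual, p))
print('Posterior probabilities match:', np.allclose(proba_manual, model.predict_proba(row)))
```

The (1 - x) term is what distinguishes the Bernoulli variant: a feature that is absent from the input still changes the class scores. With alpha greater than 0, no feature probability is estimated as exactly 0 or 1, so the logarithms stay finite.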