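`CategoricalNB` is a Naive Bayes classifier tailored for categorical features. For each class, it models every feature as a categorical distribution over that feature's categories. It then combines these per-feature likelihoods with the class prior, under the naive assumption that features are conditionally independent given the class.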
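To make this concrete, here is a minimal sketch that recomputes the posterior by hand on a tiny toy dataset and compares it with `predict_proba()`. It assumes the smoothed estimate P(x_i = t | y = c) = (N_tic + alpha) / (N_c + alpha * n_i), where N_tic counts class-c samples with category t in feature i, N_c is the number of class-c samples, and n_i is the number of categories of feature i. The toy data and the helper `manual_log_posterior` are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: two integer-encoded categorical features, two classes
X = np.array([[0, 1], [1, 1], [1, 0], [2, 0], [0, 1], [2, 1]])
y = np.array([0, 0, 0, 1, 1, 1])
alpha = 1.0

model = CategoricalNB(alpha=alpha).fit(X, y)

def manual_log_posterior(x_new):
    """Hypothetical helper: log P(y = c | x_new) from smoothed counts."""
    scores = []
    for c in np.unique(y):
        Xc = X[y == c]
        # Class prior estimated from class frequencies (the fit_prior=True default)
        log_score = np.log(len(Xc) / len(X))
        for i, t in enumerate(x_new):
            n_categories = X[:, i].max() + 1
            count = np.sum(Xc[:, i] == t)
            # Additively smoothed estimate of P(x_i = t | y = c)
            log_score += np.log((count + alpha) / (len(Xc) + alpha * n_categories))
        scores.append(log_score)
    scores = np.array(scores)
    # Normalize the joint log-likelihoods so the posteriors sum to 1
    return scores - np.logaddexp.reduce(scores)

x_new = [2, 1]
manual = np.exp(manual_log_posterior(x_new))
sklearn_proba = model.predict_proba([x_new])[0]
print(manual, sklearn_proba)  # both are approximately [0.25, 0.75]
```

For class 0 the unnormalized score is 0.5 * (1/6) * (3/5) = 0.05, and for class 1 it is 0.5 * (3/6) * (3/5) = 0.15, which normalize to 0.25 and 0.75.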
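The key hyperparameters of `CategoricalNB` include `alpha` (the additive smoothing parameter), `fit_prior` (whether to learn class prior probabilities from the data; if `False`, a uniform prior is used), and `class_prior` (user-specified prior probabilities of the classes; if given, the priors are not adjusted according to the data).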
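A minimal sketch of how these hyperparameters change the fitted model, using a hypothetical imbalanced toy dataset. It inspects the fitted attributes `class_log_prior_` and `feature_log_prob_` (a list with one array per feature, of shape `(n_classes, n_categories)`).

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: one categorical feature, 4 samples of class 0, 2 of class 1
X = np.array([[0], [0], [1], [1], [2], [2]])
y = np.array([0, 0, 0, 0, 1, 1])

learned = CategoricalNB().fit(X, y)                           # priors from frequencies
uniform = CategoricalNB(fit_prior=False).fit(X, y)            # uniform priors
custom = CategoricalNB(class_prior=[0.2, 0.8]).fit(X, y)      # user-supplied priors

for name, m in [("learned", learned), ("uniform", uniform), ("custom", custom)]:
    print(name, np.exp(m.class_log_prior_).round(3))

# Larger alpha pulls P(x_0 = t | y = 1) toward a uniform distribution over t = 0, 1, 2
smoothed = {
    a: np.exp(CategoricalNB(alpha=a).fit(X, y).feature_log_prob_[0][1])
    for a in [0.1, 1.0, 10.0]
}
for a, probs in smoothed.items():
    print("alpha", a, probs.round(3))
```

Class 1 only ever shows category 2, so with `alpha=1.0` the estimate is (1, 1, 3) / 5 = [0.2, 0.2, 0.6], while `alpha=10.0` flattens it to roughly [0.31, 0.31, 0.38].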
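The algorithm is suitable for classification problems where features are categorical. Each feature must be encoded as non-negative integers 0, ..., n - 1, where n is that feature's number of categories; `OrdinalEncoder` is one way to produce this encoding.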
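Real categorical data often arrives as strings. Here is a minimal sketch, using hypothetical color and size data, that chains `OrdinalEncoder` and `CategoricalNB` in a pipeline so the integer encoding is learned and applied automatically.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical raw data with string-valued categorical features
X_raw = [
    ["red", "small"],
    ["red", "large"],
    ["blue", "small"],
    ["green", "large"],
    ["blue", "large"],
    ["green", "small"],
]
y = [1, 1, 0, 0, 0, 1]

# OrdinalEncoder maps each feature's categories to 0, 1, ..., n - 1,
# the integer encoding CategoricalNB expects
pipe = make_pipeline(OrdinalEncoder(), CategoricalNB())
pipe.fit(X_raw, y)
print(pipe.predict([["red", "small"]]))
```

By default `OrdinalEncoder` raises an error on categories not seen during fitting, which surfaces unexpected values early instead of passing an out-of-range code to the classifier.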
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score
import numpy as np
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, n_informative=3, random_state=1)
# Replace the features with random integer categories 0, 1, 2 (the non-negative
# integer codes CategoricalNB expects); note these values are unrelated to y
X = np.random.randint(0, 3, size=X.shape)
# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Create the model with default hyperparameters
model = CategoricalNB()
# Fit the model on the training data
model.fit(X_train, y_train)
# Evaluate the model on the test data
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# Make a prediction for a single new sample
row = [[1, 2, 0, 1, 0]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
```
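Running the example gives an output like the following (the exact values vary between runs, because the random features are generated without a fixed seed):

```
Accuracy: 0.350
Predicted: 0
```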
The steps are as follows:

1. First, a synthetic dataset is generated using the `make_classification()` function. Its features are then replaced with random integers in {0, 1, 2} to mimic categorical data. Because these values are drawn independently of `y`, they carry no information about the class, so the accuracy is expected to hover around chance level (about 0.5 for this balanced two-class problem) and to vary considerably on a 20-sample test set. The dataset is split into training and test sets using `train_test_split()`.
2. Next, a `CategoricalNB` model is instantiated with default hyperparameters and fit on the training data using the `fit()` method.
3. The model's performance is evaluated by comparing its predictions (`yhat`) to the actual values (`y_test`) using the accuracy score metric.
4. Finally, a single prediction is made by passing a new data sample to the `predict()` method.
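The `predict()` step returns only the most probable label. Here is a minimal sketch of inspecting the underlying class probabilities with `predict_proba()` for the same new sample. It assumes a seeded `np.random.default_rng(1)` generator in place of the unseeded `np.random.randint` call so that the run is reproducible; that substitution is not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, n_informative=3, random_state=1)
# Assumption: seeded generator instead of np.random.randint, for reproducibility
X = np.random.default_rng(1).integers(0, 3, size=X.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = CategoricalNB().fit(X_train, y_train)

row = [[1, 2, 0, 1, 0]]
# One probability per class, ordered as in model.classes_
proba = model.predict_proba(row)
print(model.classes_, proba.round(3))

# predict() returns the class with the highest posterior probability
label = model.predict(row)[0]
print("Predicted:", label)
```

With features unrelated to the target, expect these probabilities to sit fairly close to the class priors rather than strongly favoring one class.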
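This example demonstrates how to quickly set up and use a `CategoricalNB` model for classification tasks involving categorical data, showcasing how simple the scikit-learn API is for this algorithm. Because the example's features are random, the reported accuracy says nothing about the model's effectiveness; on real categorical data, where the features are related to the target, the model can learn informative class-conditional distributions.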
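If you want synthetic categorical features that still carry signal, one option is to discretize the continuous `make_classification()` features instead of replacing them. Here is a minimal sketch of that approach, which swaps in `KBinsDiscretizer` (not used in the original example) with 3 quantile bins per feature; the bin count and strategy are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, n_informative=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Bin each continuous feature into 3 ordinal categories (0, 1, 2).
# Bin edges are learned on the training split only, to avoid test-set leakage.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
X_train_cat = binner.fit_transform(X_train).astype(int)
X_test_cat = binner.transform(X_test).astype(int)

model = CategoricalNB().fit(X_train_cat, y_train)
acc = accuracy_score(y_test, model.predict(X_test_cat))
print("Accuracy: %.3f" % acc)
```

Because each category now corresponds to a range of the original feature values, the class-conditional distributions reflect real structure in the data, so this accuracy is a more meaningful measure of the model than the one obtained from random categories.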