
Scikit-Learn CategoricalNB Model

CategoricalNB is a Naive Bayes classifier tailored for categorical features. It models the probability distribution of each feature given the class label.
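To make "the probability distribution of each feature given the class label" concrete, here is a minimal sketch that recomputes the smoothed estimate (count + alpha) / (class total + alpha * number of categories) by hand and compares it with the fitted attributes category_count_ and feature_log_prob_. The tiny dataset is made up purely for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: 2 features, categories already encoded as 0..n-1
X = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [2, 2], [2, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

alpha = 1.0
model = CategoricalNB(alpha=alpha).fit(X, y)

# category_count_[0] holds the per-class counts for feature 0,
# with shape (n_classes, n_categories of feature 0)
counts = model.category_count_[0]

# Additive smoothing: add alpha to every count, then normalise each class row
smoothed = counts + alpha
manual = smoothed / smoothed.sum(axis=1, keepdims=True)

# feature_log_prob_[0] stores the log of the same conditional probabilities
fitted = np.exp(model.feature_log_prob_[0])

print(manual)
print(np.allclose(manual, fitted))
```

Each row of the printed matrix is the estimated distribution of feature 0 within one class, so every row sums to 1.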

The key hyperparameters of CategoricalNB include alpha (additive smoothing parameter), fit_prior (whether to learn class priors), and class_prior (prior probabilities of the classes).
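As a rough illustration of how these three hyperparameters change the fitted model, the sketch below fits several variants on a small made-up dataset and inspects class_log_prior_ and feature_log_prob_. The data and the specific values (alpha=0.1, class_prior=[0.8, 0.2]) are arbitrary choices for demonstration, not recommendations.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data with imbalanced classes (3 samples vs 5 samples)
X = np.array([[0, 1], [0, 1], [1, 0], [1, 0], [2, 2], [2, 1], [2, 0], [1, 2]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

# Default: class priors are learned from the class frequencies in y
learned = CategoricalNB().fit(X, y)
# fit_prior=False: a uniform prior over the classes is used instead
uniform = CategoricalNB(fit_prior=False).fit(X, y)
# class_prior: the supplied prior is used as given
fixed = CategoricalNB(class_prior=[0.8, 0.2]).fit(X, y)

print(np.exp(learned.class_log_prior_))
print(np.exp(uniform.class_log_prior_))
print(np.exp(fixed.class_log_prior_))

# A smaller alpha smooths less, so a category never seen with a class
# (feature 0, category 2, class 0) gets a smaller probability
light = CategoricalNB(alpha=0.1).fit(X, y)
p_default = np.exp(learned.feature_log_prob_[0][0, 2])
p_light = np.exp(light.feature_log_prob_[0][0, 2])
print(p_default, p_light)
```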

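The algorithm is suitable for classification problems where features are categorical. Each feature must be encoded as non-negative integers (0, 1, 2, and so on), for example with OrdinalEncoder, before it is passed to the model.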
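Real categorical data usually arrives as strings rather than integers. One possible approach, sketched here on a made-up weather-style dataset, is to chain OrdinalEncoder and CategoricalNB in a pipeline so the string categories are mapped to integer codes before fitting. The feature values and labels are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical string-valued features: outlook and temperature
X = np.array([
    ["sunny", "hot"],
    ["sunny", "mild"],
    ["rainy", "mild"],
    ["rainy", "cool"],
    ["overcast", "hot"],
    ["overcast", "cool"],
])
y = np.array(["no", "no", "yes", "yes", "yes", "yes"])

# OrdinalEncoder maps each string category to an integer code per column;
# dtype=np.int64 makes the codes integers rather than the default floats
pipeline = make_pipeline(OrdinalEncoder(dtype=np.int64), CategoricalNB())
pipeline.fit(X, y)

# Predict for a row made of categories that were seen during training
pred = pipeline.predict(np.array([["sunny", "hot"]]))
print(pred[0])
```

With its default settings, OrdinalEncoder raises an error for a category it did not see during fit, so new rows should only contain known categories unless the encoder is configured otherwise.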

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score
import numpy as np

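# Generate synthetic labels (make_classification also returns continuous features)
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, n_informative=3, random_state=1)
# Replace the features with random integer categories in {0, 1, 2}.
# These values are independent of y, so accuracy will sit near chance level.
rng = np.random.default_rng(1)  # seeded so the run is reproducible
X = rng.integers(0, 3, size=X.shape)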

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create the model
model = CategoricalNB()

# Fit the model
model.fit(X_train, y_train)

# Evaluate the model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

# Make a prediction
row = [[1, 2, 0, 1, 0]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])

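Running the example gives an output like the following. Because the features are random integers unrelated to the labels, the accuracy is no better than chance (about 0.5 for two balanced classes, with wide swings on a 20-sample test set), and your exact values may differ: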

Accuracy: 0.350
Predicted: 0

The steps are as follows:

  1. First, a synthetic dataset is generated using the make_classification() function. The features are then replaced with random integers between 0 and 2 to stand in for categorical values; these integers carry no information about the labels. The dataset is split into training and testing sets using train_test_split().

  2. Next, a CategoricalNB model is instantiated with default hyperparameters. The model is then fit on the training data using the fit() method.

  3. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.

  4. A single prediction can be made by passing a new data sample to the predict() method.
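The example above uses random integers as features, so the model has nothing to learn from. As an alternative sketch, the continuous features from make_classification() can be binned into integer categories so that they keep their relationship to the labels. The use of KBinsDiscretizer with 3 quantile bins and the larger n_samples=1000 are assumptions made here for illustration, not part of the original example. The sketch also shows predict_proba(), which returns class probabilities instead of a single label.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

# Same generator settings as the example, but with more samples (an assumption)
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, n_informative=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Bin each continuous feature into 3 ordinal categories (0, 1, 2).
# The bin edges are learned on the training split only.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
X_train_cat = binner.fit_transform(X_train).astype(int)
X_test_cat = binner.transform(X_test).astype(int)

model = CategoricalNB()
model.fit(X_train_cat, y_train)

acc = accuracy_score(y_test, model.predict(X_test_cat))
print('Accuracy: %.3f' % acc)

# Class probabilities for the first test row; the two values sum to 1
proba = model.predict_proba(X_test_cat[:1])
print(proba)
```

Since the binned features are derived from the informative features, the reported accuracy reflects whatever signal survives the binning rather than pure chance.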

This example demonstrates how to quickly set up and use a CategoricalNB model in scikit-learn for classification tasks involving categorical data. Because the features here are random, the reported accuracy says nothing about how well the algorithm performs on real categorical data with informative features.


