CategoricalNB and MultinomialNB are two Naive Bayes algorithms in scikit-learn, each tailored for different types of data. This example compares these algorithms, focusing on their performance with categorical features.
`CategoricalNB` is designed for categorical features, with each category encoded as a non-negative integer. It estimates a separate categorical distribution for every feature within each class from frequency counts. Its key hyperparameters are `alpha` (smoothing parameter) and `fit_prior` (whether to learn class prior probabilities).
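The effect of `alpha` can be checked by hand. The following is a minimal sketch on a hypothetical toy dataset (the data and the variable names `manual` and `fitted` are illustrative only). It recomputes the smoothed estimate, (category count in class + alpha) / (class count + alpha * number of categories), from the fitted `category_count_` attribute and compares it with the probabilities stored in `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical toy data: one feature with three categories (0, 1, 2)
X = np.array([[0], [0], [1], [2], [2], [2]])
y = np.array([0, 0, 0, 1, 1, 1])

alpha = 1.0
cnb = CategoricalNB(alpha=alpha).fit(X, y)

# Per-class counts of each category for feature 0, shape (n_classes, n_categories)
counts = cnb.category_count_[0]
n_categories = counts.shape[1]

# Smoothed estimate computed by hand
manual = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * n_categories)

# Probabilities learned by the model for feature 0
fitted = np.exp(cnb.feature_log_prob_[0])

print(np.allclose(manual, fitted))
```

With `alpha=1.0`, a category that never occurs in a class still receives a small non-zero probability, which keeps a single unseen combination from zeroing out the whole class likelihood.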
`MultinomialNB` is typically used for discrete features like word counts in text classification. Its key hyperparameters include `alpha` (smoothing parameter) and `fit_prior` (whether to learn class prior probabilities).
The main difference is in how each model reads an integer feature value: `CategoricalNB` treats it as an unordered category label, while `MultinomialNB` treats it as a count, so larger values carry more weight. The choice depends on the nature of the dataset’s features.
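To make the distinction concrete, here is a small sketch of the kind of input each estimator expects. The colour/size table and the count matrix are hypothetical data invented for illustration. `OrdinalEncoder` is used here as one reasonable way to turn string labels into the integer codes `CategoricalNB` needs.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB, MultinomialNB
from sklearn.preprocessing import OrdinalEncoder

y = np.array([0, 1, 1, 0])

# Hypothetical categorical data: each column holds a category label
X_cat_raw = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
    ["green", "small"],
])

# CategoricalNB expects integer codes 0..n_categories-1 for each feature
encoder = OrdinalEncoder()
X_cat = encoder.fit_transform(X_cat_raw).astype(int)
cnb = CategoricalNB(alpha=1.0, fit_prior=True).fit(X_cat, y)
cat_pred = cnb.predict(X_cat)

# Hypothetical count data: each column records how often a term occurs
X_counts = np.array([
    [3, 0, 1],
    [0, 2, 4],
    [1, 3, 5],
    [4, 0, 0],
])
mnb = MultinomialNB(alpha=1.0, fit_prior=True).fit(X_counts, y)
count_pred = mnb.predict(X_counts)

print("CategoricalNB predictions:", cat_pred)
print("MultinomialNB predictions:", count_pred)
```

In the first table the integer codes are only labels, so their order and magnitude carry no meaning. In the second, a value of 4 really does mean four occurrences.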
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB, MultinomialNB
from sklearn.metrics import accuracy_score, f1_score
import numpy as np
# Generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=42)
# Truncate to integers and take absolute values: both estimators need
# non-negative inputs, and CategoricalNB reads each integer as a category index
X = np.array(X, dtype=int)
X = np.abs(X)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit and evaluate CategoricalNB
cnb = CategoricalNB()
cnb.fit(X_train, y_train)
y_pred_cnb = cnb.predict(X_test)
print(f"CategoricalNB accuracy: {accuracy_score(y_test, y_pred_cnb):.3f}")
print(f"CategoricalNB F1 score: {f1_score(y_test, y_pred_cnb):.3f}")
# Fit and evaluate MultinomialNB
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred_mnb = mnb.predict(X_test)
print(f"\nMultinomialNB accuracy: {accuracy_score(y_test, y_pred_mnb):.3f}")
print(f"MultinomialNB F1 score: {f1_score(y_test, y_pred_mnb):.3f}")
Running the example gives an output like:
CategoricalNB accuracy: 0.600
CategoricalNB F1 score: 0.556
MultinomialNB accuracy: 0.550
MultinomialNB F1 score: 0.400
The steps are as follows:
- Generate a synthetic classification dataset using `make_classification`, then convert the continuous features to non-negative integers so they can be treated as categories.
- Split the data into training and test sets with `train_test_split`.
- Instantiate and fit `CategoricalNB` on the training data, then evaluate its performance on the test set, recording accuracy and F1 scores.
- Instantiate and fit `MultinomialNB` on the training data, then evaluate its performance on the test set, recording accuracy and F1 scores.
- Compare the test set performance of `CategoricalNB` and `MultinomialNB` by examining their accuracy and F1 scores.
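With only 20 test samples, a single split gives a noisy comparison. As a hedged variation on the example above, the sketch below scores both models with 5-fold cross-validation on the same synthetic data. It also passes `min_categories` to `CategoricalNB`, because the model can fail at prediction time when a fold contains a category index larger than any seen during fitting. Deriving `min_categories` from the full dataset is an assumption made here for convenience; in practice the number of categories would come from knowledge of the features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB, MultinomialNB

# Same synthetic data and integer conversion as the example above
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)
X = np.abs(X.astype(int))

# Declare how many categories each feature can take, so a value that appears
# only in a validation fold still has a (smoothed) probability
min_categories = X.max(axis=0) + 1

models = {
    "CategoricalNB": CategoricalNB(min_categories=min_categories),
    "MultinomialNB": MultinomialNB(),
}

cv_scores = {}
for name, model in models.items():
    # Stratified 5-fold cross-validation, scored by accuracy
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean and spread across folds makes it easier to judge whether a gap between the two models is larger than the variation between splits.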