
Scikit-Learn MultinomialNB Model

Multinomial Naive Bayes is a classic algorithm for classifying text and other discrete data. It models the features of each class as counts drawn from a multinomial distribution and assumes the features are conditionally independent given the class.

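The key hyperparameter of MultinomialNB is alpha, the additive (Laplace/Lidstone) smoothing parameter, which defaults to 1.0. It prevents a feature that never occurs with a class in the training data from being assigned zero probability. The fit_prior and class_prior parameters control how the class prior probabilities are set.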
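To make the effect of alpha concrete, here is a minimal sketch on a tiny made-up count matrix (the data and the `smoothed` dictionary are illustrative assumptions, not part of the original example). It fits the model with three alpha values and prints the smoothed per-feature probabilities for class 0, read from the fitted `feature_log_prob_` attribute.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# tiny made-up count matrix: 3 features, 2 classes
# the third feature never occurs in class 0
X = np.array([
    [2, 1, 0],
    [3, 0, 0],
    [0, 1, 4],
    [1, 0, 3],
])
y = np.array([0, 0, 1, 1])

# smoothed estimate: (count + alpha) / (class total + alpha * n_features)
smoothed = {}
for alpha in (0.01, 1.0, 10.0):
    model = MultinomialNB(alpha=alpha).fit(X, y)
    # feature_log_prob_ holds log P(feature | class); exponentiate to get probabilities
    smoothed[alpha] = np.exp(model.feature_log_prob_)[0]
    print('alpha=%.2f' % alpha, np.round(smoothed[alpha], 3))
```

A small alpha leaves the unseen feature with a probability close to zero, while a large alpha pulls all three probabilities toward a uniform distribution.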

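The algorithm is best suited to non-negative count features, such as word counts. The scikit-learn documentation notes that fractional counts such as tf-idf may also work in practice, but negative feature values are rejected.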

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
import numpy as np

# generate a synthetic dataset with discrete features
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, n_informative=8, random_state=1)
X = np.abs(X.astype(int))  # truncate to integers, then take absolute values so features are non-negative counts

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
model = MultinomialNB()

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

# make a prediction
row = [[1, 0, 3, 2, 1, 0, 4, 2, 1, 3]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])

Running the example gives an output like:

Accuracy: 0.775
Predicted: 1

The steps are as follows:

  1. First, a synthetic dataset with discrete features is generated using the make_classification() function. The features are converted to absolute integer values to simulate word counts or similar discrete data. The dataset is split into training and test sets using train_test_split().

  2. Next, a MultinomialNB model is instantiated with default hyperparameters. The model is then fit on the training data using the fit() method.

  3. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.

  4. A single prediction can be made by passing a new data sample to the predict() method.
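Step 4 returns only a class label. If the class probabilities are also of interest, the fitted model exposes predict_proba(). The sketch below rebuilds the same synthetic dataset and, as a simplifying assumption, fits on all of it rather than on the training split.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import MultinomialNB
import numpy as np

# same synthetic dataset as in the main example
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, n_informative=8, random_state=1)
X = np.abs(X.astype(int))

# fit on the full dataset (simplification for this sketch)
model = MultinomialNB().fit(X, y)

# predicted probability of each class for one new row
row = [[1, 0, 3, 2, 1, 0, 4, 2, 1, 3]]
proba = model.predict_proba(row)
print('Classes:', model.classes_)
print('Probabilities:', np.round(proba[0], 3))
```

The columns of the returned array follow the order of model.classes_, and predict() returns the class with the highest probability.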

This example demonstrates how to quickly set up and use a MultinomialNB model in scikit-learn. The synthetic integer features stand in for count data such as word counts; the same fit() and predict() workflow applies to real text once it has been converted to a count matrix.
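For actual text, the counts are usually produced by a vectorizer. The following is a minimal sketch, assuming a tiny made-up corpus and label set (both hypothetical), that chains CountVectorizer and MultinomialNB with make_pipeline().

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# tiny made-up corpus, for illustration only
docs = [
    'free prize claim your free prize now',
    'win money now claim your reward',
    'meeting agenda for monday morning',
    'please review the project report before the meeting',
]
labels = ['spam', 'spam', 'ham', 'ham']

# CountVectorizer turns each document into word counts; MultinomialNB classifies them
text_model = make_pipeline(CountVectorizer(), MultinomialNB())
text_model.fit(docs, labels)

# classify a new, unseen message
pred = text_model.predict(['claim your free money'])
print('Predicted: %s' % pred[0])
```

Wrapping both steps in a pipeline means the vocabulary learned during fit() is reused automatically at prediction time.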


