Scikit-Learn GaussianNB Model

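Gaussian Naive Bayes (GaussianNB) is a simple yet powerful algorithm for classification problems. It assumes that, within each class, every feature follows a normal (Gaussian) distribution and that the features are conditionally independent given the class.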

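The key hyperparameter of GaussianNB is var_smoothing (default 1e-9), which adds a fraction of the largest feature variance to every variance estimate for numerical stability, avoiding division by zero when a feature has zero variance within a class. The optional priors parameter fixes the class prior probabilities instead of estimating them from the training data.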
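A minimal sketch of how var_smoothing translates into the value actually added to the variances, which the fitted model exposes as epsilon_. The candidate values tried here are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# largest variance across all features of the training data
largest_variance = X.var(axis=0).max()

# fit one model per candidate value and record the additive term it used
epsilons = {}
for vs in (1e-9, 1e-3, 1e-1):  # arbitrary illustrative values
    model = GaussianNB(var_smoothing=vs).fit(X, y)
    epsilons[vs] = model.epsilon_
    print('var_smoothing=%g -> epsilon_=%.6g' % (vs, model.epsilon_))
```

Larger values flatten the per-class distributions, which acts as a form of regularization.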

The algorithm is suitable for both binary and multi-class classification problems with continuous features.
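For the multi-class case, a small sketch with three classes might look like the following. The dataset settings (n_informative=3, n_redundant=0) are assumptions chosen so that make_classification() can place three classes in five features; the names X3, y3 and model3 are only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# three-class synthetic dataset (settings are illustrative assumptions)
X3, y3 = make_classification(n_samples=150, n_features=5, n_informative=3,
                             n_redundant=0, n_classes=3, random_state=1)

# the same estimator handles multi-class targets without extra configuration
model3 = GaussianNB().fit(X3, y3)

# class labels seen during fit, and one probability column per class
print(model3.classes_)
proba3 = model3.predict_proba(X3[:2])
print(proba3.shape)
```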

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# generate synthetic classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create GaussianNB model
model = GaussianNB()

# fit the model
model.fit(X_train, y_train)

# evaluate the model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])

Running the example gives an output like:

Accuracy: 0.950
Predicted: 0

The steps are as follows:

1. A synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and classes (n_classes). A fixed random seed (random_state) ensures reproducibility. The dataset is split into training and test sets using train_test_split().
2. A GaussianNB model is instantiated with default hyperparameters. The model is then fit on the training data using the fit() method.
3. The model’s performance is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.
4. A single prediction is made by passing a new data sample to the predict() method.
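Beyond the hard label from predict(), the same fitted model can report class probabilities for a new sample through predict_proba(). The sketch below repeats the setup from the example above so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# same data, split, and model as the main example
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = GaussianNB().fit(X_train, y_train)

# the new sample from the main example
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]

# one probability per class, in the order given by model.classes_
proba = model.predict_proba(row)
label = model.predict(row)
print('Class probabilities:', proba[0])
print('Predicted: %d' % label[0])
```

The predicted label corresponds to the class with the highest probability.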

This example demonstrates how to quickly set up and use a GaussianNB model for classification tasks, showcasing the simplicity and effectiveness of this algorithm in scikit-learn.

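Because GaussianNB only estimates a mean and a variance per feature for each class, it trains quickly and scales to high-dimensional datasets. It can often be applied without feature scaling or other complex preprocessing, although strongly non-Gaussian or highly correlated features may reduce its accuracy. Once trained, the model can be used for making predictions on new data.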
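As a sketch of using the model on wider data with no preprocessing, the example below runs 5-fold cross-validation on a synthetic dataset with 50 features. The dataset size and the choice of 5 folds are assumptions for illustration only, and the scores will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# wider synthetic dataset (sizes are illustrative assumptions)
X_wide, y_wide = make_classification(n_samples=200, n_features=50, n_informative=10,
                                     random_state=1)

# evaluate the raw, unscaled features with 5-fold cross-validation
scores = cross_val_score(GaussianNB(), X_wide, y_wide, cv=5, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())
```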
