The priors parameter in GaussianNB allows you to set the prior probabilities for each class.

GaussianNB is a naive Bayes classifier that applies Bayes' theorem under the assumption that the features follow a Gaussian (normal) distribution. The priors parameter specifies the prior probabilities of the classes; if it is not specified, the class priors are estimated from the data.

The default value for priors is None, meaning that class priors are inferred from the training data. In practice, setting specific priors can be useful when you have prior knowledge about the class distributions.
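As a minimal sketch of the constraint (assuming a recent scikit-learn version), the priors you pass must contain one non-negative probability per class and sum to 1; otherwise fit raises a ValueError:

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative dataset with three classes
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y = np.array([0, 0, 1, 1, 2, 2])

# Valid: three priors that sum to 1
GaussianNB(priors=[0.5, 0.25, 0.25]).fit(X, y)

# Invalid: priors do not sum to 1, so fit raises a ValueError
try:
    GaussianNB(priors=[0.5, 0.5, 0.5]).fit(X, y)
except ValueError as err:
    print(err)

The example below then compares a few valid priors settings on a larger synthetic dataset: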
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different priors values
priors_values = [None, [0.2, 0.5, 0.3], [0.33, 0.33, 0.34]]
accuracies = []
for priors in priors_values:
    gnb = GaussianNB(priors=priors)
    gnb.fit(X_train, y_train)
    y_pred = gnb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"priors={priors}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
priors=None, Accuracy: 0.705
priors=[0.2, 0.5, 0.3], Accuracy: 0.695
priors=[0.33, 0.33, 0.34], Accuracy: 0.710
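To see what priors=None actually uses, you can inspect the fitted model's class_prior_ attribute, which holds the class priors the model ended up with. As a small follow-up sketch (reusing X_train and y_train from the example above), the inferred priors match the empirical class frequencies in the training labels:

import numpy as np

# Fit with the default priors=None and inspect the inferred class priors
gnb_default = GaussianNB()
gnb_default.fit(X_train, y_train)
print(gnb_default.class_prior_)

# They match the relative class frequencies of the training labels
print(np.bincount(y_train) / len(y_train))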
The key steps in this example are:
- Generate a synthetic multi-class classification dataset with informative and noise features.
- Split the data into train and test sets.
- Train GaussianNB models with different priors values.
- Evaluate the accuracy of each model on the test set.
Some tips and heuristics for setting priors:
- Use domain knowledge to set priors if you have prior probabilities available.
- If you do not have prior probabilities, let the model infer them from the training data by setting priors to None.
- Adjust priors to see if model performance improves under different assumptions about the class distribution, for example with a small cross-validated search as sketched below.
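One way to do the last tip systematically (a sketch that reuses X_train and y_train from the example above; the candidate priors are arbitrary choices for illustration) is to treat priors as a hyperparameter and pick the best value by cross-validation:

from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Candidate priors; None lets GaussianNB infer them from the training data
param_grid = {"priors": [None, [0.2, 0.5, 0.3], [0.33, 0.33, 0.34]]}

search = GridSearchCV(GaussianNB(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")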
Issues to consider:
- Incorrectly setting priors can bias the model towards certain classes.
- The optimal priors depend on the specific dataset and problem context.
- If the data is imbalanced, setting appropriate priors can help improve model performance, as sketched below.
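For the imbalanced case, whether explicit priors help depends on the data and on what you know about the real class distribution. The sketch below (an illustration on a synthetic imbalanced dataset, not a recipe) simply compares the inferred priors against a hand-set alternative using balanced accuracy:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import balanced_accuracy_score

# Imbalanced binary dataset: roughly 90% class 0, 10% class 1
X_imb, y_imb = make_classification(n_samples=2000, n_features=10, n_informative=5,
                                   weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_imb, y_imb, test_size=0.2,
                                          stratify=y_imb, random_state=42)

# Default: priors inferred from the imbalanced training data
gnb_inferred = GaussianNB().fit(X_tr, y_tr)

# Explicit priors reflecting a different assumed class distribution
gnb_explicit = GaussianNB(priors=[0.5, 0.5]).fit(X_tr, y_tr)

for name, model in [("inferred priors", gnb_inferred), ("priors=[0.5, 0.5]", gnb_explicit)]:
    score = balanced_accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: balanced accuracy {score:.3f}")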