Configure GaussianNB "var_smoothing" Parameter

The var_smoothing parameter in scikit-learn’s GaussianNB controls how much is added to every feature variance to keep the likelihood calculation numerically stable.

GaussianNB is a variant of the Naive Bayes classifier that assumes the features follow a Gaussian distribution. It is particularly effective for continuous data.

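The var_smoothing parameter is not added to the variances directly. It is a fraction of the largest feature variance in the training data, and that scaled amount is added to the variance of every feature. This keeps the variances strictly positive and prevents division by zero or by very small numbers.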

The default value for var_smoothing is 1e-9.

In practice, values between 1e-12 and 1e-5 are commonly used depending on the dataset’s properties.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different var_smoothing values
var_smoothing_values = [1e-12, 1e-9, 1e-6, 1e-3]
accuracies = []

for vs in var_smoothing_values:
    gnb = GaussianNB(var_smoothing=vs)
    gnb.fit(X_train, y_train)
    y_pred = gnb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"var_smoothing={vs}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

var_smoothing=1e-12, Accuracy: 0.775
var_smoothing=1e-09, Accuracy: 0.775
var_smoothing=1e-06, Accuracy: 0.775
var_smoothing=0.001, Accuracy: 0.775

The accuracy is the same for all four values here, most likely because every feature in this synthetic dataset has a variance far larger than the smoothing term, so the adjustment barely changes the fitted distributions.

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative and noise features.
  2. Split the data into train and test sets.
  3. Train GaussianNB models with different var_smoothing values.
  4. Evaluate the accuracy of each model on the test set.
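
The manual loop can also be replaced by a cross-validated search, which avoids choosing var_smoothing based on the test set. The sketch below is one possible setup, not a recommended recipe: the log-spaced grid from 1e-12 to 1e-3 and the 5-fold split are assumptions chosen to mirror the values used in the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

# Same synthetic dataset and split as in the example
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Log-spaced candidates, since var_smoothing spans many orders of magnitude
param_grid = {"var_smoothing": np.logspace(-12, -3, 10)}

# 5-fold cross-validation on the training set only
search = GridSearchCV(GaussianNB(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best var_smoothing:", search.best_params_["var_smoothing"])
print(f"Mean CV accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```

When several candidates tie on cross-validated accuracy, GridSearchCV reports the first one in the grid, so the selected value is not necessarily meaningful on data like this.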

Some tips and heuristics for setting var_smoothing:

Issues to consider:


