The `priors` parameter in scikit-learn's `QuadraticDiscriminantAnalysis` (QDA) allows you to specify the prior probabilities of the classes.
QDA is a classification algorithm that assumes each class has its own covariance matrix. It's particularly useful when the single shared covariance matrix assumed by Linear Discriminant Analysis doesn't hold.
The `priors` parameter affects the decision boundary by weighting each class's likelihood when posteriors are computed. This is especially useful when dealing with imbalanced datasets or when you have prior knowledge about the class distribution.
By default, `priors` is set to `None`, which means the class priors are estimated from the training data. You can explicitly set `priors` as an array-like of shape `(n_classes,)` to override this behavior.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, f1_score

# Generate imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train and evaluate a model for each priors setting
priors_values = [None, [0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
results = []
for priors in priors_values:
    qda = QuadraticDiscriminantAnalysis(priors=priors)
    qda.fit(X_train, y_train)
    y_pred = qda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    results.append((priors, accuracy, f1))
    print(f"priors={priors}, Accuracy: {accuracy:.3f}, F1-score: {f1:.3f}")
```
Running the example gives an output like:

```
priors=None, Accuracy: 0.955, F1-score: 0.791
priors=[0.5, 0.5], Accuracy: 0.865, F1-score: 0.597
priors=[0.9, 0.1], Accuracy: 0.955, F1-score: 0.791
priors=[0.1, 0.9], Accuracy: 0.730, F1-score: 0.449
```
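The `None` and `[0.9, 0.1]` rows match because with `priors=None` QDA estimates the priors from the training class frequencies, which in this dataset are roughly 0.9/0.1. A minimal check of this, reusing the same synthetic data and the fitted `priors_` attribute:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)

# With priors=None, the fitted priors_ are the empirical class frequencies
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print(qda.priors_)  # close to [0.9, 0.1]
```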
The key steps in this example are:
- Generate an imbalanced synthetic binary classification dataset
- Split the data into train and test sets
- Train `QuadraticDiscriminantAnalysis` models with different `priors` values
- Evaluate the accuracy and F1-score of each model on the test set
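Accuracy and F1 alone hide what boosting the minority prior buys you. The sketch below (same synthetic data as above; the variable names are mine) compares minority-class recall under the default and minority-boosted priors:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import recall_score

# Same synthetic setup as the main example
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Default (data-driven) priors vs. priors boosted toward the minority class
qda_default = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
qda_boost = QuadraticDiscriminantAnalysis(priors=[0.1, 0.9]).fit(X_train, y_train)

recall_default = recall_score(y_test, qda_default.predict(X_test))
recall_boost = recall_score(y_test, qda_boost.predict(X_test))

# Boosting the minority prior can only enlarge the set of points predicted
# as the minority class, so its recall never decreases
print(f"minority recall, default priors:   {recall_default:.3f}")
print(f"minority recall, priors=[0.1, 0.9]: {recall_boost:.3f}")
```

The lower accuracy and F1 for `[0.1, 0.9]` in the table above come from lost precision, not lost recall: the model trades many false positives for extra minority hits.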
Some tips and heuristics for setting `priors`:

- Use `None` (default) when you want to estimate priors from the training data
- Set equal priors (e.g., `[0.5, 0.5]` for binary classification) to give equal importance to all classes
- Adjust priors based on domain knowledge or to counteract class imbalance
- Consider using `priors` in combination with class weights or other balancing techniques
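The domain-knowledge tip is worth making concrete. A common case is a training sample that over-represents one class (e.g., from targeted data collection) while the true population rate is known; you can then override the sample priors. This is a hypothetical sketch with made-up rates and my own variable names:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Hypothetical scenario: the training sample contains 30% positives,
# but the positive rate in the target population is known to be 5%
rng = np.random.default_rng(0)
n_neg, n_pos = 700, 300
X = np.vstack([rng.normal(0.0, 1.0, size=(n_neg, 2)),
               rng.normal(2.0, 1.5, size=(n_pos, 2))])
y = np.array([0] * n_neg + [1] * n_pos)

# Override the sample frequencies with the known population priors
qda = QuadraticDiscriminantAnalysis(priors=[0.95, 0.05]).fit(X, y)
print(qda.priors_)  # [0.95, 0.05], not the sample frequencies [0.7, 0.3]
```

Only the priors change here; the per-class means and covariance matrices are still estimated from the sample as usual.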
Issues to consider:
- Setting incorrect priors can lead to biased predictions
- Priors should reflect the true class distribution in the population, not just the sample
- The effect of priors may vary depending on the separability of the classes and the strength of the features
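The first caveat can be demonstrated directly: because the prior multiplies the class likelihood in the posterior, inflating a class's prior raises its posterior at every point and tilts predictions toward that class regardless of the features. A minimal sketch on synthetic data (variable names are mine):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)

# Fit once with data-driven priors and once with deliberately wrong priors
qda_data = QuadraticDiscriminantAnalysis().fit(X, y)
qda_wrong = QuadraticDiscriminantAnalysis(priors=[0.1, 0.9]).fit(X, y)

p_data = qda_data.predict_proba(X)[:, 1]
p_wrong = qda_wrong.predict_proba(X)[:, 1]

# The inflated class-1 prior shifts every posterior upward, so the model
# predicts class 1 more often even though the fitted Gaussians are identical
print(f"mean P(class 1), data priors:  {p_data.mean():.3f}")
print(f"mean P(class 1), wrong priors: {p_wrong.mean():.3f}")
print(f"fraction predicted as class 1: {qda_data.predict(X).mean():.3f} -> "
      f"{qda_wrong.predict(X).mean():.3f}")
```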