The priors parameter in scikit-learn’s LinearDiscriminantAnalysis allows you to specify the prior probabilities of the classes.
Linear Discriminant Analysis (LDA) is a method used for classification and dimensionality reduction. It aims to find a linear combination of features that characterizes or separates two or more classes of objects or events.
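As a quick sketch of both roles (the dataset size and parameters below are arbitrary choices for illustration, not part of the example that follows), LDA can be fit once and then used either to project the data onto its discriminant components or to predict class labels:
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small three-class problem; the parameters are arbitrary, for illustration only
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           n_features=10, random_state=0)

lda = LinearDiscriminantAnalysis()
X_projected = lda.fit_transform(X, y)  # dimensionality reduction: at most n_classes - 1 components
y_pred = lda.predict(X)                # classification with the same fitted model

print(X_projected.shape)     # (300, 2)
print((y_pred == y).mean())  # training accuracy, only to show the fitted classifier in use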
The priors parameter affects the decision boundary of the classifier. By adjusting the priors, you can influence the model’s sensitivity to different classes, which is particularly useful when dealing with imbalanced datasets.
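As a minimal sketch of that effect (the dataset, the specific prior values, and the samples inspected are arbitrary assumptions for illustration), fitting the same data twice with different priors shifts the posterior probabilities, and with them the decision boundary:
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Imbalanced two-class data; the weights are an arbitrary choice for illustration
X, y = make_classification(n_samples=500, n_classes=2, weights=[0.9, 0.1],
                           random_state=0)

# Same data, two different priors
lda_default = LinearDiscriminantAnalysis().fit(X, y)                   # priors estimated from the data
lda_uniform = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)  # uniform priors

# The posterior probabilities shift toward the class given the larger prior
print(lda_default.predict_proba(X[:3]).round(3))
print(lda_uniform.predict_proba(X[:3]).round(3))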
By default, priors is set to None, which means the class priors are estimated from the training data. You can set priors to a list or array of class probabilities, ensuring they sum to 1.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score
# Generate synthetic imbalanced dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=2, n_redundant=2,
                           random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different priors
priors_list = [None, [0.5, 0.5], [0.9, 0.1]]
scores = []
for priors in priors_list:
    lda = LinearDiscriminantAnalysis(priors=priors)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    score = balanced_accuracy_score(y_test, y_pred)
    scores.append(score)
    print(f"Priors={priors}, Balanced Accuracy: {score:.3f}")
Running the example gives an output like:
Priors=None, Balanced Accuracy: 0.650
Priors=[0.5, 0.5], Balanced Accuracy: 0.822
Priors=[0.9, 0.1], Balanced Accuracy: 0.650
The key steps in this example are:
- Generate a synthetic imbalanced binary classification dataset
- Split the data into train and test sets
- Train LinearDiscriminantAnalysis models with different priors settings
- Evaluate the balanced accuracy of each model on the test set
Some tips and heuristics for setting priors:
- Use priors to counteract class imbalance in your dataset
- Set priors to the inverse of class frequencies to give equal importance to all classes (see the sketch after this list)
- Experiment with different priors values and evaluate their impact on model performance
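As a minimal sketch of that second tip (assuming the variables and imports from the example above are still in scope), the inverse class frequencies can be computed from the training labels and normalized so they sum to 1 before being passed as priors:
import numpy as np

# Class frequencies in the training data
class_counts = np.bincount(y_train)
class_freq = class_counts / class_counts.sum()

# Inverse frequencies, normalized so they form valid priors that sum to 1
inv_freq = 1.0 / class_freq
priors_balanced = inv_freq / inv_freq.sum()

lda = LinearDiscriminantAnalysis(priors=priors_balanced)
lda.fit(X_train, y_train)
score = balanced_accuracy_score(y_test, lda.predict(X_test))
print(f"Priors={priors_balanced.round(3)}, Balanced Accuracy: {score:.3f}")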
Issues to consider:
- Setting priors incorrectly can lead to biased predictions
- The effect of priors may vary depending on the dataset and problem
- Always validate the model’s performance on a held-out test set after adjusting priors (see the sketch below)
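To make that last point concrete, one option (a sketch only; it reuses the X and y arrays from the example above, swaps the single train/test split for cross-validation, and uses arbitrary candidate priors) is to compare settings with cross-validated balanced accuracy:
from sklearn.model_selection import cross_val_score

# Compare candidate priors with 5-fold cross-validated balanced accuracy
for priors in [None, [0.5, 0.5], [0.7, 0.3]]:
    lda = LinearDiscriminantAnalysis(priors=priors)
    cv_scores = cross_val_score(lda, X, y, cv=5, scoring="balanced_accuracy")
    print(f"Priors={priors}, CV Balanced Accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")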