Configure LinearDiscriminantAnalysis "priors" Parameter

The priors parameter in scikit-learn’s LinearDiscriminantAnalysis allows you to specify the prior probabilities of the classes.

Linear Discriminant Analysis (LDA) is a method used for classification and dimensionality reduction. It aims to find a linear combination of features that characterizes or separates two or more classes of objects or events.

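The priors parameter affects the decision boundary of the classifier. In LDA, each class's prior enters its discriminant score as an additive log-prior term, so raising a class's prior shifts the boundary in that class's favor. Adjusting the priors therefore changes how readily the model predicts each class, which is particularly useful when dealing with imbalanced datasets.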
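
As a minimal sketch of this effect, the snippet below fits the same data with three different priors for the minority class and counts how many samples each model assigns to it. The dataset settings and the names `p_minority` and `minority_counts` are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic imbalanced data (illustrative settings): class 1 is the minority
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           random_state=0)

# Count predictions of class 1 as its prior grows
minority_counts = {}
for p_minority in (0.1, 0.5, 0.9):
    lda = LinearDiscriminantAnalysis(priors=[1 - p_minority, p_minority])
    lda.fit(X, y)
    minority_counts[p_minority] = int(np.sum(lda.predict(X) == 1))
    print(f"prior for class 1 = {p_minority}: "
          f"{minority_counts[p_minority]} samples predicted as class 1")
```

A larger prior for class 1 should not reduce the number of samples predicted as class 1, since it only moves the boundary toward the other class.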

By default, priors is set to None, which means the class priors are estimated from the class frequencies in the training data. Alternatively, you can pass a list or array with one probability per class, ordered to match the sorted class labels. The values should sum to 1.
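
The following sketch shows where the priors end up after fitting: the fitted `priors_` attribute holds either the estimated class frequencies or the values you passed in. The dataset settings and the variable names `lda_default`, `lda_equal`, and `class_freq` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic imbalanced data (illustrative settings, not a real dataset)
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)

# priors=None: priors_ is estimated from the observed class frequencies
lda_default = LinearDiscriminantAnalysis().fit(X, y)
class_freq = np.bincount(y) / len(y)
print("classes_:", lda_default.classes_)
print("estimated priors_:", lda_default.priors_)
print("class frequencies:", class_freq)

# Explicit priors: entry i applies to classes_[i] (labels in sorted order)
lda_equal = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)
print("explicit priors_:", lda_equal.priors_)
```

Checking `classes_` before setting priors is a cheap way to confirm which probability applies to which label.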

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score

# Generate synthetic imbalanced dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=2, n_redundant=2,
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different priors
priors_list = [None, [0.5, 0.5], [0.9, 0.1]]
scores = []

for priors in priors_list:
    lda = LinearDiscriminantAnalysis(priors=priors)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    score = balanced_accuracy_score(y_test, y_pred)
    scores.append(score)
    print(f"Priors={priors}, Balanced Accuracy: {score:.3f}")

Running the example gives an output like:

Priors=None, Balanced Accuracy: 0.650
Priors=[0.5, 0.5], Balanced Accuracy: 0.822
Priors=[0.9, 0.1], Balanced Accuracy: 0.650

The None and [0.9, 0.1] rows agree because the priors estimated from the training data are close to the 0.9/0.1 class weights used to generate it. Equal priors move the boundary toward the majority class, which improves recall on the minority class and raises balanced accuracy here.

The key steps in this example are:

  1. Generate a synthetic imbalanced binary classification dataset
  2. Split the data into train and test sets
  3. Train LinearDiscriminantAnalysis models with different priors settings
  4. Evaluate the balanced accuracy of each model on the test set
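
To see where a change in balanced accuracy comes from, it can help to look at recall per class, since balanced accuracy is the mean of the per-class recalls. The sketch below repeats the steps above for two priors settings and reports both. The `results` dictionary and the choice of settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score, recall_score

# Same synthetic imbalanced dataset and split as in the example above
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=2, n_redundant=2,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

results = {}
for priors in (None, [0.5, 0.5]):
    lda = LinearDiscriminantAnalysis(priors=priors).fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    # average=None returns one recall value per class, in label order
    per_class_recall = recall_score(y_test, y_pred, average=None)
    bal_acc = balanced_accuracy_score(y_test, y_pred)
    results[str(priors)] = (per_class_recall, bal_acc)
    print(f"Priors={priors}, recall per class: {np.round(per_class_recall, 3)}, "
          f"balanced accuracy: {bal_acc:.3f}")
```

Reporting recall per class makes the trade-off visible: a gain on the minority class usually comes at some cost on the majority class.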
