The priors parameter in scikit-learn’s LinearDiscriminantAnalysis allows you to specify the prior probabilities of the classes.
Linear Discriminant Analysis (LDA) is a method used for classification and dimensionality reduction. It aims to find a linear combination of features that characterizes or separates two or more classes of objects or events.
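As a quick sketch of both roles (the dataset size and parameters below are arbitrary choices for illustration, not part of the example that follows), LDA can be fit once and then used either to project the data onto its discriminant components or to predict class labels:
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small three-class problem; the parameters are arbitrary, for illustration only
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           n_features=10, random_state=0)

lda = LinearDiscriminantAnalysis()
X_projected = lda.fit_transform(X, y)  # dimensionality reduction: at most n_classes - 1 components
y_pred = lda.predict(X)                # classification with the same fitted model

print(X_projected.shape)     # (300, 2)
print((y_pred == y).mean())  # training accuracy, only to show the fitted classifier in use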
The priors parameter affects the decision boundary of the classifier. By adjusting the priors, you can influence the model’s sensitivity to different classes, which is particularly useful when dealing with imbalanced datasets.
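As a minimal sketch of that effect (the dataset, the specific prior values, and the samples inspected are arbitrary assumptions for illustration), fitting the same data twice with different priors shifts the posterior probabilities, and with them the decision boundary:
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Imbalanced two-class data; the weights are an arbitrary choice for illustration
X, y = make_classification(n_samples=500, n_classes=2, weights=[0.9, 0.1],
                           random_state=0)

# Same data, two different priors
lda_default = LinearDiscriminantAnalysis().fit(X, y)                   # priors estimated from the data
lda_uniform = LinearDiscriminantAnalysis(priors=[0.5, 0.5]).fit(X, y)  # uniform priors

# The posterior probabilities shift toward the class given the larger prior
print(lda_default.predict_proba(X[:3]).round(3))
print(lda_uniform.predict_proba(X[:3]).round(3))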
By default, priors is set to None, which means the class priors are estimated from the training data. You can set priors to a list or array of class probabilities, ensuring they sum to 1.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import balanced_accuracy_score
# Generate synthetic imbalanced dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=2, n_redundant=2,
                           random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different priors
priors_list = [None, [0.5, 0.5], [0.9, 0.1]]
scores = []
for priors in priors_list:
    lda = LinearDiscriminantAnalysis(priors=priors)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    score = balanced_accuracy_score(y_test, y_pred)
    scores.append(score)
    print(f"Priors={priors}, Balanced Accuracy: {score:.3f}")
Running the example gives an output like:
Priors=None, Balanced Accuracy: 0.650
Priors=[0.5, 0.5], Balanced Accuracy: 0.822
Priors=[0.9, 0.1], Balanced Accuracy: 0.650
The key steps in this example are:
- Generate a synthetic imbalanced binary classification dataset
- Split the data into train and test sets
- Train LinearDiscriminantAnalysis models with different priors settings
- Evaluate the balanced accuracy of each model on the test set
Some tips and heuristics for setting priors:
- Use priors to counteract class imbalance in your dataset
- Set priors to the inverse of class frequencies to give equal importance to all classes (see the sketch after this list)
- Experiment with different priors values and evaluate their impact on model performance
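As a minimal sketch of that second tip (assuming the variables and imports from the example above are still in scope), the inverse class frequencies can be computed from the training labels and normalized so they sum to 1 before being passed as priors:
import numpy as np

# Class frequencies in the training data
class_counts = np.bincount(y_train)
class_freq = class_counts / class_counts.sum()

# Inverse frequencies, normalized so they form valid priors that sum to 1
inv_freq = 1.0 / class_freq
priors_balanced = inv_freq / inv_freq.sum()

lda = LinearDiscriminantAnalysis(priors=priors_balanced)
lda.fit(X_train, y_train)
score = balanced_accuracy_score(y_test, lda.predict(X_test))
print(f"Priors={priors_balanced.round(3)}, Balanced Accuracy: {score:.3f}")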
Issues to consider:
- Setting priors incorrectly can lead to biased predictions
- The effect of priors may vary depending on the dataset and problem
- Always validate the model’s performance on a held-out test set after adjusting priors (see the sketch below)
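To make that last point concrete, one option (a sketch only; it reuses the X and y arrays from the example above, swaps the single train/test split for cross-validation, and uses arbitrary candidate priors) is to compare settings with cross-validated balanced accuracy:
from sklearn.model_selection import cross_val_score

# Compare candidate priors with 5-fold cross-validated balanced accuracy
for priors in [None, [0.5, 0.5], [0.7, 0.3]]:
    lda = LinearDiscriminantAnalysis(priors=priors)
    cv_scores = cross_val_score(lda, X, y, cv=5, scoring="balanced_accuracy")
    print(f"Priors={priors}, CV Balanced Accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")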