
Scikit-Learn class_likelihood_ratios() Metric

The class_likelihood_ratios() function in scikit-learn computes the positive and negative likelihood ratios (LR+ and LR-) for a binary classification problem from the true labels and the predicted class labels. Because both ratios are built from sensitivity and specificity, they do not depend on class prevalence, which makes them useful on imbalanced datasets and when the cost of different types of errors varies.

The positive likelihood ratio, LR+ = sensitivity / (1 - specificity), is the probability that a truly positive instance is predicted positive divided by the probability that a truly negative instance is predicted positive. The negative likelihood ratio, LR- = (1 - sensitivity) / specificity, is the probability that a truly positive instance is predicted negative divided by the probability that a truly negative instance is predicted negative. An LR+ well above 1 and an LR- close to 0 indicate that the classifier separates the two classes well.
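To make the two definitions concrete, here is a minimal sketch that computes both ratios by hand from confusion-matrix counts. The helper name likelihood_ratios_from_counts and the counts are hypothetical and chosen for illustration. This is not scikit-learn's implementation, and it does not handle the zero-division cases that arise when specificity is 0 or 1.

```python
def likelihood_ratios_from_counts(tp, fn, fp, tn):
    """Return (LR+, LR-) from binary confusion-matrix counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn)          # true positive rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    specificity = tn / (fp + tn)          # true negative rate
    lr_pos = sensitivity / false_positive_rate
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# Made-up counts: 4 positives (3 detected) and 6 negatives (1 false alarm)
lr_pos, lr_neg = likelihood_ratios_from_counts(tp=3, fn=1, fp=1, tn=5)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 4.50, LR- = 0.30
```

With these counts, a positive prediction is 4.5 times more likely for a truly positive instance than for a truly negative one.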

The class_likelihood_ratios() function takes the true labels and the predicted class labels as input and returns a tuple of two floats: the positive likelihood ratio followed by the negative likelihood ratio. It is commonly used in conjunction with other metrics like precision, recall, and F1-score to get a more comprehensive evaluation of the classifier’s performance.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import class_likelihood_ratios

# Generate a synthetic imbalanced binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.8, 0.2], random_state=42)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression classifier
clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)

# Predict class labels on the test set (the metric expects labels, not probabilities)
y_pred = clf.predict(X_test)

# Calculate the positive and negative likelihood ratios, returned as (LR+, LR-)
ratios = class_likelihood_ratios(y_test, y_pred)
print(f"Class likelihood ratios: {ratios}")

Running the example gives an output like:

Class likelihood ratios: (8.925064599483203, 0.5180703959773727)

The example can be summarized as follows:

  1. Generate a synthetic imbalanced binary classification dataset using make_classification() with a class distribution of 80% for class 0 and 20% for class 1.
  2. Split the dataset into training and test sets using train_test_split().
  3. Train a logistic regression classifier on the training set using LogisticRegression.
  4. Use the trained classifier to predict class labels on the test set with predict().
  5. Calculate the class likelihood ratios using class_likelihood_ratios() by passing the true labels and predicted labels.
  6. Print the resulting tuple, which contains the positive likelihood ratio followed by the negative likelihood ratio.

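The likelihood ratios provide insight into how well the classifier separates the positive class from the negative class. In the output above, the first value (about 8.9) is LR+: a positive prediction is roughly nine times more likely for a truly positive instance than for a truly negative one. The second value (about 0.52) is LR-: a negative prediction is about half as likely for a truly positive instance as for a truly negative one. Values of LR+ well above 1 and values of LR- close to 0 are better, so the relatively high LR- here suggests the classifier misses a substantial share of the positive instances.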
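The two ratios from the example output can also be converted back into sensitivity and specificity, which some readers find easier to interpret. The sketch below solves the two defining equations for those rates. The helper name rates_from_likelihood_ratios is hypothetical, and the function assumes the two ratios differ.

```python
def rates_from_likelihood_ratios(lr_pos, lr_neg):
    """Recover (sensitivity, specificity) from LR+ and LR- (hypothetical helper).

    Solves LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec.
    Assumes lr_pos != lr_neg.
    """
    false_positive_rate = (1 - lr_neg) / (lr_pos - lr_neg)
    sensitivity = lr_pos * false_positive_rate
    specificity = 1 - false_positive_rate
    return sensitivity, specificity

# Values copied from the example output above
sens, spec = rates_from_likelihood_ratios(8.925064599483203, 0.5180703959773727)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
# sensitivity = 0.512, specificity = 0.943
```

Read this way, the example classifier rarely raises false alarms but detects only about half of the minority class.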

By examining the class likelihood ratios alongside other evaluation metrics, one can gain a more nuanced understanding of the classifier’s strengths and weaknesses, especially in scenarios involving imbalanced datasets or varying misclassification costs.
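One reason to read likelihood ratios next to a metric such as precision is that precision shifts with class balance while LR+ does not. The sketch below illustrates this with a hypothetical classifier whose sensitivity (0.8) and specificity (0.9) are assumed fixed; only the share of positive instances changes. The helper name precision_and_lr_pos is made up for this illustration.

```python
def precision_and_lr_pos(sensitivity, specificity, prevalence):
    """Return (precision, LR+) for a given share of positive instances."""
    true_pos = sensitivity * prevalence                # expected true-positive fraction
    false_pos = (1 - specificity) * (1 - prevalence)   # expected false-positive fraction
    precision = true_pos / (true_pos + false_pos)
    lr_pos = sensitivity / (1 - specificity)           # does not involve prevalence
    return precision, lr_pos

# Same assumed classifier, three different class balances
for prevalence in (0.5, 0.2, 0.05):
    precision, lr_pos = precision_and_lr_pos(0.8, 0.9, prevalence)
    print(f"prevalence={prevalence:.2f}  precision={precision:.3f}  LR+={lr_pos:.2f}")
```

Precision falls as the positive class becomes rarer, while LR+ stays the same, which is why the two metrics complement each other on imbalanced data.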


