
Scikit-Learn f1_score() Metric

Evaluating the performance of classification models, particularly when dealing with imbalanced datasets, can be challenging. The f1_score() metric in scikit-learn provides a balanced measure by considering both precision and recall, making it a useful tool in such scenarios.

The F1 score is the harmonic mean of precision and recall, providing a single metric that balances the two. It ranges from 0 to 1, with 1 indicating perfect precision and recall. The F1 score is calculated as 2 * (precision * recall) / (precision + recall). This metric is particularly suitable for binary and multiclass classification problems, especially those with imbalanced datasets. However, it does not reflect the number of false positives and false negatives separately, which might be important in some contexts.
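
To see why the harmonic mean matters, consider a classifier with precision 0.5 and recall 1.0: its F1 score is about 0.67, not the arithmetic mean of 0.75, because the harmonic mean is pulled toward the lower of the two values. The arithmetic is easy to verify directly:

precision, recall = 0.5, 1.0
f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # 0.666..., lower than the arithmetic mean of 0.75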

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.7, 0.3], random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
clf = SVC(kernel='linear', C=1, random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Calculate F1 score
f1 = f1_score(y_test, y_pred)
print(f"F1 Score: {f1:.2f}")

Running the example gives an output like:

F1 Score: 0.67

The steps are as follows:

  1. Generate a synthetic binary classification dataset using make_classification() with class imbalance.
  2. Split the dataset into training and test sets using train_test_split().
  3. Train an SVC classifier on the training set.
  4. Use the trained classifier to make predictions on the test set with predict().
  5. Calculate the F1 score using f1_score() by comparing the predicted labels to the true labels.

First, we generate a synthetic binary classification dataset using the make_classification() function from scikit-learn. The weights=[0.7, 0.3] argument creates 1000 samples split roughly 70/30 between the two classes, letting us simulate a realistic imbalanced classification problem without real-world data.
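
If you want to confirm the imbalance, a quick check with NumPy (which scikit-learn already depends on) is:

import numpy as np

# Count samples per class; expect roughly [700, 300] for weights=[0.7, 0.3]
print(np.bincount(y))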

Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our classifier on unseen data. We use 80% of the data for training and reserve 20% for testing.
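
With imbalanced classes it is often worth stratifying the split so both sets keep the same class ratio; an optional variant using train_test_split()'s stratify parameter:

# Stratified split preserves the 70/30 class ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)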

With our data prepared, we train an SVM classifier using the SVC class from scikit-learn. We specify a linear kernel and set the regularization parameter C to 1. The fit() method is called on the classifier object, passing in the training features (X_train) and labels (y_train) to learn the underlying patterns in the data.
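
As a side note, if the minority class is the one you care about, SVC also accepts class_weight='balanced', which adjusts the penalty inversely to class frequencies; a possible variant, not used in the example above:

# Reweight errors so the minority class is not drowned out
clf = SVC(kernel='linear', C=1, class_weight='balanced', random_state=42)
clf.fit(X_train, y_train)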

After training, we use the trained classifier to make predictions on the test set by calling the predict() method with X_test. This generates predicted labels for each sample in the test set.
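
Because the F1 score itself does not separate false positives from false negatives (as noted earlier), it can help to inspect the confusion matrix alongside it; for a binary problem:

from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix into (tn, fp, fn, tp) for binary labels
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"False positives: {fp}, False negatives: {fn}")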

Finally, we compute the F1 score of our classifier with the f1_score() function, which takes the true labels (y_test) and the predicted labels (y_pred) and returns the harmonic mean of precision and recall. The resulting score is printed, giving us a single balanced measure of the classifier’s performance.
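
It can also be informative to report precision and recall alongside the F1 score, and to remember that for multiclass targets f1_score() requires an averaging strategy such as average='macro' or average='weighted':

from sklearn.metrics import precision_score, recall_score

print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall: {recall_score(y_test, y_pred):.2f}")

# For multiclass targets, pass an averaging strategy, e.g.:
# f1_macro = f1_score(y_test, y_pred, average='macro')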

This example demonstrates how to use the f1_score() function from scikit-learn to evaluate the performance of a classification model, particularly in the context of imbalanced datasets.


