Scikit-Learn accuracy_score() Metric

Accuracy is a commonly used metric for evaluating the performance of classification models. It represents the ratio of correct predictions to the total number of predictions made. In other words, accuracy tells us how often the classifier makes the right prediction.

The accuracy_score() function in scikit-learn calculates accuracy by dividing the number of correct predictions by the total number of predictions. It takes the true labels and the predicted labels as input and, by default, returns a float between 0 and 1, where 1 means every prediction was correct.
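
As a minimal sketch of that calculation, the snippet below uses small made-up label lists (hypothetical values, unrelated to the dataset used later) and compares accuracy_score() with a manual count of correct predictions. It also passes normalize=False, which makes the function return the number of correct predictions rather than the fraction.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, made up for illustration only
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Manual calculation: correct predictions / total predictions
n_correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_acc = n_correct / len(y_true)

# Default behaviour: fraction of correct predictions
acc = accuracy_score(y_true, y_pred)

# normalize=False: number of correct predictions instead of the fraction
count = accuracy_score(y_true, y_pred, normalize=False)

print(f"Manual accuracy: {manual_acc:.2f}")
print(f"accuracy_score:  {acc:.2f}")
print(f"Correct count:   {count}")
```

Six of the eight hypothetical predictions match, so both the manual calculation and accuracy_score() give 0.75.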

Accuracy is used for both binary and multiclass classification problems. However, it has some limitations. Accuracy can be misleading when dealing with imbalanced datasets, where the number of samples in each class varies significantly. In such cases, a classifier that simply predicts the majority class all the time can achieve high accuracy, even though it fails to classify the minority class correctly. Additionally, accuracy does not take into account the cost of different types of errors, which may vary depending on the problem at hand.
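
The imbalance problem can be illustrated with a small sketch. The labels below are hypothetical (95 samples of class 0 and 5 of class 1), and the "classifier" is simply an array of zeros standing in for a model that always predicts the majority class. recall_score() is used here only as one possible way to show what happens to the minority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true = np.array([0] * 95 + [1] * 5)

# Stand-in for a classifier that always predicts the majority class
y_majority = np.zeros_like(y_true)

# Accuracy looks good because 95 of the 100 labels are class 0
acc = accuracy_score(y_true, y_majority)

# Recall for class 1: fraction of actual positives that were predicted positive
minority_recall = recall_score(y_true, y_majority)

print(f"Accuracy: {acc:.2f}")                           # Accuracy: 0.95
print(f"Minority-class recall: {minority_recall:.2f}")  # Minority-class recall: 0.00
```

The accuracy of 0.95 hides the fact that none of the five minority-class samples were identified.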

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
clf = SVC(kernel='linear', C=1, random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Calculate accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")

Running the example gives an output like:

Accuracy: 0.87

The steps are as follows:

1. Generate a synthetic binary classification dataset using make_classification().
2. Split the dataset into training and test sets using train_test_split().
3. Train an SVC classifier on the training set.
4. Use the trained classifier to make predictions on the test set with predict().
5. Calculate the accuracy of the predictions using accuracy_score() by comparing the predicted labels to the true labels.

First, we generate a synthetic binary classification dataset using the make_classification() function from scikit-learn. With the arguments used here, it creates a dataset with 1000 samples and 2 classes (and 20 features, the function's default), allowing us to simulate a classification problem without using real-world data.
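
To see what make_classification() returns, a quick inspection along these lines may help. It reuses the same arguments as the example; the use of np.bincount() to count samples per class is an addition for illustration and is not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same arguments as the main example
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# X holds the features, y holds the class labels
print(X.shape)  # (1000, 20): 20 features is the default n_features
print(y.shape)  # (1000,)

# Number of samples in each class (roughly balanced by default)
class_counts = np.bincount(y)
print(class_counts)
```

Checking the class counts is a cheap way to confirm that the dataset is not heavily imbalanced before relying on accuracy.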

Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our classifier on unseen data. We use 80% of the data for training and reserve 20% for testing.
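
The 80/20 split can be confirmed by checking the shapes of the resulting arrays. This is a small sketch that repeats the dataset and split calls from the example so that it runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same dataset and split as the main example
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# test_size=0.2 reserves 20% of the 1000 samples for testing
print(X_train.shape)  # (800, 20)
print(X_test.shape)   # (200, 20)
```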

With our data prepared, we train an SVM classifier using the SVC class from scikit-learn. We specify a linear kernel and set the regularization parameter C to 1. The fit() method is called on the classifier object, passing in the training features (X_train) and labels (y_train) to learn the underlying patterns in the data.

After training, we use the trained classifier to make predictions on the test set by calling the predict() method with X_test. This generates predicted labels for each sample in the test set.

Finally, we evaluate the accuracy of our classifier using the accuracy_score() function. This function takes the true labels (y_test) and the predicted labels (y_pred) as input and calculates the ratio of correct predictions to the total number of predictions. The resulting accuracy score is printed, giving us a quantitative measure of our classifier’s performance.
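
As a side note, scikit-learn classifiers also provide a score() method that reports mean accuracy on the data passed to it. The sketch below repeats the example's setup and checks that clf.score(X_test, y_test) agrees with accuracy_score(y_test, y_pred). The variable names acc_metric and acc_score are chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Same dataset, split, and model as the main example
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = SVC(kernel='linear', C=1, random_state=42)
clf.fit(X_train, y_train)

# Accuracy computed from explicit predictions
y_pred = clf.predict(X_test)
acc_metric = accuracy_score(y_test, y_pred)

# Accuracy computed by the classifier's own score() method
acc_score = clf.score(X_test, y_test)

print(f"accuracy_score(): {acc_metric:.2f}")
print(f"clf.score():      {acc_score:.2f}")
```

Using accuracy_score() directly is convenient when the predictions are already available or come from outside scikit-learn, while score() saves a step when only the fitted classifier is at hand.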

This example demonstrates how to use the accuracy_score() function from scikit-learn to evaluate the performance of a binary classification model. By generating a synthetic dataset, splitting it into train and test sets, training an SVM classifier, making predictions, and calculating the accuracy score, we can assess how well our model generalizes to unseen data. Accuracy is a simple and intuitive metric, but it’s important to keep its limitations in mind, especially when dealing with imbalanced datasets or when the costs of different types of errors vary.
