
Scikit-Learn classification_report() Metric

The classification_report() function in scikit-learn generates a text report showing the main classification metrics. It provides a comprehensive overview of a classifier’s performance, including precision, recall, F1-score, and support for each class, as well as overall accuracy and macro/weighted averages.

This function is commonly used for evaluating the performance of classification algorithms on both binary and multiclass problems. It offers a detailed breakdown of metrics for each individual class, making it particularly useful when dealing with imbalanced datasets.
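As a minimal sketch of the function on its own, classification_report() only needs two label sequences; the tiny binary labels below are illustrative values, not data from this article's example:

```python
from sklearn.metrics import classification_report

# Illustrative true and predicted labels for a small binary problem
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

# Prints precision, recall, F1-score, and support per class,
# plus overall accuracy and macro/weighted averages
print(classification_report(y_true, y_pred))
```

Support is simply the number of true samples in each class, which is what makes the report useful for spotting imbalance at a glance.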

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Generate classification report
report = classification_report(y_test, y_pred)
print(report)

Running the example gives an output like:

              precision    recall  f1-score   support

           0       0.77      0.96      0.85        55
           1       0.86      0.86      0.86        72
           2       0.92      0.74      0.82        73

    accuracy                           0.84       200
   macro avg       0.85      0.85      0.84       200
weighted avg       0.86      0.84      0.84       200

The steps are as follows:

  1. Generate a synthetic multiclass classification dataset using make_classification().
  2. Split the dataset into training and test sets using train_test_split().
  3. Train a Random Forest classifier on the training set.
  4. Use the trained classifier to make predictions on the test set.
  5. Generate the classification report using classification_report() by passing the true labels and predicted labels.

First, we create a synthetic multiclass classification dataset using the make_classification() function. This dataset contains 1000 samples and 3 classes, with 5 informative features.
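To see why per-class metrics matter, it can help to inspect the class distribution of the generated data first. This is a sketch that reuses the same make_classification() parameters as the example above:

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as the main example
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)

# Count samples per class; roughly balanced here, but real data often is not
counts = np.bincount(y)
for label, count in enumerate(counts):
    print(f"class {label}: {count} samples")
```

On a roughly balanced dataset like this one, macro and weighted averages will be close; on imbalanced data they can diverge sharply, which is exactly when the per-class rows of the report earn their keep.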

Next, we split the data into training and test sets using train_test_split(), allocating 80% for training and 20% for testing.
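For imbalanced problems it is often worth stratifying the split so both sets keep the original class proportions. A sketch using train_test_split()'s stratify parameter:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)

# stratify=y preserves the class proportions in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("train counts:", np.bincount(y_train))
print("test counts: ", np.bincount(y_test))
```

Without stratification, a small or skewed test set can end up with very few samples of a minority class, which makes that class's precision and recall in the report unreliable.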

We then train a RandomForestClassifier on the training data. Random Forest is an ensemble method that aggregates the predictions of many decision trees, each trained on a bootstrapped sample of the data, to make robust predictions.

After training, we use the classifier to predict labels for the test set samples using the predict() method.

Finally, we generate the classification report by calling classification_report() with the true labels (y_test) and predicted labels (y_pred). This function computes various metrics like precision, recall, and F1-score for each class, as well as macro and weighted averages across all classes.
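When the metrics need to be consumed programmatically rather than printed, classification_report() also accepts output_dict=True and returns a nested dictionary instead of a string. A self-contained sketch mirroring the example above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# output_dict=True returns nested dicts instead of a formatted string
report = classification_report(y_test, clf.predict(X_test), output_dict=True)

# Per-class metrics are keyed by the class label as a string
print(report["0"]["precision"])
print(report["macro avg"]["f1-score"])
print(report["accuracy"])
```

This form is convenient for logging metrics, comparing models, or loading the results into a pandas DataFrame.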

The resulting report provides a comprehensive evaluation of the classifier’s performance, highlighting its strengths and weaknesses in predicting each class. This information can be used to assess the model’s effectiveness and guide further improvements.
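The report's row labels can also be made human-readable with the target_names argument, which replaces the numeric class labels with names of your choosing; the labels and class names below are illustrative:

```python
from sklearn.metrics import classification_report

# Illustrative labels for a three-class problem
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# target_names (hypothetical names here) replaces 0/1/2 in the report rows
print(classification_report(y_true, y_pred, target_names=["class_a", "class_b", "class_c"]))
```

The names must be given in the same order as the sorted class labels, and there must be one name per class present in the data.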



See Also