
Scikit-Learn label_binarize() for Data Preprocessing

Using label_binarize() to Transform Target Feature

Some preprocessing tasks require multi-class labels in a binary (one-vs-all) format, with one column per class. The label_binarize() function from the sklearn.preprocessing module performs this conversion in a single call.

The key parameters of label_binarize() are classes (the required list of all possible class labels, which fixes the column order), neg_label (the value used for the negative class, default 0), pos_label (the value used for the positive class, default 1), and sparse_output (whether to return a sparse matrix instead of a dense array, default False).
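As a minimal sketch of how these parameters change the result, the snippet below binarizes a small list of string labels. The labels and variable names are made up for illustration and are not part of the original example.

```python
from scipy.sparse import issparse
from sklearn.preprocessing import label_binarize

# Hypothetical toy labels, chosen only for illustration
labels = ["cat", "dog", "bird", "dog"]
classes = ["bird", "cat", "dog"]  # one output column per entry, in this order

# Defaults: dense array of 0s and 1s
dense = label_binarize(labels, classes=classes)
print(dense)

# neg_label=-1 marks every non-matching class with -1 instead of 0
signed = label_binarize(labels, classes=classes, neg_label=-1, pos_label=1)
print(signed)

# sparse_output=True returns a SciPy sparse matrix with the same content
sparse_out = label_binarize(labels, classes=classes, sparse_output=True)
print(issparse(sparse_out))
```

A sparse result can save memory when there are many classes, since each row holds only one non-zero entry.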

This function works with multi-class targets, and it also accepts multi-label targets that are already in indicator-matrix form.

from sklearn.preprocessing import label_binarize
from sklearn.datasets import make_classification

# generate a multi-class classification dataset
X, y = make_classification(n_samples=100, n_clusters_per_class=1, n_features=5, n_classes=3, random_state=1)

# binarize the output labels
y_bin = label_binarize(y, classes=[0, 1, 2])

# display original and binarized labels for the first 5 samples
print("Original labels:", y[:5])
print("Binarized labels:\n", y_bin[:5])

Running the example gives an output like:

Original labels: [1 0 0 0 1]
Binarized labels:
 [[0 1 0]
 [1 0 0]
 [1 0 0]
 [1 0 0]
 [0 1 0]]
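One way to sanity-check the binarized output is to confirm that every row contains exactly one positive entry and that the original labels can be recovered from it. The sketch below regenerates the same dataset and uses np.argmax for the recovery. The check is an illustrative addition, not a required step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import label_binarize

# Same dataset settings as the example above
X, y = make_classification(n_samples=100, n_clusters_per_class=1, n_features=5, n_classes=3, random_state=1)

classes = np.array([0, 1, 2])
y_bin = label_binarize(y, classes=classes)

# One column per class
print("Shape:", y_bin.shape)

# Each row should contain exactly one 1
row_sums = y_bin.sum(axis=1)
print("One positive per row:", bool((row_sums == 1).all()))

# The column index of the 1 maps back to the original class label
y_recovered = classes[np.argmax(y_bin, axis=1)]
print("Labels recovered:", np.array_equal(y_recovered, y))
```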

This example demonstrates the use of label_binarize() to transform multi-class labels into a binary format. Each row of the result holds a single 1 in the column that matches the sample's class, following the order given in classes.
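One edge case worth knowing: when the target has only two classes, label_binarize() returns a single column rather than two. The sketch below uses a small hand-written label list (an assumption made for illustration) to show how the classes argument affects the output shape.

```python
from sklearn.preprocessing import label_binarize

# Hypothetical two-class labels for illustration
y_binary = [0, 1, 1, 0]

# Two classes: the result is a single column holding the positive-class indicator
two_class = label_binarize(y_binary, classes=[0, 1])
print(two_class.shape)

# Declaring a third possible class switches to one column per class,
# even though label 2 never appears in the data
three_class = label_binarize(y_binary, classes=[0, 1, 2])
print(three_class.shape)
```

Listing every possible class in classes keeps the output shape consistent across data splits, even when some labels are missing from a given split.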


