Using label_binarize() to Transform Target Feature
Transforming multi-class labels into a binary, one-vs-all indicator format is needed for certain preprocessing tasks. The label_binarize() function from the sklearn.preprocessing module simplifies this process.
The key parameters for label_binarize() include classes (list of all possible class labels), neg_label (value for negative class, default 0), pos_label (value for positive class, default 1), and sparse_output (whether the output is returned as a sparse matrix rather than a dense array, default False).
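The effect of these parameters can be seen on a small example. The following is a minimal sketch, assuming a hand-written label list `y` that is not part of the original example; it compares the default output with a call that sets neg_label=-1 and a call that sets sparse_output=True.

```python
from scipy.sparse import issparse
from sklearn.preprocessing import label_binarize

# hypothetical toy labels, chosen only for illustration
y = [0, 2, 1, 2]
classes = [0, 1, 2]

# defaults: neg_label=0, pos_label=1, dense NumPy array
dense = label_binarize(y, classes=classes)

# use -1 instead of 0 for the negative entries
signed = label_binarize(y, classes=classes, neg_label=-1, pos_label=1)

# request a SciPy sparse matrix instead of a dense array
sparse = label_binarize(y, classes=classes, sparse_output=True)

print("Default:\n", dense)
print("neg_label=-1:\n", signed)
print("Is sparse:", issparse(sparse))
```

Each row has pos_label in the column of its class and neg_label everywhere else. Sparse output is kept at the default neg_label=0 here, since a sparse matrix stores zeros implicitly.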
This function is suitable for multi-class and multi-label classification problems.
from sklearn.preprocessing import label_binarize
from sklearn.datasets import make_classification
# generate a multi-class classification dataset
X, y = make_classification(n_samples=100, n_clusters_per_class=1, n_features=5, n_classes=3, random_state=1)
# binarize the output labels
y_bin = label_binarize(y, classes=[0, 1, 2])
# display original and binarized labels for the first 5 samples
print("Original labels:", y[:5])
print("Binarized labels:\n", y_bin[:5])
Running the example gives an output like:
Original labels: [1 0 0 0 1]
Binarized labels:
[[0 1 0]
[1 0 0]
[1 0 0]
[1 0 0]
[0 1 0]]
- First, a synthetic multi-class classification dataset is generated using make_classification(). This dataset includes multiple classes and features.
- The multi-class labels are transformed to a binary format using label_binarize() by specifying all possible classes.
- The original and binarized labels for the first five samples are displayed to illustrate the transformation.
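One way to check the transformation described in the steps above is to map the binarized rows back to class labels. This is a minimal sketch, assuming a small hand-written label array rather than the make_classification() data; the argmax-based recovery is an illustrative check, not something label_binarize() provides itself.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# hypothetical toy labels, chosen only for illustration
classes = np.array([0, 1, 2])
y = np.array([1, 0, 0, 2, 1])

# one column per entry in classes, in the same order
y_bin = label_binarize(y, classes=classes)

# the position of the 1 in each row indexes back into classes
recovered = classes[y_bin.argmax(axis=1)]

print("Shape:", y_bin.shape)
print("Round trip matches:", np.array_equal(recovered, y))
```

The columns follow the order given in classes, so the index of the positive entry in each row identifies the original label. This check applies to three or more classes; with only two classes, label_binarize() returns a single column instead of one column per class.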
This example demonstrates the use of label_binarize() for transforming multi-class labels into a binary format, which is useful for preprocessing tasks in machine learning workflows.