LabelBinarizer is a preprocessing tool that converts categorical labels into a binary indicator matrix (one column per class), a format suitable for machine learning models that require numerical targets.
A key parameter of LabelBinarizer is sparse_output, which determines whether the result is returned as a sparse matrix instead of a dense array.
It is particularly useful for preparing class labels in classification tasks.
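As a minimal sketch of the sparse_output parameter described above (the variable names lb_sparse and sparse_labels are illustrative, and SciPy is assumed to be installed alongside scikit-learn), the following requests a sparse result and checks that it holds the same values as the dense encoding:

```python
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer

labels = ['cat', 'dog', 'fish', 'cat', 'dog', 'fish']

# ask for a SciPy sparse matrix instead of a dense NumPy array
lb_sparse = LabelBinarizer(sparse_output=True)
sparse_labels = lb_sparse.fit_transform(labels)

# confirm the result is sparse and inspect its shape (samples x classes)
print(sparse.issparse(sparse_labels))
print(sparse_labels.shape)

# convert to a dense array only when the values need to be viewed
print(sparse_labels.toarray())
```

A sparse result mainly pays off when there are many classes, since most entries in each row are zero. The main example that follows uses the default dense output.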
from sklearn.preprocessing import LabelBinarizer
# sample categorical labels
labels = ['cat', 'dog', 'fish', 'cat', 'dog', 'fish']
# initialize the LabelBinarizer
lb = LabelBinarizer()
# fit and transform the labels
binary_labels = lb.fit_transform(labels)
# show the binary encoded labels
print(binary_labels)
# inverse transform the binary labels back to original
original_labels = lb.inverse_transform(binary_labels)
print(original_labels)
Running the example gives an output like:
[[1 0 0]
 [0 1 0]
 [0 0 1]
 [1 0 0]
 [0 1 0]
 [0 0 1]]
['cat' 'dog' 'fish' 'cat' 'dog' 'fish']
The steps are as follows:
1. Generate a sample list of categorical labels such as ['cat', 'dog', 'fish', 'cat', 'dog', 'fish'].
2. Initialize LabelBinarizer by creating an instance of the class.
3. Fit and transform the labels using fit_transform(), which converts the categorical labels into a binary format.
4. Display the binary encoded labels by printing the resulting binary matrix.
5. Convert the binary matrix back to the original labels using inverse_transform(), demonstrating the complete process of encoding and decoding the labels.
This example illustrates how to use LabelBinarizer to transform categorical labels into a format suitable for machine learning algorithms, so that the labels are ready for model training and prediction.
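To make the column order in the output easier to read, the sketch below inspects the fitted classes_ attribute and reuses a fitted binarizer on new labels. It also shows the two-class case, where the output is a single column rather than two. The variable names (new_binary, lb_two, two_class) and the 'yes'/'no' labels are illustrative and not part of the example above.

```python
from sklearn.preprocessing import LabelBinarizer

# fit on the known labels without transforming them yet
lb = LabelBinarizer()
lb.fit(['cat', 'dog', 'fish'])

# classes_ holds the learned labels in sorted order;
# column i of the binary matrix corresponds to classes_[i]
print(lb.classes_)

# reuse the fitted binarizer on new labels
new_binary = lb.transform(['fish', 'cat'])
print(new_binary)

# with only two classes, the output is one column of 0s and 1s
lb_two = LabelBinarizer()
two_class = lb_two.fit_transform(['no', 'yes', 'yes', 'no'])
print(two_class.shape)
```

Calling fit() once and transform() afterwards keeps the label-to-column mapping identical between training and prediction data.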