MultiLabelBinarizer converts a list of label sets (one iterable of labels per sample) into a binary indicator matrix, the format that multi-label classification tasks typically require. Each column of the matrix represents one unique label, and a 1 in a given row means that the sample carries that label.
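To see which column corresponds to which label, the fitted transformer exposes a classes_ attribute. When no classes argument is given, scikit-learn stores the labels found during fitting in sorted order. The sketch below uses made-up string tags (docs_tags is an illustrative name, not from the original example) and also shows that each sample's labels can be passed as a Python set instead of a list.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tags for three documents; each sample is a set of string labels
docs_tags = [{"sports", "news"}, {"tech"}, {"news", "tech", "opinion"}]

mlb = MultiLabelBinarizer()
matrix = mlb.fit_transform(docs_tags)

# classes_ holds one label per column; without a `classes` argument it is sorted
print(mlb.classes_)  # ['news' 'opinion' 'sports' 'tech']

# Row i has a 1 in column j when sample i carries label mlb.classes_[j]
print(matrix)
```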
Important hyperparameters include classes (an explicit list of all possible labels, which also fixes the column order) and sparse_output (set to True to return a sparse matrix in CSR format instead of a dense NumPy array).
It is suitable for multi-label classification problems, where each instance can belong to several classes at once.
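The following sketch illustrates both hyperparameters on a small list of integer label sets. Passing classes fixes the columns and their order; here it includes a label, 4, that never occurs in the data, chosen only for illustration, so that label still gets its own (all-zero) column. Setting sparse_output=True returns a SciPy sparse matrix in CSR format, which can save memory when there are many labels.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [[1, 2, 3], [1, 2], [2, 3], [1]]

# `classes` fixes the column set and order: columns are 3, 2, 1, 4 in that order,
# and label 4 gets a column even though it never appears in y
mlb = MultiLabelBinarizer(classes=[3, 2, 1, 4])
dense = mlb.fit_transform(y)
print(mlb.classes_)
print(dense)

# sparse_output=True returns a CSR sparse matrix instead of a dense array
mlb_sparse = MultiLabelBinarizer(sparse_output=True)
sparse = mlb_sparse.fit_transform(y)
print(sparse.format, sparse.shape)  # csr (4, 3)
print(sparse.toarray())             # densify only for display
```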
```python
from sklearn.preprocessing import MultiLabelBinarizer
# example multi-label data
y = [[1, 2, 3], [1, 2], [2, 3], [1]]
# create MultiLabelBinarizer instance
mlb = MultiLabelBinarizer()
# fit and transform the data
binary_labels = mlb.fit_transform(y)
# inverse transform to get original labels
original_labels = mlb.inverse_transform(binary_labels)
print('Binary labels:\n', binary_labels)
print('Original labels:\n', original_labels)
```
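Running the example gives output like the following (the exact formatting can vary with the NumPy version; for example, NumPy 2.x may print the labels as np.int64(1)):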
```
Binary labels:
 [[1 1 1]
 [1 1 0]
 [0 1 1]
 [1 0 0]]
Original labels:
 [(1, 2, 3), (1, 2), (2, 3), (1,)]
```
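To make the encoding concrete, here is a minimal pure-Python sketch of what the transformer computes for this input, assuming no classes argument is passed: take the sorted set of all labels as the columns, then put a 1 in every column whose label is present in a sample. binarize and unbinarize are hypothetical helper names used only for illustration; they are not scikit-learn functions, and the real transformer returns a NumPy array (or a sparse matrix) rather than nested lists.

```python
def binarize(label_sets):
    # Columns: the sorted set of all labels seen in the data
    classes = sorted(set().union(*label_sets))
    index = {label: col for col, label in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # mark each label present in this sample
        matrix.append(row)
    return classes, matrix


def unbinarize(classes, matrix):
    # Inverse step: keep the label of every column set to 1
    return [tuple(c for c, bit in zip(classes, row) if bit) for row in matrix]


y = [[1, 2, 3], [1, 2], [2, 3], [1]]
classes, matrix = binarize(y)
print(classes)                      # [1, 2, 3]
print(matrix)                       # [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
print(unbinarize(classes, matrix))  # [(1, 2, 3), (1, 2), (2, 3), (1,)]
```

The nested-list result matches the Binary labels matrix above row for row, which is a quick way to check one's reading of the transformer's output.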
The steps are as follows:
- A small multi-label dataset is defined as a list of lists, where each inner list holds the labels of one sample (a sample may have one label or several).
- A MultiLabelBinarizer instance is created.
- The fit_transform() method learns the set of labels from the data and transforms each sample into one row of the binary matrix.
- The inverse_transform() method converts the binary matrix back into the original label sets, returned as tuples.
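In practice, fit() and transform() are often called separately: the transformer learns its columns on training data and then encodes new samples with those same columns. In the sketch below, y_train and y_new are made-up data. A label not seen during fitting (5 here) is dropped from the output, and scikit-learn emits a UserWarning about it, which the sketch records with the standard warnings module.

```python
import warnings

from sklearn.preprocessing import MultiLabelBinarizer

y_train = [[1, 2, 3], [1, 2], [2, 3], [1]]
mlb = MultiLabelBinarizer()
mlb.fit(y_train)  # learns classes_ = [1, 2, 3]

# New samples are encoded with the columns learned during fit.
# Label 5 was never seen, so it is ignored and a UserWarning is emitted.
y_new = [[2], [1, 5]]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encoded = mlb.transform(y_new)

print(encoded)  # [[0 1 0]
                #  [1 0 0]]
print([str(w.message) for w in caught])
```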
This example demonstrates how MultiLabelBinarizer encodes and decodes multi-label data, turning variable-length label lists into a fixed-width binary matrix that multi-label-capable scikit-learn estimators accept as targets. The transformer simplifies the preprocessing step for multi-label classification tasks.
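As one possible way to use the encoded targets, the sketch below trains a OneVsRestClassifier wrapping LogisticRegression, which fits one binary classifier per label column. This estimator choice and the toy X and y are assumptions made for illustration, not part of the original example. mlb.inverse_transform then maps the predicted indicator matrix back to label tuples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy feature matrix and label sets, made up for illustration
X = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.5, 0.8], [0.9, 0.1]])
y = [["b"], ["a", "b"], ["a"], [], ["b"], ["a"]]  # an empty list means "no labels"

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)  # shape (6, 2); columns follow mlb.classes_ = ['a', 'b']

# One binary LogisticRegression is fitted per label column of Y
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

pred = clf.predict(X)  # 0/1 indicator matrix with the same columns as Y
print(mlb.inverse_transform(pred))  # predicted label tuples, one per sample
```

Because the same fitted mlb is used for both encoding and decoding, predicted columns are guaranteed to line up with the original label names.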