
Scikit-Learn LabelEncoder for Data Preprocessing

Encoding categorical labels as numeric values is a common preprocessing step for machine learning models. `LabelEncoder` is a simple yet powerful tool for this purpose: it converts categorical labels into integers, making them suitable for algorithms that require numerical input.

from sklearn.preprocessing import LabelEncoder

# sample categorical data
labels = ['cat', 'dog', 'mouse', 'cat', 'dog', 'mouse']

# create label encoder
encoder = LabelEncoder()

# fit and transform the labels
encoded_labels = encoder.fit_transform(labels)
print('Encoded Labels:', encoded_labels)

# inverse transform the encoded labels
original_labels = encoder.inverse_transform(encoded_labels)
print('Original Labels:', original_labels)

Running the example gives the following output:

Encoded Labels: [0 1 2 0 1 2]
Original Labels: ['cat' 'dog' 'mouse' 'cat' 'dog' 'mouse']

The steps are as follows:

  1. Define a list of categorical labels that need to be encoded. In this example, the labels are ‘cat’, ‘dog’, and ‘mouse’.

  2. Create an instance of LabelEncoder.

  3. Use the fit_transform() method to convert the categorical labels to numeric values. The labels are encoded as integers.

  4. Print the encoded labels to verify the transformation. This shows the numeric representation of the original categories.

  5. Use the inverse_transform() method to convert the numeric values back to the original labels. This demonstrates the reversibility of the encoding process.
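The integer assigned to each category is not arbitrary: `LabelEncoder` sorts the unique labels and uses each label's position in that sorted order as its code. The mapping is exposed through the encoder's `classes_` attribute, as this short sketch shows:

```python
from sklearn.preprocessing import LabelEncoder

labels = ['cat', 'dog', 'mouse', 'cat', 'dog', 'mouse']

encoder = LabelEncoder()
encoder.fit(labels)

# classes_ holds the sorted unique labels; the index of each
# class in this array is its integer code
for code, cls in enumerate(encoder.classes_):
    print(code, cls)
# 0 cat
# 1 dog
# 2 mouse
```

Because the codes come from sorted order, they are deterministic for a given set of labels, which is why the encoded output above is always `[0 1 2 0 1 2]`.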

This example illustrates how to quickly set up and use LabelEncoder for encoding categorical data, which is a crucial step in preparing data for machine learning models in scikit-learn. The encoded labels can then be used as inputs for various algorithms, ensuring that categorical data is handled effectively.
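As a sketch of that downstream use, the snippet below encodes string targets and trains a classifier on them, then decodes the predictions back to the original labels. The features and the choice of `LogisticRegression` here are illustrative assumptions, not part of the example above:

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression

# hypothetical numeric features paired with categorical targets
X = [[1.0, 2.0], [2.2, 1.0], [0.5, 3.0],
     [1.1, 2.1], [2.4, 0.9], [0.4, 3.2]]
y = ['cat', 'dog', 'mouse', 'cat', 'dog', 'mouse']

# encode the string targets as integers
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# train on the encoded targets
model = LogisticRegression()
model.fit(X, y_encoded)

# predictions come back as integers; decode them for readability
preds = model.predict(X[:2])
print(encoder.inverse_transform(preds))
```

Note that `LabelEncoder` is intended for target labels; for encoding categorical input features, scikit-learn provides `OrdinalEncoder` and `OneHotEncoder` instead.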



See Also