
Scikit-Learn Normalizer for Data Preprocessing

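Normalizer is a preprocessing tool that rescales each individual sample (each row of the input) to unit norm, independently of the other samples. This is useful when only the direction of a sample matters and not its magnitude, for example with algorithms that rely on dot products or cosine similarity between samples.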
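To illustrate the dot-product point, here is a minimal sketch on two small, hypothetical samples. It uses cosine_similarity from sklearn.metrics.pairwise only as a reference value. After L2 normalization, the plain dot product between two rows equals their cosine similarity.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import Normalizer

# two hypothetical samples pointing in similar directions but with very different magnitudes
X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 31.0]])

# rescale each row to unit L2 length
X_unit = Normalizer(norm='l2').fit_transform(X)

# after L2 normalization, dot products between rows equal cosine similarities of the original rows
dot = X_unit @ X_unit.T
cos = cosine_similarity(X)

print(np.allclose(dot, cos))           # True
print(np.linalg.norm(X_unit, axis=1))  # each row now has length 1
```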

The key hyperparameter is norm, which specifies the norm to apply: 'l1', 'l2' (the default), or 'max'.
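A minimal sketch of how the three norm options differ, using a single hypothetical sample [3, -4]. Each option divides the row by a different length: the sum of absolute values (l1), the Euclidean length (l2), or the largest absolute value (max).

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# a single hypothetical sample with two features
x = np.array([[3.0, -4.0]])

# apply each norm option to the same row
results = {norm: Normalizer(norm=norm).fit_transform(x) for norm in ('l1', 'l2', 'max')}

for norm, row in results.items():
    print(norm, row.ravel())

# l1:  divides by |3| + |-4| = 7        -> [3/7, -4/7], roughly [0.4286, -0.5714]
# l2:  divides by sqrt(9 + 16) = 5      -> [0.6, -0.8]
# max: divides by max(|3|, |-4|) = 4    -> [0.75, -1.0]
```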

It is appropriate for preprocessing data for clustering or classification tasks in which samples should be compared by direction rather than by magnitude.
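As a sketch of how Normalizer might fit into a classification workflow, the following places it in a pipeline ahead of a classifier. LogisticRegression and the 5-fold cross-validation setup are assumptions chosen for illustration; they are not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# same synthetic data settings as the main example, this time keeping the labels
X, y = make_classification(n_samples=100, n_features=5, random_state=1)

# Normalizer as the first step, followed by an illustrative classifier
model = make_pipeline(Normalizer(norm='l2'), LogisticRegression())

# evaluate the whole pipeline with 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Keeping Normalizer inside the pipeline ensures the same row-wise scaling is applied every time the model is fitted or used for prediction.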

from sklearn.preprocessing import Normalizer
from sklearn.datasets import make_classification
import numpy as np

# generate synthetic dataset
X, _ = make_classification(n_samples=100, n_features=5, random_state=1)

# create the Normalizer
scaler = Normalizer(norm='l2')

# fit and transform the dataset
X_normalized = scaler.fit_transform(X)

# show a sample of the data before and after normalization
print("Before normalization:")
print(X[:5, :])
print("After normalization:")
print(X_normalized[:5, :])

Running the example gives an output like:

Before normalization:
[[-1.10325445 -0.49821356 -0.05962247 -0.89224592 -0.70158632]
 [-1.36910947 -0.19883786  0.49099577 -0.57562575 -0.17113665]
 [ 0.9825172   0.58591043 -0.17816707  0.57699061  0.33847597]
 [ 1.16188579  3.03085711 -0.12593507  0.7620801   0.50520809]
 [-0.6963714   1.54335911  1.09850848  0.50587849  0.96382716]]
After normalization:
[[-0.66441004 -0.30003785 -0.03590628 -0.53733493 -0.4225145 ]
 [-0.8631935  -0.12536291  0.30956206 -0.36291941 -0.10789791]
 [ 0.73480747  0.43819219 -0.13324805  0.43152121  0.25314027]
 [ 0.34430312  0.89813782 -0.0373185   0.22582818  0.14970897]
 [-0.30367603  0.67303334  0.47904135  0.22060523  0.42030906]]
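One way to check the output above is to recompute the L2 normalization by hand with NumPy. This sketch assumes the same make_classification settings as the example and divides each row by its own Euclidean length. The result should match the Normalizer output, and every row should have length 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import Normalizer

X, _ = make_classification(n_samples=100, n_features=5, random_state=1)
X_normalized = Normalizer(norm='l2').fit_transform(X)

# divide each row by its own Euclidean length (keepdims keeps the shape (100, 1) for broadcasting)
manual = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(X_normalized, manual))         # True
print(np.linalg.norm(X_normalized[:5], axis=1))  # lengths of the first five rows, all close to 1.0
```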

The steps are as follows:

  1. First, a synthetic dataset is generated using the make_classification() function. It creates 100 samples (n_samples) with 5 features (n_features), using a fixed random seed (random_state) for reproducibility.

  2. Next, a Normalizer is instantiated with the norm parameter set to 'l2', so that each sample will be rescaled to unit Euclidean length.

  3. The dataset is then passed to the fit_transform() method. Because Normalizer is stateless, fitting learns nothing from the data; the transformation simply divides each sample by its own L2 norm.

  4. Finally, samples of the data before and after normalization are printed to demonstrate the effect of the transformation.
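Because Normalizer is stateless, the data passed to fit() has no influence on how later data is transformed. The sketch below demonstrates this with two small, hypothetical arrays: a row transformed by a Normalizer fitted on unrelated data matches the same row transformed on its own.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# hypothetical training data and a new, unrelated sample
X_train = np.array([[1.0, 1.0], [100.0, 0.0]])
X_new = np.array([[0.0, 5.0]])

# fit() only validates the input; no statistics are learned from X_train
scaler = Normalizer(norm='l2').fit(X_train)

# the new row is divided by its own norm (5), regardless of what the scaler was fitted on
a = scaler.transform(X_new)

# same result as fitting a fresh Normalizer on X_new alone
b = Normalizer(norm='l2').fit_transform(X_new)

print(np.array_equal(a, b))  # True; both rows equal [0, 1]
```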

This example demonstrates how to use the Normalizer to preprocess data. It is particularly useful for clustering or classification tasks where differences in sample magnitude would otherwise dominate distance or similarity calculations.



See Also