
Scikit-Learn MinMaxScaler for Data Preprocessing

MinMaxScaler is a feature scaling technique that rescales each feature to a given range, typically between zero and one.

This scaler is useful for algorithms that require features to be within a specific range or when the features have different units or scales. The most important hyperparameter is feature_range, which sets the desired range of the transformed data.

This scaler is appropriate for preprocessing data for machine learning models that are sensitive to the scale of the data.
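
Under the hood, MinMaxScaler applies a simple formula to each feature: X_std = (X - X_min) / (X_max - X_min), then X_scaled = X_std * (max - min) + min, where (min, max) is the feature_range. The short NumPy sketch below reproduces that calculation by hand (the variable names are illustrative only); the full example that follows does the same thing with the scaler itself.

import numpy as np

# illustrative data: two features on very different scales
X = np.array([[10.0, 200.0], [20.0, 600.0], [30.0, 1000.0]])

# min-max scaling by hand, per column, for a feature_range of (0, 1)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
print((X - col_min) / (col_max - col_min))  # each column now spans 0 to 1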

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# generate sample data
data = np.array([[10, 200], [15, 400], [20, 600], [25, 800], [30, 1000]])

# create MinMaxScaler instance
scaler = MinMaxScaler(feature_range=(0, 1))

# fit and transform the data
scaled_data = scaler.fit_transform(data)

print("Original Data:\n", data)
print("Scaled Data:\n", scaled_data)

Running the example gives an output like:

Original Data:
 [[  10  200]
 [  15  400]
 [  20  600]
 [  25  800]
 [  30 1000]]
Scaled Data:
 [[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [0.75 0.75]
 [1.   1.  ]]

The steps are as follows:

  1. A sample dataset is created using a NumPy array with varying scales for different features.

  2. A MinMaxScaler instance is created with feature_range=(0, 1), which is also the default range.

  3. The fit_transform() method is applied to the dataset, scaling the features to the specified range; a sketch after this list shows fit() and transform() used separately.

  4. The original and scaled data are printed to compare the results.
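
As noted in step 3, fit_transform() both learns and applies the scaling in one call. In practice the scaler is usually fit on training data only, and the learned per-feature minimum and maximum are reused to transform new data. A minimal sketch of that workflow, with made-up train/test arrays:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# hypothetical train/test split with the same two features as above
X_train = np.array([[10, 200], [20, 600], [30, 1000]])
X_test = np.array([[12, 300], [28, 900]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)                # learn per-feature min and max from training data only
print(scaler.transform(X_train))   # training data maps into [0, 1]
print(scaler.transform(X_test))    # test data uses the same mapping
# test values outside the training min/max would fall outside [0, 1]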

This example shows how to use MinMaxScaler to normalize data, ensuring that all features are within a specified range. This preprocessing step can improve the performance of many machine learning algorithms.
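
The feature_range argument also accepts other ranges, such as (-1, 1), and inverse_transform() reverses the scaling. A brief sketch using the same sample data as above:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10, 200], [15, 400], [20, 600], [25, 800], [30, 1000]])

# scale to the range -1 to 1 instead of the default 0 to 1
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)
print(scaled)

# recover the original values
print(scaler.inverse_transform(scaled))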



See Also