Scikit-Learn minmax_scale() for Data Preprocessing

minmax_scale is a preprocessing function in scikit-learn that rescales each feature of a dataset to a given range, 0 to 1 by default. This kind of scaling helps algorithms that are sensitive to feature magnitude, such as k-nearest neighbors, support vector machines, and models trained with gradient descent.
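To make the transformation concrete, the sketch below computes min-max scaling by hand with NumPy and compares it with minmax_scale. The small array X is made up purely for illustration; for the default range, each column is transformed as (X - min) / (max - min).

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# small illustrative array: three samples (rows), two features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])

# manual min-max scaling to [0, 1], computed per column
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# the library function should agree with the manual calculation
X_sklearn = minmax_scale(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

Because each column is scaled using its own minimum and maximum, the two features end up on the same scale even though their original magnitudes differ by a factor of ten.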

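The key parameters of minmax_scale are feature_range, which sets the target range of the transformed data (default (0, 1)), and axis, which controls whether scaling is applied to each feature (axis=0, the default) or to each sample (axis=1).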
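As a rough illustration of these parameters, the following sketch scales a made-up array to the range -1 to 1 with feature_range, and then scales row-wise with axis=1. The array values are arbitrary and chosen only so the effect is easy to see.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# arbitrary illustrative data: three samples, three features
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 400.0],
              [3.0, 30.0, 900.0]])

# scale each column (feature) to [-1, 1] instead of the default [0, 1]
X_sym = minmax_scale(X, feature_range=(-1, 1))
print(X_sym.min(axis=0), X_sym.max(axis=0))

# axis=1 scales each row (sample) independently rather than each column
X_rows = minmax_scale(X, axis=1)
print(X_rows.min(axis=1), X_rows.max(axis=1))
```

Per-feature scaling (axis=0) is the usual choice for tabular data; per-sample scaling is shown here only to clarify what the axis parameter does.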

This function is appropriate for preprocessing data for both classification and regression problems.

from sklearn.datasets import make_classification
from sklearn.preprocessing import minmax_scale
import pandas as pd

# generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=1)

# convert to DataFrame to better illustrate the before and after
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
print("Before scaling:")
print(df.head())

# apply minmax scaling
X_scaled = minmax_scale(X, feature_range=(0, 1))

# convert scaled data to DataFrame
df_scaled = pd.DataFrame(X_scaled, columns=[f'feature_{i}' for i in range(X.shape[1])])
print("After scaling:")
print(df_scaled.head())

Running the example gives an output like:

Before scaling:
   feature_0  feature_1  feature_2  feature_3  feature_4
0  -1.103254  -0.498214  -0.059622  -0.892246  -0.701586
1  -1.369109  -0.198838   0.490996  -0.575626  -0.171137
2   0.982517   0.585910  -0.178167   0.576991   0.338476
3   1.161886   3.030857  -0.125935   0.762080   0.505208
4  -0.696371   1.543359   1.098508   0.505878   0.963827
After scaling:
   feature_0  feature_1  feature_2  feature_3  feature_4
0   0.146488   0.394041   0.338779   0.179071   0.219175
1   0.076730   0.445445   0.466819   0.261603   0.326029
2   0.693776   0.580190   0.311213   0.562054   0.428685
3   0.740840   1.000000   0.323359   0.610301   0.462272
4   0.253251   0.744589   0.608090   0.543517   0.554656

The steps are as follows:

1. A synthetic dataset is generated using the make_classification() function, creating a dataset with a specified number of samples and features. The dataset is then converted to a pandas.DataFrame for easy visualization.
2. The original dataset is displayed to show the values before scaling.
3. The minmax_scale function is applied to the dataset, scaling all features to the range 0 to 1.
4. The scaled dataset is converted back to a pandas.DataFrame and displayed to illustrate the effect of the scaling process.

This example demonstrates how to use the minmax_scale function to preprocess data, making it suitable for use in various machine learning algorithms that perform better with normalized data.
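One caveat when the scaled data feeds a model: calling minmax_scale on the full dataset uses the minimum and maximum of all rows, including any that are later held out for testing. A common alternative, sketched below, is the MinMaxScaler class from the same module, fitted on the training split only. The 80/20 split and the random_state values are assumptions chosen for illustration, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# same synthetic dataset as in the example above
X, y = make_classification(n_samples=100, n_features=5, random_state=1)

# hold out 20% of the rows for testing (split size is an illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# learn the per-feature min and max from the training data only
scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)

# reuse the training statistics on the test data
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
print(X_test_scaled.min(axis=0), X_test_scaled.max(axis=0))
```

The training columns span exactly 0 to 1, while test values may fall slightly outside that range because they are scaled with the training minimum and maximum.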
