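minmax_scale is a preprocessing function in scikit-learn (sklearn.preprocessing) that rescales each feature (column) of a dataset to a given range, by default 0 to 1. This kind of scaling matters for algorithms that are sensitive to the relative magnitudes of features, such as distance-based methods like k-nearest neighbors and models trained with gradient descent. These models can perform poorly when features are on very different scales.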
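The scikit-learn documentation for MinMaxScaler, the estimator that minmax_scale is the function equivalent of, describes the transformation as X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) followed by X_scaled = X_std * (max - min) + min, where min and max are the bounds of feature_range. The sketch below writes this out by hand with NumPy and compares it against minmax_scale. The helper manual_minmax and the toy array are hypothetical and exist only for illustration. The helper also assumes that no column is constant, because a zero range would divide by zero in this hand-written version.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

# Toy array: 4 samples, 2 features on very different scales (illustrative values)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [5.0, 1000.0]])

def manual_minmax(X, feature_range=(0.0, 1.0)):
    """Hypothetical helper: column-wise min-max scaling written out by hand.

    Assumes every column has max > min (no constant features).
    """
    new_min, new_max = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    X_std = (X - col_min) / (col_max - col_min)   # each column now spans [0, 1]
    return X_std * (new_max - new_min) + new_min  # shift/stretch to the target range

manual = manual_minmax(X)
library = minmax_scale(X)
print(manual)
print(np.allclose(manual, library))  # True
```

Both columns map to the same scaled values, 0, 0.25, 0.5 and 1, even though their raw magnitudes differ by a factor of 200. Only the relative position of each value within its column's range survives the transformation.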
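The key parameter of minmax_scale is feature_range, a (min, max) tuple that defines the desired range of the transformed data and defaults to (0, 1). The axis parameter (default 0) controls whether each feature (column) or each sample (row) is scaled independently.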
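For example, passing feature_range=(-1, 1) maps each column's minimum to -1 and its maximum to 1, and axis=1 scales each row instead of each column. The short sketch below shows both on a toy array. The array values are illustrative and do not come from the original example.

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

X = np.array([[10.0, -3.0],
              [20.0,  0.0],
              [30.0,  3.0]])

# Map each column to [-1, 1] instead of the default [0, 1]
X_pm1 = minmax_scale(X, feature_range=(-1, 1))
print(X_pm1)
# Column 0: 10 -> -1, 20 -> 0, 30 -> 1 (up to floating-point rounding)
# Column 1: -3 -> -1,  0 -> 0,  3 -> 1

# axis=1 scales each row (sample) independently instead of each column
X_rows = minmax_scale(X, axis=1)
print(X_rows)
# In every row the first value is the larger one, so each row becomes [1, 0]
```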
This function is appropriate for preprocessing data for both classification and regression problems.
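One caution applies to either kind of problem. minmax_scale computes the minimum and maximum from whatever array it is given, so calling it on the full dataset before a train/test split lets test-set statistics leak into training. The scikit-learn documentation for minmax_scale warns about this and recommends using the MinMaxScaler estimator inside a Pipeline, so the range is learned from the training data only. The following is a minimal sketch of that approach. The k-nearest-neighbors classifier and the split settings are assumptions chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# The scaler learns each feature's min and max from X_train only;
# the same learned transform is then applied to X_test inside the pipeline.
model = make_pipeline(MinMaxScaler(feature_range=(0, 1)), KNeighborsClassifier())
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# make_pipeline names each step after its lowercased class name
scaler = model.named_steps["minmaxscaler"]

# Test rows can fall slightly outside [0, 1] because their values
# were not used to compute the scaling range.
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled.min(), X_test_scaled.max())
```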
```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import minmax_scale
import pandas as pd

# generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=1)

# convert to DataFrame to better illustrate the before and after
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
print("Before scaling:")
print(df.head())

# apply minmax scaling
X_scaled = minmax_scale(X, feature_range=(0, 1))

# convert scaled data to DataFrame
df_scaled = pd.DataFrame(X_scaled, columns=[f'feature_{i}' for i in range(X.shape[1])])
print("After scaling:")
print(df_scaled.head())
```
Running the example gives an output like:
```
Before scaling:
   feature_0  feature_1  feature_2  feature_3  feature_4
0  -1.103254  -0.498214  -0.059622  -0.892246  -0.701586
1  -1.369109  -0.198838   0.490996  -0.575626  -0.171137
2   0.982517   0.585910  -0.178167   0.576991   0.338476
3   1.161886   3.030857  -0.125935   0.762080   0.505208
4  -0.696371   1.543359   1.098508   0.505878   0.963827
After scaling:
   feature_0  feature_1  feature_2  feature_3  feature_4
0   0.146488   0.394041   0.338779   0.179071   0.219175
1   0.076730   0.445445   0.466819   0.261603   0.326029
2   0.693776   0.580190   0.311213   0.562054   0.428685
3   0.740840   1.000000   0.323359   0.610301   0.462272
4   0.253251   0.744589   0.608090   0.543517   0.554656
```
The steps are as follows:
First, a synthetic dataset is generated using the make_classification() function, with a specified number of samples (100) and features (5). The dataset is then converted to a pandas.DataFrame for easy visualization, and the first rows are printed to show the values before scaling.

Next, the minmax_scale function is applied to the dataset, scaling every feature to the range 0 to 1. The scaled array is wrapped in a pandas.DataFrame and printed to illustrate the effect. In each column, the smallest value becomes 0 and the largest becomes 1. For example, row 3 holds the maximum of feature_1, so it maps to 1.000000.
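The printed head shows only five rows, so it cannot confirm that every feature now spans exactly 0 to 1. One way to check is to look at the column-wise minimum and maximum of the whole scaled DataFrame. The sketch below rebuilds the same data as the example above and summarizes it with DataFrame.agg.

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import minmax_scale
import pandas as pd

# Same data and scaling as the example above
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X_scaled = minmax_scale(X, feature_range=(0, 1))
df_scaled = pd.DataFrame(X_scaled, columns=[f'feature_{i}' for i in range(X.shape[1])])

# One row of per-column minimums and one row of per-column maximums;
# expect 0 and 1 respectively (up to floating-point rounding)
summary = df_scaled.agg(['min', 'max'])
print(summary)
```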
This example demonstrates how to use the minmax_scale function to preprocess data so that all features share a common scale. This makes the data suitable for machine learning algorithms that are sensitive to differences in feature magnitude.