MaxAbsScaler is a data preprocessing technique that scales each feature by its maximum absolute value. It is especially useful for sparse data, i.e., data with many zeros. This scaler preserves the sparsity of the data and maps each feature into the range [-1, 1], making it suitable for preparing data for machine learning algorithms.
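To see the sparsity-preserving behavior concretely, here is a minimal sketch using a small hand-made scipy.sparse matrix (not part of the main example below). Because MaxAbsScaler only divides by a per-column constant, zero entries stay zero and the number of stored nonzeros is unchanged:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# a small sparse matrix: most entries are zero
X = csr_matrix([[ 4.0, 0.0, -2.0],
                [ 0.0, 5.0,  0.0],
                [-8.0, 0.0,  1.0]])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

# each column is divided by its max absolute value (8, 5, and 2 here),
# so the result lies in [-1, 1] and zeros remain zeros
print(X_scaled.toarray())
print("stored nonzeros before:", X.nnz, "after:", X_scaled.nnz)
```

Note that the scaler accepts the sparse matrix directly without densifying it, which is the main advantage over scalers that center the data (centering would destroy sparsity).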
from sklearn.datasets import make_classification
from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import train_test_split
import numpy as np
# generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create and fit MaxAbsScaler
scaler = MaxAbsScaler()
scaler.fit(X_train)
# transform the train and test sets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
# show a sample of the original and scaled data
print("Original data sample:", X_train[0])
print("Scaled data sample:", X_train_scaled[0])
Running the example gives an output like:
Original data sample: [ 0.9825172 0.58591043 -0.17816707 0.57699061 0.33847597]
Scaled data sample: [ 0.57929303 0.1933151 -0.06399996 0.25563568 0.1066192 ]
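Under the hood, the transform simply divides each feature column by its maximum absolute value computed on the training set (the fitted scaler exposes these values as the max_abs_ attribute). Assuming the same dataset and split as above, this can be checked directly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MaxAbsScaler

# same synthetic dataset and split as in the example above
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

scaler = MaxAbsScaler().fit(X_train)

# manual equivalent: divide each column by its training-set max absolute value
manual = X_train / np.abs(X_train).max(axis=0)
print(np.allclose(scaler.transform(X_train), manual))
```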
The steps are as follows:

1. Generate a synthetic dataset using make_classification() with specified features and a fixed random seed for reproducibility.
2. Split the dataset into training and test sets using train_test_split().
3. Instantiate MaxAbsScaler and fit it on the training data using the fit() method.
4. Transform both the training and test sets using the transform() method of the scaler.
5. Display a sample of the original and scaled data to illustrate the effect of the scaling.
This example demonstrates how to apply MaxAbsScaler to a dataset, preserving sparsity and scaling features within the range of [-1, 1].
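Because the scaling is a simple per-column division, it is also reversible: the fitted scaler's inverse_transform() maps scaled values back to the original space. A short round-trip check on a small hand-made array:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X = np.array([[ 1.0, -2.0],
              [ 3.0,  4.0],
              [-6.0,  0.5]])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

# inverse_transform multiplies by the stored per-column maxima, undoing the scaling
X_back = scaler.inverse_transform(X_scaled)
print(np.allclose(X_back, X))
```

This is useful when model outputs or inspected values need to be reported in the original feature units.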