
Scikit-Learn FunctionTransformer for Data Preprocessing

FunctionTransformer allows the integration of custom functions into scikit-learn workflows.

This example applies a simple logarithmic transformation to a synthetic classification dataset. The same approach works as a preprocessing step for regression tasks.

FunctionTransformer wraps an ordinary Python callable so that it exposes the standard fit() and transform() interface. Here the callable is a log transformation.

Because the transformer is stateless, it applies the same function to any data passed to it, which keeps custom preprocessing steps consistent and reusable.

from sklearn.datasets import make_classification
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# Generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
# Take absolute values so every feature is non-negative and log1p stays well defined
X = np.abs(X)

# Define a custom log transformation function
def log_transform(X):
    return np.log1p(X)

# Create the FunctionTransformer with the custom function
transformer = FunctionTransformer(log_transform)

# Apply the transformer to the dataset
X_transformed = transformer.fit_transform(X)

# Show original and transformed data
print("Original Data:\n", X[:5])
print("Transformed Data:\n", X_transformed[:5])

Running the example gives an output like:

Original Data:
 [[1.10325445 0.49821356 0.05962247 0.89224592 0.70158632]
 [1.36910947 0.19883786 0.49099577 0.57562575 0.17113665]
 [0.9825172  0.58591043 0.17816707 0.57699061 0.33847597]
 [1.16188579 3.03085711 0.12593507 0.7620801  0.50520809]
 [0.6963714  1.54335911 1.09850848 0.50587849 0.96382716]]
Transformed Data:
 [[0.74348588 0.40427344 0.05791268 0.63776444 0.53156095]
 [0.86251413 0.18135264 0.3994442  0.45465249 0.15797478]
 [0.68436735 0.46115865 0.1639599  0.45551835 0.29153163]
 [0.77098089 1.39397904 0.11861386 0.56649498 0.40893115]
 [0.5284915  0.93348569 0.74122684 0.40937645 0.6748952 ]]

The steps are as follows:

  1. Generate a synthetic dataset: Use make_classification() to create a dataset with 100 samples and 5 features, then take absolute values with np.abs() so that every feature is non-negative.
  2. Define a custom function: Implement a log transformation function log_transform() that uses np.log1p(), which computes log(1 + x) and therefore stays finite at x = 0, where log(0) would not.
  3. Create FunctionTransformer: Instantiate FunctionTransformer with the custom log transformation function.
  4. Transform the dataset: Apply fit_transform() to the synthetic dataset to perform the log transformation.
  5. Display the data: Print the first five rows of both the original and transformed datasets to illustrate the effect of the transformation.
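
The choice of np.log1p() in step 2 can be checked on a tiny array that contains an exact zero. The sketch below is not part of the original example: the toy array is made up for illustration, and pairing the transform with np.expm1 through the inverse_func parameter is an assumption about how one might want to undo the transformation later.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data: non-negative values, including an exact zero
X = np.array([[0.0, 1.0], [2.0, 9.0]])

# Pair log1p with expm1, its mathematical inverse, so the transform can be undone
transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X_log = transformer.fit_transform(X)
X_back = transformer.inverse_transform(X_log)

# log1p(0) is 0.0, whereas np.log(0) would produce -inf
print(X_log[0, 0])

# The round trip recovers the original values up to floating-point error
print(np.allclose(X, X_back))
```

Supplying inverse_func is optional. It only matters when the transformed values need to be mapped back to the original scale.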

This example shows how FunctionTransformer wraps a custom preprocessing function as a scikit-learn transformer, so that it can be used as a step in a Pipeline and reused across machine learning workflows.
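
The example above calls the transformer on its own. As a minimal sketch of how the same transformer might be chained with an estimator, the code below places it in a Pipeline. The train/test split, the step names "log" and "clf", and the choice of LogisticRegression are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Same synthetic, non-negative dataset as in the main example
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X = np.abs(X)

# Hold out part of the data to evaluate the fitted pipeline
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# The log transform runs before the classifier on every fit and predict call
pipeline = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

Keeping the transformation inside the pipeline means it is applied identically at training and prediction time, without having to transform the data by hand.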


