FunctionTransformer allows the integration of custom functions into scikit-learn workflows.
This example uses a simple logarithmic transformation to demonstrate how FunctionTransformer can be used for preprocessing steps in both classification and regression tasks.
The FunctionTransformer is configured with the custom function that applies the desired transformation, in this case a log transformation. This transformer is useful for applying custom preprocessing steps in a consistent and reusable manner.
from sklearn.datasets import make_classification
from sklearn.preprocessing import FunctionTransformer
import numpy as np
# Generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
# Take absolute values so every input is non-negative and log1p is defined
X = np.abs(X)
# Define a custom log transformation function
def log_transform(X):
    return np.log1p(X)
# Create the FunctionTransformer with the custom function
transformer = FunctionTransformer(log_transform)
# Apply the transformer to the dataset
X_transformed = transformer.fit_transform(X)
# Show original and transformed data
print("Original Data:\n", X[:5])
print("Transformed Data:\n", X_transformed[:5])
Running the example gives an output like:
Original Data:
[[1.10325445 0.49821356 0.05962247 0.89224592 0.70158632]
[1.36910947 0.19883786 0.49099577 0.57562575 0.17113665]
[0.9825172 0.58591043 0.17816707 0.57699061 0.33847597]
[1.16188579 3.03085711 0.12593507 0.7620801 0.50520809]
[0.6963714 1.54335911 1.09850848 0.50587849 0.96382716]]
Transformed Data:
[[0.74348588 0.40427344 0.05791268 0.63776444 0.53156095]
[0.86251413 0.18135264 0.3994442 0.45465249 0.15797478]
[0.68436735 0.46115865 0.1639599 0.45551835 0.29153163]
[0.77098089 1.39397904 0.11861386 0.56649498 0.40893115]
[0.5284915 0.93348569 0.74122684 0.40937645 0.6748952 ]]
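The transformed values above are log(1 + x) of the originals, so they can be mapped back with np.expm1(). As a minimal sketch, assuming you want the transformation to be reversible, FunctionTransformer accepts an optional inverse_func argument. The variable names X_restored and round_trip_ok are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import FunctionTransformer

# Same synthetic, non-negative data as in the example above
X, _ = make_classification(n_samples=100, n_features=5, random_state=1)
X = np.abs(X)

# np.expm1 is the mathematical inverse of np.log1p;
# check_inverse=True asks scikit-learn to verify this on a subset of the data during fit
transformer = FunctionTransformer(
    func=np.log1p, inverse_func=np.expm1, check_inverse=True
)

X_transformed = transformer.fit_transform(X)
X_restored = transformer.inverse_transform(X_transformed)

# The round trip should reproduce the original values up to floating-point error
round_trip_ok = np.allclose(X, X_restored)
print("Round trip matches original:", round_trip_ok)
```

Supplying an inverse is optional; it mainly helps when transformed values need to be reported back on the original scale.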
- Generate a synthetic dataset: Use make_classification() to create a dataset with 100 samples and 5 features, then take absolute values with np.abs() so that all inputs are non-negative.
- Define a custom function: Implement a log transformation function log_transform() that uses np.log1p() to avoid issues with log(0).
- Create FunctionTransformer: Instantiate FunctionTransformer with the custom log transformation function.
- Transform the dataset: Apply fit_transform() to the synthetic dataset to perform the log transformation.
- Display the data: Print the first five rows of both the original and transformed datasets to illustrate the effect of the transformation.
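The steps above hard-code the transformation inside log_transform(). As a hedged variation, FunctionTransformer's kw_args parameter can forward extra keyword arguments to the custom function, which keeps the function itself configurable. The function log_with_offset and its offset parameter are hypothetical names chosen for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import FunctionTransformer

# Same synthetic, non-negative data as in the example above
X, _ = make_classification(n_samples=100, n_features=5, random_state=1)
X = np.abs(X)


def log_with_offset(X, offset=1.0):
    """Hypothetical helper: natural log of (X + offset)."""
    return np.log(X + offset)


# kw_args is passed to the function on every call to transform()
transformer = FunctionTransformer(log_with_offset, kw_args={"offset": 1.0})
X_offset = transformer.fit_transform(X)

# With offset=1.0 the result agrees with np.log1p up to floating-point error
matches_log1p = np.allclose(X_offset, np.log1p(X))
print("Matches np.log1p:", matches_log1p)
```

For very small inputs np.log1p() is numerically more accurate than np.log(X + 1), so the original log_transform() remains the better choice when the offset is fixed at 1.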
This example shows how to use FunctionTransformer to apply custom preprocessing steps in a scikit-learn pipeline, making it easier to integrate and reuse custom transformations in your machine learning workflows.
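To make the pipeline use concrete, here is a minimal sketch that places the same log transformation in front of a classifier. The choice of LogisticRegression, the train/test split, and the step names "log" and "clf" are assumptions made for illustration, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Same synthetic, non-negative data as in the example above
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X = np.abs(X)

# Default test_size of 0.25 leaves 75 training and 25 test samples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# The transformer runs before the classifier during both fit and predict
pipeline = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

Because the transformation lives inside the pipeline, it is applied identically to training and test data, and the whole object can be passed to tools such as cross_val_score.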