Converting continuous data into discrete bins is often necessary for categorical analysis.
The KBinsDiscretizer in scikit-learn transforms continuous features into discrete bins using one of three strategies: ‘uniform’ (equal-width bins), ‘quantile’ (bins holding roughly equal numbers of samples), and ‘kmeans’ (bins based on one-dimensional k-means clustering).
Key hyperparameters include n_bins (number of bins), encode (encoding method), and strategy (binning strategy).
It is typically used as a preprocessing step in classification and regression tasks when a model or analysis needs discrete rather than continuous inputs.
from sklearn.datasets import make_regression
from sklearn.preprocessing import KBinsDiscretizer
# generate continuous dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)
# configure the transform
kbin = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
# fit the transform
kbin.fit(X)
# apply the transform
X_binned = kbin.transform(X)
# show before and after
print("Before binning:\n", X[:10])
print("After binning:\n", X_binned[:10])
Running the example gives an output like:
Before binning:
[[-0.61175641]
[-0.24937038]
[ 0.48851815]
[ 0.76201118]
[ 1.51981682]
[ 0.37756379]
[ 0.51292982]
[-0.67124613]
[-1.39649634]
[ 0.31563495]]
After binning:
[[1.]
[2.]
[3.]
[3.]
[4.]
[2.]
[3.]
[1.]
[1.]
[2.]]
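To see why each value lands in its bin, it can help to inspect the edges the transform learned. The sketch below repeats the setup from the example and reads the fitted bin_edges_ attribute; the variable names edges, widths, and counts are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import KBinsDiscretizer

# same dataset and configuration as the main example
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)
kbin = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
X_binned = kbin.fit_transform(X)

# bin_edges_ holds one array of edges per feature; 5 bins need 6 edges
edges = kbin.bin_edges_[0]
print("Bin edges:", edges)

# with strategy='uniform', consecutive edges are equally spaced
widths = np.diff(edges)
print("Bin widths:", widths)

# count how many samples fall into each of the 5 bins
counts = np.bincount(X_binned[:, 0].astype(int), minlength=5)
print("Samples per bin:", counts)
```

With the 'uniform' strategy the widths are all equal, while the per-bin counts generally differ because the data is not uniformly distributed.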
The steps are as follows:
1. Generate a continuous dataset using make_regression(). This creates a dataset with a specified number of samples (n_samples), features (n_features), noise level (noise), and a fixed random seed (random_state) for reproducibility.
2. Instantiate KBinsDiscretizer with n_bins for the number of bins, encode for the encoding method, and strategy for the binning strategy.
3. Fit the KBinsDiscretizer on the dataset using the fit() method.
4. Transform the dataset with the transform() method to get binned data.
5. Display the first 10 samples of the dataset before and after binning to show the effect of the transformation.
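The choice of strategy changes how samples are distributed across bins. As a rough sketch, assuming the same dataset as the example, the loop below fits one discretizer per strategy and counts the samples in each bin; the dictionary name counts_by_strategy is a hypothetical helper introduced only for illustration. Depending on the installed scikit-learn version, the 'quantile' strategy may emit a warning about changing defaults.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import KBinsDiscretizer

# same dataset as the main example
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)

# counts_by_strategy is an illustrative name, not part of the original example
counts_by_strategy = {}
for strategy in ('uniform', 'quantile', 'kmeans'):
    kbin = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy=strategy)
    X_binned = kbin.fit_transform(X)
    # tally the number of samples assigned to each bin index 0..4
    counts_by_strategy[strategy] = np.bincount(X_binned[:, 0].astype(int), minlength=5)
    print(strategy, counts_by_strategy[strategy])
```

The 'quantile' counts should come out close to equal, whereas 'uniform' and 'kmeans' tend to put more samples in the bins near the center of the data.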
This example demonstrates how to use KBinsDiscretizer to convert continuous features into discrete bins, which can be useful for various preprocessing tasks in machine learning workflows.
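The encode hyperparameter controls the shape and type of the output. The following is a minimal sketch, assuming the same dataset and the 'uniform' strategy, that compares the three encodings; the variable names ordinal, dense, and onehot are illustrative.

```python
from scipy import sparse
from sklearn.datasets import make_regression
from sklearn.preprocessing import KBinsDiscretizer

# same dataset as the main example
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)

# 'ordinal' returns a single column holding the bin index of each sample
ordinal = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform').fit_transform(X)
# 'onehot-dense' returns a dense array with one indicator column per bin
dense = KBinsDiscretizer(n_bins=5, encode='onehot-dense', strategy='uniform').fit_transform(X)
# 'onehot' returns the same indicators as a SciPy sparse matrix
onehot = KBinsDiscretizer(n_bins=5, encode='onehot', strategy='uniform').fit_transform(X)

print(ordinal.shape)            # (100, 1)
print(dense.shape)              # (100, 5)
print(sparse.issparse(onehot))  # True
```

An ordinal encoding keeps the order of the bins in a single column, while the one-hot encodings treat each bin as an independent category, which may suit linear models better.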