Scikit-Learn add_dummy_feature() for Data Preprocessing

The add_dummy_feature() function adds a dummy feature to a dataset, which is useful for including a bias term in linear models.

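This function prepends a column of ones (or another constant, set through the value parameter) to the feature matrix and returns the result as a new array, so a linear model can learn the intercept as an ordinary coefficient.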

This example demonstrates adding a dummy feature to a dataset, which can be applied to both classification and regression problems.

from sklearn.datasets import make_classification
from sklearn.preprocessing import add_dummy_feature

# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# display the original dataset
print("Original dataset:")
print(X[:5])

# add a dummy feature (column of ones)
X_new = add_dummy_feature(X)

# display the dataset with the dummy feature added
print("Dataset with dummy feature added:")
print(X_new[:5])

Running the example gives an output like:

Original dataset:
[[-1.10325445 -0.49821356 -0.05962247 -0.89224592 -0.70158632]
 [-1.36910947 -0.19883786  0.49099577 -0.57562575 -0.17113665]
 [ 0.9825172   0.58591043 -0.17816707  0.57699061  0.33847597]
 [ 1.16188579  3.03085711 -0.12593507  0.7620801   0.50520809]
 [-0.6963714   1.54335911  1.09850848  0.50587849  0.96382716]]
Dataset with dummy feature added:
[[ 1.         -1.10325445 -0.49821356 -0.05962247 -0.89224592 -0.70158632]
 [ 1.         -1.36910947 -0.19883786  0.49099577 -0.57562575 -0.17113665]
 [ 1.          0.9825172   0.58591043 -0.17816707  0.57699061  0.33847597]
 [ 1.          1.16188579  3.03085711 -0.12593507  0.7620801   0.50520809]
 [ 1.         -0.6963714   1.54335911  1.09850848  0.50587849  0.96382716]]

The steps are as follows:

First, a synthetic binary classification dataset is generated using the make_classification() function. The call sets the number of samples (n_samples), features (n_features), and classes (n_classes), and fixes the random seed (random_state) for reproducibility. The first five rows are printed to show the initial structure.
Next, the add_dummy_feature() function prepends a column of ones to the feature matrix. It returns a new array with one extra column and leaves the original X unchanged. The constant column acts as a bias term, which is useful for linear models or solvers that do not fit an intercept on their own.
Finally, the first five rows of the new matrix are printed, showing the column of ones in the first position followed by the original five features.
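
As a quick check of the second step, the sketch below compares add_dummy_feature() with a manual NumPy construction and also tries a constant other than one. The small matrix and the names X_small, X_manual, and X_fives are illustrative assumptions, not part of the example above.

```python
import numpy as np
from sklearn.preprocessing import add_dummy_feature

# small hand-made feature matrix (illustrative values only)
X_small = np.array([[0.0, 1.0], [1.0, 0.0]])

# default behaviour: prepend a column of ones
X_ones = add_dummy_feature(X_small)

# manual equivalent built with NumPy for comparison
X_manual = np.hstack([np.ones((X_small.shape[0], 1)), X_small])

# the value argument sets a different constant for the new column
X_fives = add_dummy_feature(X_small, value=5.0)

print(X_ones.shape)                      # one more column than X_small
print(np.array_equal(X_ones, X_manual))  # compare with the manual version
print(X_fives[:, 0])                     # first column holds the chosen constant
```

The input array keeps its original shape, since the function returns a new matrix rather than changing X_small in place.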

This example demonstrates how to use the add_dummy_feature() function to add a bias column to a feature matrix when preparing data for machine learning models in scikit-learn.
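
To illustrate why the column of ones works as a bias term, here is a minimal sketch on a regression problem. It assumes a noise-free dataset from make_regression() with a known bias of 4.0, solves ordinary least squares directly with np.linalg.lstsq, and compares the result with LinearRegression. The dataset settings and the names X_reg, y_reg, X_b, and theta are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import add_dummy_feature

# noise-free regression data with a known intercept (bias) of 4.0
X_reg, y_reg = make_regression(
    n_samples=100, n_features=3, bias=4.0, noise=0.0, random_state=1
)

# prepend the column of ones so the intercept becomes the first coefficient
X_b = add_dummy_feature(X_reg)

# solve the least-squares problem directly on the augmented matrix
theta, *_ = np.linalg.lstsq(X_b, y_reg, rcond=None)

# reference fit: LinearRegression estimates the intercept itself by default
model = LinearRegression().fit(X_reg, y_reg)

print(np.isclose(theta[0], model.intercept_))  # intercepts agree
print(np.allclose(theta[1:], model.coef_))     # remaining coefficients agree
```

Estimators such as LinearRegression fit an intercept by default, so the dummy feature is mainly needed when working with a solver that has no intercept handling of its own, as with the direct least-squares call here.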
