SequentialFeatureSelector is a greedy feature selection method that adds or removes one feature at a time, choosing at each step the feature whose inclusion or removal gives the best cross-validated score for the model.
This method supports both forward and backward selection strategies.
Key parameters include n_features_to_select to specify the number of features to select and direction to determine whether selection should be forward or backward.
It is suitable for both classification and regression tasks.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
# generate dataset with redundant features
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = LogisticRegression()
# perform sequential feature selection
sfs = SequentialFeatureSelector(model, n_features_to_select=5, direction='forward')
sfs.fit(X_train, y_train)
# transform dataset
X_train_sfs = sfs.transform(X_train)
X_test_sfs = sfs.transform(X_test)
print(f"Original shape: {X_train.shape}, Shape after feature selection: {X_train_sfs.shape}")
Running the example gives an output like:
Original shape: (80, 10), Shape after feature selection: (80, 5)
The steps are as follows:
Import the necessary functions and classes: make_classification, train_test_split, SequentialFeatureSelector, and LogisticRegression.
Generate a synthetic dataset using make_classification() with 10 features, 5 of which are informative and 5 redundant.
Split the dataset into training and test sets using train_test_split().
Create a LogisticRegression model to be used as the estimator in feature selection.
Initialize SequentialFeatureSelector with LogisticRegression as the base estimator, set to select 5 features using forward selection.
Fit the SequentialFeatureSelector on the training data.
Transform the training and test datasets using the fitted feature selector.
Print the shapes of the dataset before and after feature selection to show the reduction in feature dimensions.
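As a variation on the steps above, the following minimal sketch switches direction to 'backward' and uses get_support() to show which columns were retained. It fits on the full synthetic dataset rather than a train split, and the variable names (sfs_backward, mask, selected) are illustrative choices, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# same synthetic data as the example above
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)

# backward selection: start from all 10 features and remove one at a time
sfs_backward = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=5, direction='backward', cv=5
)
sfs_backward.fit(X, y)

# boolean mask with True for each retained column
mask = sfs_backward.get_support()
# integer indices of the retained columns
selected = sfs_backward.get_support(indices=True)

print(f"Mask: {mask}")
print(f"Selected feature indices: {selected}")
```

Forward and backward selection are both greedy, so they are not guaranteed to arrive at the same subset of features.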
This example demonstrates how to use SequentialFeatureSelector to select a subset of features from the original dataset. Reducing the number of features can make the model easier to interpret and, when the removed features are redundant or noisy, may also improve its performance.
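Whether selection actually helps depends on the data, so it is worth checking. The sketch below is one possible way to do that, assuming 5-fold cross-validation and default accuracy scoring: it compares a plain LogisticRegression against a pipeline that runs the selector before the classifier. Placing the selector inside the pipeline means the features are re-selected within each training fold, which avoids leaking information from the held-out fold.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# same synthetic data as the example above
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)

# baseline: logistic regression on all 10 features
baseline = LogisticRegression()

# pipeline: forward selection of 5 features, then logistic regression
with_selection = make_pipeline(
    SequentialFeatureSelector(LogisticRegression(), n_features_to_select=5, direction='forward'),
    LogisticRegression(),
)

# mean accuracy over 5 folds for each approach
baseline_scores = cross_val_score(baseline, X, y, cv=5)
selection_scores = cross_val_score(with_selection, X, y, cv=5)

print(f"All features: {baseline_scores.mean():.3f}")
print(f"Selected features: {selection_scores.mean():.3f}")
```

On a small dataset like this one the two scores may be close, and differences between them should be read with the fold-to-fold variation in mind.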