
Scikit-Learn SequentialFeatureSelector for Feature Selection

SequentialFeatureSelector is a greedy feature selection method that adds or removes one feature at a time, choosing at each step the feature whose addition or removal gives the best cross-validated score for the supplied estimator.
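To make the incremental idea concrete, the following is a minimal hand-written sketch of forward selection. It is not scikit-learn's implementation: `greedy_forward_selection` is a hypothetical helper written for illustration, and the choice of 5-fold cross-validation and 3 features is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def greedy_forward_selection(estimator, X, y, n_features, cv=5):
    """Hypothetical helper: add one feature at a time, keeping the
    candidate that gives the best mean cross-validated score."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = {}
        for candidate in remaining:
            # score the already-selected columns plus this one candidate
            columns = selected + [candidate]
            scores[candidate] = cross_val_score(estimator, X[:, columns], y, cv=cv).mean()
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


# small synthetic dataset, illustrative sizes only
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)

chosen = greedy_forward_selection(LogisticRegression(), X, y, n_features=3)
print(f"Selected feature indices: {chosen}")
```

Each round refits the estimator once per remaining feature and per fold, so the cost grows quickly with the number of features.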

This method supports both forward and backward selection strategies.

Key parameters include n_features_to_select, which sets how many features to keep, and direction, which chooses between forward ('forward') and backward ('backward') selection.

It is suitable for both classification and regression tasks.
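As an illustration of the regression case and of the backward strategy, here is a minimal sketch. The dataset sizes, the noise level, and the choice of LinearRegression are assumptions made for the example, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# synthetic regression data: 8 features, 3 of them informative (illustrative values)
X_reg, y_reg = make_regression(n_samples=100, n_features=8, n_informative=3, noise=0.1, random_state=1)

# backward selection starts from all features and removes one at a time
sfs_reg = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3, direction='backward')
sfs_reg.fit(X_reg, y_reg)

X_reg_sfs = sfs_reg.transform(X_reg)
print(f"Original shape: {X_reg.shape}, Shape after feature selection: {X_reg_sfs.shape}")
```

This prints the shapes (100, 8) and (100, 3). Only the estimator and the direction differ from the classification case; the fit and transform calls are the same.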

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# generate dataset with redundant features
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
model = LogisticRegression()

# perform sequential feature selection
sfs = SequentialFeatureSelector(model, n_features_to_select=5, direction='forward')
sfs.fit(X_train, y_train)

# transform dataset
X_train_sfs = sfs.transform(X_train)
X_test_sfs = sfs.transform(X_test)

print(f"Original shape: {X_train.shape}, Shape after feature selection: {X_train_sfs.shape}")

Running the example gives an output like:

Original shape: (80, 10), Shape after feature selection: (80, 5)
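The shapes show how many features were kept, but not which ones. A short sketch, repeating the setup from the example so that it runs on its own, uses get_support() to inspect the selection:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# same data and split as in the example above
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

sfs = SequentialFeatureSelector(LogisticRegression(), n_features_to_select=5, direction='forward')
sfs.fit(X_train, y_train)

# boolean mask with one entry per original feature (True = kept)
mask = sfs.get_support()
# integer column indices of the kept features
indices = sfs.get_support(indices=True)

print(f"Mask: {mask}")
print(f"Selected column indices: {indices}")
```

The indices refer to columns of the original array, so they can be used to look up feature names when the data comes from a table with named columns.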

The steps are as follows:

  1. Import necessary libraries and classes: make_classification, train_test_split, SequentialFeatureSelector, and LogisticRegression.

  2. Generate a synthetic dataset using make_classification() with 10 features, 5 of which are informative and 5 redundant.

  3. Split the dataset into training and test sets using train_test_split().

  4. Create a LogisticRegression model to be used as the estimator in feature selection.

  5. Initialize SequentialFeatureSelector with LogisticRegression as the base estimator, set to select 5 features using forward selection.

  6. Fit the SequentialFeatureSelector on the training data.

  7. Transform the training and test datasets using the fitted feature selector.

  8. Print the shapes of the training data before and after feature selection to show the reduction from 10 features to 5.

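This example demonstrates how to use SequentialFeatureSelector to select a subset of features from the original dataset. Reducing the number of features can make a model simpler and easier to interpret, and it may improve generalization when the dropped features are redundant or noisy, although an improvement is not guaranteed and should be checked on held-out data.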
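One way to check the effect on held-out data is to compare test accuracy with and without selection. The sketch below is an assumption about how one might do this, not part of the original example: it wraps the selector and the classifier in a pipeline built with make_pipeline so that selection is fitted on the training data only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# same data and split as in the example above
X, y = make_classification(n_samples=100, n_features=10, n_informative=5, n_redundant=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# baseline: logistic regression on all 10 features
baseline = LogisticRegression().fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# selector followed by a final classifier trained on the 5 selected features
pipeline = make_pipeline(
    SequentialFeatureSelector(LogisticRegression(), n_features_to_select=5, direction='forward'),
    LogisticRegression(),
)
pipeline.fit(X_train, y_train)
selected_acc = pipeline.score(X_test, y_test)

print(f"Test accuracy with all features: {baseline_acc:.3f}")
print(f"Test accuracy with 5 selected features: {selected_acc:.3f}")
```

With only 20 test samples the two numbers are noisy, so a small difference in either direction should not be over-interpreted.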


