Scikit-Learn Get GridSearchCV "feature_names_in_" Attribute

The feature_names_in_ attribute of a fitted GridSearchCV object records the feature names seen during fitting, which makes it a convenient way to verify that your machine learning pipeline is using the columns you expect. This example shows how to fit a grid search on a pandas DataFrame with named columns and retrieve the attribute afterwards.

The GridSearchCV class in scikit-learn is a powerful tool for hyperparameter tuning and model selection. It allows you to define a grid of hyperparameter values and fits a specified model for each combination of those values using cross-validation.
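
To make the size of such a search concrete, the sketch below counts the candidates in a small grid using ParameterGrid. The grid values and the 5-fold setting are illustrative assumptions chosen for this sketch.

```python
from sklearn.model_selection import ParameterGrid

# Illustrative grid (assumed values): 3 x 3 x 3 candidate settings
param_grid = {
    'n_estimators': [5, 10, 50],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}

# ParameterGrid enumerates every combination of the listed values
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # assumed number of cross-validation folds

# Each candidate is fit once per fold during the search
n_cv_fits = n_candidates * n_folds

print(n_candidates)  # 27
print(n_cv_fits)     # 135
```

With the default refit=True, one additional fit of the best candidate on the full dataset follows the cross-validation fits.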

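After fitting the GridSearchCV object, the feature_names_in_ attribute holds a NumPy array with the names of the features seen during fit. It is only defined when the search is refit on the best parameters (refit=True, the default), so that best_estimator_ exists, and when that estimator exposes feature_names_in_ itself, which in practice means the input must carry string feature names, as a pandas DataFrame does. This is particularly useful when working with pipelines or datasets that have named features, as it helps in verifying that the expected features were used in the grid search.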

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, random_state=42)
feature_names = ['feature1', 'feature2', 'feature3', 'feature4', 'feature5']
X = pd.DataFrame(X, columns=feature_names)

# Create a RandomForestClassifier estimator
rf = RandomForestClassifier(random_state=42)

# Define the parameter grid
param_grid = {
    'n_estimators': [5, 10, 50],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}

# Create a GridSearchCV object
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)

# Fit the GridSearchCV object
grid_search.fit(X, y)

# Access the feature_names_in_ attribute (available in scikit-learn 1.0 and later)
feature_names_in = grid_search.feature_names_in_

# View the feature names seen during fit
print(feature_names_in)

Running the example gives an output like:

['feature1' 'feature2' 'feature3' 'feature4' 'feature5']

The key steps in this example are:

Preparing a synthetic classification dataset using make_classification and converting it to a pandas DataFrame with named features.
Defining the RandomForestClassifier estimator and the parameter grid with hyperparameters to tune.
Creating a GridSearchCV object with the estimator, parameter grid, and cross-validation strategy.
Fitting the GridSearchCV object on the synthetic dataset.
Accessing the feature_names_in_ attribute from the fitted GridSearchCV object.
Using the feature_names_in_ attribute to verify the feature names seen during fitting.
