The feature_names_in_ attribute of a fitted GridSearchCV object lets you verify which feature names your machine learning pipeline actually saw. This example shows how to fit a grid search on a dataset with named features and retrieve feature_names_in_ to confirm that the expected feature names were used during the search.
The GridSearchCV class in scikit-learn is a tool for hyperparameter tuning and model selection. You define a grid of hyperparameter values, and it fits the specified model for each combination of those values, scoring each combination with cross-validation.
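To make "each combination" concrete, the sketch below counts the candidates for the same grid used in the example in this article. It relies on ParameterGrid from sklearn.model_selection; the variable names n_candidates, n_folds, and n_cv_fits are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the example in this article
param_grid = {
    'n_estimators': [5, 10, 50],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}

# Enumerate every combination the grid search would try
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)       # 3 * 3 * 3 combinations

# With cv=5, each candidate is fitted once per fold
n_folds = 5
n_cv_fits = n_candidates * n_folds

print(n_candidates, n_cv_fits)
```

With the default refit=True, one more fit on the full dataset follows the cross-validation fits, and that final estimator is the one feature_names_in_ is read from.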
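After fitting the GridSearchCV object, the feature_names_in_ attribute lists the names of the features seen during fit. It is only defined when the search refits a best estimator (refit=True, the default) and that estimator exposes feature_names_in_ itself, which in practice means the input had string column names, as a pandas DataFrame does. This is particularly useful when working with pipelines or datasets that have named features, as it helps verify that the expected features were used in the grid search.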
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, random_state=42)
feature_names = ['feature1', 'feature2', 'feature3', 'feature4', 'feature5']
X = pd.DataFrame(X, columns=feature_names)
# Create a RandomForestClassifier estimator
rf = RandomForestClassifier(random_state=42)
# Define the parameter grid
param_grid = {
    'n_estimators': [5, 10, 50],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}
# Create a GridSearchCV object
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)
# Fit the GridSearchCV object
grid_search.fit(X, y)
# Access the feature_names_in_ attribute
feature_names_in = grid_search.feature_names_in_
# View the feature names seen during fit
print(feature_names_in)
Running the example gives an output like:
['feature1' 'feature2' 'feature3' 'feature4' 'feature5']
The key steps in this example are:
- Preparing a synthetic classification dataset using make_classification and converting it to a pandas DataFrame with named features.
- Defining the RandomForestClassifier estimator and the parameter grid with hyperparameters to tune.
- Creating a GridSearchCV object with the estimator, parameter grid, and cross-validation strategy.
- Fitting the GridSearchCV object on the synthetic dataset.
- Accessing the feature_names_in_ attribute from the fitted GridSearchCV object.
- Using the feature_names_in_ attribute to verify the feature names seen during fitting.