The feature_names_in_ attribute of a fitted GridSearchCV object records the names of the features the search was fitted on, which makes it a quick way to verify feature names in a machine learning pipeline. This example shows how to retrieve feature_names_in_ after fitting to confirm that the expected feature names were used during the grid search.
The GridSearchCV class in scikit-learn is a powerful tool for hyperparameter tuning and model selection. It allows you to define a grid of hyperparameter values and fits a specified model for each combination of those values using cross-validation.
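To see how a grid expands into candidate combinations, the sketch below uses scikit-learn's ParameterGrid, which enumerates every combination in a parameter grid. It reuses the same grid that this example searches over later. The n_folds = 5 value is an assumption chosen to match cv=5 in the GridSearchCV call.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the main example: 3 values for each of 3 hyperparameters
param_grid = {
    'n_estimators': [5, 10, 50],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
}

# Every dict in this list is one hyperparameter combination to evaluate
candidates = list(ParameterGrid(param_grid))
n_folds = 5  # assumed to match cv=5 in the GridSearchCV call

print(len(candidates))            # 27 combinations (3 * 3 * 3)
print(len(candidates) * n_folds)  # 135 cross-validation fits
```

With the default refit=True, GridSearchCV also refits the best combination once on the full dataset after cross-validation, so this search performs 136 fits in total.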
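After fitting, the GridSearchCV object exposes feature_names_in_, an array of the feature names seen during fit. The attribute is only defined when the input X has feature names that are all strings, such as a pandas DataFrame with string column names. It is also read from best_estimator_, so it requires refit=True (the default). This is particularly useful when working with pipelines or datasets that have named features, because it lets you confirm that the expected features were used in the grid search.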
```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, random_state=42)
feature_names = ['feature1', 'feature2', 'feature3', 'feature4', 'feature5']
X = pd.DataFrame(X, columns=feature_names)

# Create a RandomForestClassifier estimator
rf = RandomForestClassifier(random_state=42)

# Define the parameter grid
param_grid = {
    'n_estimators': [5, 10, 50],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}

# Create a GridSearchCV object
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)

# Fit the GridSearchCV object
grid_search.fit(X, y)

# Access the feature_names_in_ attribute
feature_names_in = grid_search.feature_names_in_

# View the feature names seen during fit
print(feature_names_in)
```
Running the example gives an output like:
```
['feature1' 'feature2' 'feature3' 'feature4' 'feature5']
```
The key steps in this example are:
- Preparing a synthetic classification dataset using make_classification and converting it to a pandas DataFrame with named features.
- Defining the RandomForestClassifier estimator and the parameter grid with the hyperparameters to tune.
- Creating a GridSearchCV object with the estimator, parameter grid, and cross-validation strategy.
- Fitting the GridSearchCV object on the synthetic dataset.
- Accessing the feature_names_in_ attribute from the fitted GridSearchCV object.
- Using the feature_names_in_ attribute to verify the feature names seen during fitting.