Permutation importance measures how much each feature contributes to the predictions of a trained model.
The method shuffles the values of one feature at a time and measures the resulting change in the model’s performance; a larger drop in score indicates a more important feature.
It is model-agnostic and can be applied to any fitted supervised learning model, for both classification and regression.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
import numpy as np
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create and fit the model
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)
# compute permutation importance on the held-out test set
results = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)
importance = results.importances_mean
# report the mean importance of each feature
for i in range(X.shape[1]):
    print(f'Feature {i}, Importance: {importance[i]:.3f}')
Running the example gives an output like:
Feature 0, Importance: -0.016
Feature 1, Importance: 1.045
Feature 2, Importance: 0.093
Feature 3, Importance: 0.031
Feature 4, Importance: 0.069
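The mean scores are easier to interpret alongside their spread across repeats. The sketch below repeats the setup from the example and additionally reads importances_std and the raw importances array from the returned result, then ranks the features. A slightly negative mean, as for Feature 0 above, means the shuffled data happened to score marginally better than the original, which usually suggests the feature contributes little on this test set. The ranking and the print format are illustrative choices, not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# same setup as the example above
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)
results = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)

# raw scores: one row per feature, one column per repeat
print(results.importances.shape)

# rank features from most to least important (illustrative choice)
ranking = results.importances_mean.argsort()[::-1]
for i in ranking:
    mean = results.importances_mean[i]
    std = results.importances_std[i]
    print(f'Feature {i}, Importance: {mean:.3f} +/- {std:.3f}')
```

A feature whose mean importance is small relative to its standard deviation is hard to distinguish from noise, so it is worth checking both numbers before drawing conclusions.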
The steps are as follows:
1. Generate a synthetic regression dataset with the make_regression() function, specifying the number of samples, the number of features, the noise level, and a fixed random seed for reproducibility. Split the dataset into training and test sets using train_test_split().
2. Instantiate a RandomForestRegressor model and train it on the training data with the fit() method.
3. Call the permutation_importance() function on the trained model with the test data to compute the importance of each feature. Each feature is shuffled in turn and the model’s performance is re-evaluated; this is repeated multiple times per feature (specified by n_repeats).
4. Display the mean importance score for each feature, which indicates how much each feature impacts the model’s predictions.
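To make the shuffling step concrete, here is a minimal hand-written sketch of the same idea using only NumPy. It is not scikit-learn's implementation: the helper names r2 and manual_permutation_importance, the toy data, and the least-squares "model" are all assumptions made for illustration. The importance of a feature is taken as the baseline score minus the score after that feature's column is shuffled, averaged over several repeats.

```python
import numpy as np

def r2(y_true, y_pred):
    # coefficient of determination, used here as the performance score
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def manual_permutation_importance(predict, X, y, n_repeats=10, seed=1):
    # hypothetical helper: score drop after shuffling one column at a time
    rng = np.random.default_rng(seed)
    baseline = r2(y, predict(X))
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # shuffle only column j, breaking its link to the target
            X_perm[:, j] = rng.permutation(X[:, j])
            drops[j, r] = baseline - r2(y, predict(X_perm))
    return drops.mean(axis=1), drops.std(axis=1)

# toy data (assumption): strong effect from column 0, weak from column 1, none from column 2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 5.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)
X_tr, X_te = X_demo[:150], X_demo[150:]
y_tr, y_te = y_demo[:150], y_demo[150:]

# stand-in "model": ordinary least squares fitted on the training part
coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

def predict(X):
    return X @ coef

mean_imp, std_imp = manual_permutation_importance(predict, X_te, y_te)
for j in range(X_te.shape[1]):
    print(f'Feature {j}, Importance: {mean_imp[j]:.3f} +/- {std_imp[j]:.3f}')
```

Because the score is a drop relative to the baseline, the column the target depends on most should receive the largest value and the unused column a value close to zero.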
This example shows how to measure and interpret feature importance with the permutation_importance() function, which helps in understanding model behavior and feature contributions in regression tasks.
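Although the example above is a regression task, the same call should work unchanged for a classifier. The sketch below is an assumed variation rather than part of the original example: the make_classification() parameters and the choice of RandomForestClassifier are illustrative. With the scoring argument left at its default, permutation_importance() uses the estimator's own score method, which is accuracy for classifiers and R^2 for regressors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# synthetic classification dataset (parameters are illustrative assumptions)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create and fit the classifier
model = RandomForestClassifier(random_state=1)
model.fit(X_train, y_train)

# importance is the mean drop in accuracy after shuffling each feature
results = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)
for i in range(X.shape[1]):
    print(f'Feature {i}, Importance: {results.importances_mean[i]:.3f}')
```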