Scikit-Learn permutation_importance() Feature Importance

Permutation importance measures how much the predictions of a trained model depend on each feature.

The method works by randomly shuffling the values of one feature at a time and measuring how much the model’s score drops as a result.

It is model-agnostic and can be applied to any fitted supervised learning model, including classification and regression models.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create and fit the model
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)

# compute permutation importance on the held-out test set
results = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)
importance = results.importances_mean
for i in range(X.shape[1]):
    print(f'Feature {i}, Importance: {importance[i]:.3f}')

Running the example gives an output like:

Feature 0, Importance: -0.016
Feature 1, Importance: 1.045
Feature 2, Importance: 0.093
Feature 3, Importance: 0.031
Feature 4, Importance: 0.069
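
Because each feature is shuffled n_repeats times, the result object also exposes the spread of the scores through importances_std, alongside importances_mean. The sketch below repeats the setup from the example and prints the features ranked from most to least important with their standard deviation. A small negative value, such as the one for Feature 0 above, means that shuffling happened to improve the score slightly, which usually suggests the feature contributes little on this test set. The variable name order is only illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# same data and model as in the main example
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)

results = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=1)

# indices of the features sorted from highest to lowest mean importance
order = results.importances_mean.argsort()[::-1]
for i in order:
    mean = results.importances_mean[i]
    std = results.importances_std[i]
    print(f'Feature {i}, Importance: {mean:.3f} +/- {std:.3f}')
```

The raw per-repeat scores are available in results.importances, an array with one row per feature and one column per repeat.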

The steps are as follows:

First, generate a synthetic regression dataset with the make_regression() function, specifying the number of samples and features, the noise level, and a fixed random seed for reproducibility. Then split the dataset into training and test sets using train_test_split().
Next, instantiate a RandomForestRegressor and train it on the training data using the fit() method.
Call the permutation_importance() function on the trained model with the test data to compute the importance of each feature. The function shuffles each feature in turn, n_repeats times, and records how much the model’s score drops each time.
Finally, display the mean importance score for each feature, which indicates how strongly the model’s predictions depend on that feature.
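
To make the shuffling step more concrete, the idea can be sketched by hand. This is a simplified illustration, not scikit-learn's actual implementation: the helper manual_permutation_importance() is a hypothetical name, it uses the model's own score() method (R^2 for a regressor), and it draws its own random permutations, so the numbers will not match the output above exactly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def manual_permutation_importance(model, X, y, n_repeats=10, seed=1):
    """Hypothetical helper: mean drop in model.score() when each column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)  # score on the unshuffled data
    drops = np.zeros((X.shape[1], n_repeats))
    for col in range(X.shape[1]):
        for rep in range(n_repeats):
            X_shuffled = X.copy()
            # shuffle one column to break its link with the target
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            drops[col, rep] = baseline - model.score(X_shuffled, y)
    return drops.mean(axis=1)


# same data and model as in the main example
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = RandomForestRegressor(random_state=1)
model.fit(X_train, y_train)

manual_scores = manual_permutation_importance(model, X_test, y_test)
for i, score in enumerate(manual_scores):
    print(f'Feature {i}, Importance: {score:.3f}')
```

A feature whose shuffled copies barely change the score ends up with an importance near zero, while a feature the model relies on produces a large drop.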

This example shows how to measure and interpret feature importance with the permutation_importance() function, which helps explain model behavior and feature contributions in regression tasks.
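
Since the method also applies to classification models, the same call can be used with a classifier. The following is a minimal sketch under assumed settings: the dataset parameters, the choice of RandomForestClassifier, and the explicit scoring='accuracy' argument are illustrative choices and are not part of the example above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# generate a synthetic binary classification dataset (illustrative settings)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create and fit the classifier
clf = RandomForestClassifier(random_state=1)
clf.fit(X_train, y_train)

# importance is measured as the drop in accuracy after shuffling each feature
clf_results = permutation_importance(clf, X_test, y_test, scoring='accuracy',
                                     n_repeats=10, random_state=1)
for i, score in enumerate(clf_results.importances_mean):
    print(f'Feature {i}, Importance: {score:.3f}')
```

If scoring is left unset, the estimator's default score() method is used, which is accuracy for classifiers and R^2 for regressors.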

See Also