Mutual information measures the statistical dependency between two variables and can be used for feature selection in classification problems. The mutual_info_classif()
function from scikit-learn returns a score for each feature against the target, allowing the most relevant features to be selected.
Higher scores indicate stronger dependency, making the function a useful tool for dimensionality reduction and for improving model performance by focusing on the most informative features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.feature_selection import SelectKBest
# generate dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
print(X.shape, y.shape)
# calculate the mutual information score for each feature against the target
scores = mutual_info_classif(X, y)
# select top 5 features
k = 5
top_k_features = SelectKBest(mutual_info_classif, k=k)
top_k_features.fit(X, y)
mask = top_k_features.get_support()
# report scores and selected features
print(scores)
print(mask)
# create new dataset with selected features
X_new = top_k_features.transform(X)
print(X_new.shape)
Running the example gives an output like the following (exact scores may vary between runs because the mutual information estimate involves randomness):
(1000, 10) (1000,)
[0.07269592 0.08224524 0.04956311 0.16258819 0.01641024 0.0819586
0.06882626 0.03653101 0.02723712 0. ]
[ True True False True False True True False False False]
(1000, 5)
The steps are as follows:
1. A synthetic classification dataset is generated using make_classification(). The shape of the dataset is reported, showing the number of samples and features.
2. The mutual_info_classif() function is used to calculate the mutual information score between each feature and the target variable. The scores are printed, indicating the level of dependency between each feature and the target (a seeded, repeatable variant is sketched after this list).
3. The SelectKBest class is used to select the top 5 features with the highest mutual information scores. The get_support() method returns a boolean mask indicating which features were selected.
4. A new dataset X_new is created by transforming the original dataset X using the transform() method of the fitted SelectKBest instance. This new dataset contains only the selected features, effectively reducing the dimensionality of the data.
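Note that mutual_info_classif() estimates mutual information with a nearest-neighbor method that involves randomness, so the scores printed directly and the scores computed inside SelectKBest can differ slightly between runs. A minimal sketch of one way to make the selection repeatable, fixing the estimator's random_state with functools.partial (the variable names are illustrative):
from functools import partial
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif
# same synthetic dataset as above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# fix the seed of the estimator so repeated runs give identical scores
seeded_mi = partial(mutual_info_classif, random_state=1)
# select the top 5 features using the seeded score function
selector = SelectKBest(seeded_mi, k=5)
selector.fit(X, y)
# get_support(indices=True) returns column indices instead of a boolean mask
print(selector.get_support(indices=True))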
This example demonstrates how to use mutual information to select the most informative features for a classification problem. By reducing the number of features, model complexity is lowered, potentially leading to improved performance and faster training times.
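One way to check that claim is to chain the selector with a model and compare cross-validated scores. A minimal sketch, assuming a LogisticRegression classifier chosen purely for illustration; wrapping SelectKBest and the model in a Pipeline re-fits the feature selection on each training fold, which avoids leaking information from the test folds:
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
# same synthetic dataset as above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# pipeline: select the top 5 features by mutual information, then fit the model
pipeline = Pipeline([('select', SelectKBest(mutual_info_classif, k=5)), ('model', LogisticRegression())])
# cross-validated accuracy with feature selection
selected = cross_val_score(pipeline, X, y, scoring='accuracy', cv=10)
print('Top-5 features: %.3f' % selected.mean())
# baseline: the same model fit on all 10 features
baseline = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=10)
print('All features: %.3f' % baseline.mean())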