Mutual information measures the statistical dependency between two variables, capturing both linear and nonlinear relationships, which makes it useful for feature selection. The mutual_info_regression() function in scikit-learn estimates the mutual information between each feature and a continuous target variable for regression problems. This example demonstrates how to use mutual_info_regression() to select informative features from a dataset with a continuous target.
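For continuous random variables, mutual information has the standard information-theoretic definition below (this is the general quantity; scikit-learn approximates it with a nonparametric nearest-neighbor estimator rather than evaluating the integral directly):

I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy

A score of zero means the feature and the target are independent; larger scores indicate stronger dependency.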
from sklearn.datasets import make_regression
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
# calculate mutual information
mi = mutual_info_regression(X, y)
print("Mutual Information Scores:")
for i in range(X.shape[1]):
    print(f"Feature {i+1}: {mi[i]:.3f}")
# select top 5 features
k = 5
top_features = mi.argsort()[-k:][::-1]
X_selected = X[:, top_features]
print(f"\nDataset shape before feature selection: {X.shape}")
print(f"Dataset shape after feature selection: {X_selected.shape}")
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.2, random_state=1)
# create and fit model
model = LinearRegression()
model.fit(X_train, y_train)
# evaluate model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"\nMean Squared Error: {mse:.3f}")
# make a prediction on a new row with values for the 5 selected features
sample = [[1.21654657, -0.50937264, 1.0628977, -1.20013426, 0.09432147]]
prediction = model.predict(sample)
print(f"\nPrediction for sample: {prediction[0]:.3f}")
Running the example produces output like:
Mutual Information Scores:
Feature 1: 0.000
Feature 2: 0.000
Feature 3: 0.000
Feature 4: 0.021
Feature 5: 0.349
Feature 6: 0.263
Feature 7: 0.053
Feature 8: 0.007
Feature 9: 0.003
Feature 10: 0.013
Dataset shape before feature selection: (1000, 10)
Dataset shape after feature selection: (1000, 5)
Mean Squared Error: 204.584
Prediction for sample: 95.540
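Note that mutual_info_regression() uses a randomized nearest-neighbor estimator, so the exact scores can vary slightly between runs. For reproducible scores you can pass a random_state (and optionally adjust n_neighbors, which trades bias against variance in the estimate); both are documented parameters of the function:

# reproducible mutual information estimates
mi = mutual_info_regression(X, y, n_neighbors=3, random_state=1)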
The key steps are:
1. Generate a synthetic regression dataset with a mix of informative and non-informative features using make_regression().
2. Calculate the mutual information between each feature and the target variable using mutual_info_regression().
3. Report the mutual information scores to identify the most informative features.
4. Select the top k features with the highest mutual information scores (see the SelectKBest sketch after this list for an equivalent approach).
5. Compare the shape of the dataset before and after feature selection to confirm the reduction in dimensionality.
6. Split the dataset with the selected features into training and test sets.
7. Fit a LinearRegression model on the training data.
8. Evaluate the model's performance on the test set using mean squared error.
9. Demonstrate using the model to make a prediction on a new data sample.
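As an alternative to the manual argsort-based selection above, the same top-k selection can be expressed with scikit-learn's SelectKBest transformer, which also composes cleanly with pipelines. A minimal sketch on the same synthetic dataset:

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# same synthetic dataset as the main example
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)

# keep the 5 features with the highest mutual information scores
selector = SelectKBest(score_func=mutual_info_regression, k=5)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # expected: (1000, 5)

One difference worth knowing: SelectKBest keeps the selected columns in their original order, whereas the argsort approach above reorders them by descending score.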
This example showcases how mutual information can be used to select a subset of informative features from a dataset. Reducing the dimensionality in this way can improve model interpretability and, in some cases, predictive performance.
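One quick sanity check (a sketch, not part of the original example) is to fit the same LinearRegression on all 10 features and on the 5 selected features and compare test MSE; a large gap would indicate that the selection dropped an informative feature:

from sklearn.datasets import make_regression
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, random_state=1)
mi = mutual_info_regression(X, y, random_state=1)
X_sel = X[:, mi.argsort()[-5:][::-1]]

# fit the same model on the full and the reduced feature sets
for name, data in [("all 10 features", X), ("top 5 features", X_sel)]:
    X_tr, X_te, y_tr, y_te = train_test_split(data, y, test_size=0.2, random_state=1)
    model = LinearRegression().fit(X_tr, y_tr)
    print(f"{name}: MSE = {mean_squared_error(y_te, model.predict(X_te)):.3f}")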