Scikit-Learn ARDRegression Model

ARDRegression is a Bayesian linear regression model based on Automatic Relevance Determination (ARD), which performs a soft form of feature selection during fitting. It gives each weight its own precision under the prior, which leads to sparse solutions: the weights of less informative features are shrunk toward zero.
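As a rough sketch of the model behind this, the prior structure can be written as follows. The symbols are notation chosen here for illustration, not scikit-learn identifiers: w_j is the weight of feature j, \lambda_j its precision, \alpha the noise precision, and a, b denote the shape and rate of the Gamma hyperpriors.

```latex
\begin{aligned}
y \mid X, w, \alpha &\sim \mathcal{N}\!\left(Xw,\ \alpha^{-1} I\right) \\
w_j \mid \lambda_j &\sim \mathcal{N}\!\left(0,\ \lambda_j^{-1}\right), \qquad j = 1, \dots, p \\
\alpha &\sim \mathrm{Gamma}(a_\alpha, b_\alpha), \qquad
\lambda_j \sim \mathrm{Gamma}(a_\lambda, b_\lambda)
\end{aligned}
```

When the data give little support for feature j, the estimate of \lambda_j grows large, which pins w_j close to zero. In scikit-learn, the Gamma shape and rate values correspond to the alpha_1, alpha_2, lambda_1 and lambda_2 hyperparameters.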

The key hyperparameters of ARDRegression include max_iter (maximum number of iterations; named n_iter in scikit-learn releases before 1.3), alpha_1 and alpha_2 (shape and rate, i.e. inverse scale, of the Gamma prior over the noise precision), and lambda_1 and lambda_2 (shape and rate of the Gamma prior over the weight precisions).
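The following is a minimal sketch of setting these hyperparameters explicitly. The four prior values shown are the library defaults written out; the iteration count of 500 is an arbitrary illustrative value. Because the name of the iteration parameter depends on the installed scikit-learn version, the sketch looks it up rather than hard-coding it.

```python
from sklearn.linear_model import ARDRegression

# Prior hyperparameters written out explicitly (these are the default values)
model = ARDRegression(
    alpha_1=1e-6, alpha_2=1e-6,    # Gamma prior over the noise precision
    lambda_1=1e-6, lambda_2=1e-6,  # Gamma prior over the weight precisions
)

# The iteration cap is called max_iter in recent releases and n_iter in
# older ones, so pick whichever name this installation exposes.
params = model.get_params()
iter_name = "max_iter" if "max_iter" in params else "n_iter"
model.set_params(**{iter_name: 500})  # 500 is an arbitrary illustrative value

print(iter_name, "=", model.get_params()[iter_name])
```

Small values for the prior hyperparameters keep the hyperpriors nearly non-informative, so the data largely determine the learned precisions.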

This algorithm is appropriate for regression problems where identifying the most relevant features is desirable for model interpretability and generalization.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ARDRegression
from sklearn.metrics import mean_squared_error

# generate regression dataset with informative and non-informative features
X, y = make_regression(n_samples=100, n_features=10, n_informative=5, noise=0.1, random_state=42)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create model
model = ARDRegression()

# fit model
model.fit(X_train, y_train)

# evaluate model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error: %.3f' % mse)

# make a prediction
sample = [[0.1, 0.5, -0.3, 1.2, -0.8, 0.4, -0.2, 0.9, -0.6, 1.1]]
y_pred = model.predict(sample)
print('Predicted value: %.3f' % y_pred[0])

This example gives an output like:

Mean Squared Error: 0.014
Predicted value: 113.806

The steps are as follows:

1. Generate a synthetic regression dataset using make_regression(), specifying the number of informative and non-informative features. This creates a dataset with a known structure suitable for demonstrating ARDRegression’s feature selection capabilities.
2. Split the dataset into training and test sets using train_test_split().
3. Create an instance of the ARDRegression model with default hyperparameters.
4. Fit the model on the training data using the fit() method. During this process, the model learns the relevance of each feature.
5. Evaluate the model’s performance by making predictions on the test set and calculating the mean squared error using mean_squared_error().
6. Demonstrate making a prediction on a single new sample by passing it to the predict() method.

This example shows that ARDRegression is fit and used like any other scikit-learn regressor, while the relevance of each feature is learned during fitting. The evaluation step indicates the model’s performance on unseen data, and the prediction on a new sample illustrates how the fitted model can be used for inference. The example itself does not print the learned relevances; after fitting, they are available through the model’s coef_ (weights) and lambda_ (per-feature precisions) attributes.
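To see which features the model treats as relevant, one option is to compare the fitted weights with the true generating weights. The sketch below makes two assumptions that are not part of the example above: it passes coef=True to make_regression() so the true weights are returned, and it fits on the full dataset rather than the training split to keep the code short.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ARDRegression

# Same synthetic data settings as the example; coef=True additionally
# returns the true weights used to generate y.
X, y, true_coef = make_regression(
    n_samples=100, n_features=10, n_informative=5,
    noise=0.1, random_state=42, coef=True,
)

model = ARDRegression()
model.fit(X, y)

# coef_ holds the fitted weights, lambda_ the estimated per-feature precisions
print("feature  true_coef  fitted_coef  lambda_")
for j in range(X.shape[1]):
    print("%7d  %9.3f  %11.3f  %.3g"
          % (j, true_coef[j], model.coef_[j], model.lambda_[j]))

# Count the features whose true generating weight is exactly zero
n_uninformative = int(np.sum(true_coef == 0))
print("Non-informative features in the data: %d" % n_uninformative)
```

Features with a true weight of zero should show a fitted weight at or near zero together with a large lambda_ value, while informative features keep weights close to the true ones.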

See Also