Scikit-Learn RegressorChain Model

RegressorChain is a meta-estimator in scikit-learn that arranges regression models into a chain: each model predicts one target variable and receives the predictions of the earlier models in the chain as additional features. This is particularly useful for multi-output regression problems where the target variables are interdependent.
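To make the chaining concrete, the following minimal sketch rebuilds a chain by hand and checks it against RegressorChain. It assumes the default settings (targets chained in column order, cv=None), under which each model is fit on the true values of the earlier targets and, at prediction time, receives the earlier models' predictions instead. The names manual_models and manual_pred are illustrative only and are not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# generate synthetic multi-output regression dataset
X, y = make_regression(n_samples=100, n_features=5, n_targets=3, noise=0.1, random_state=1)

# reference: the built-in chain with default settings
chain = RegressorChain(LinearRegression()).fit(X, y)

# fit by hand: model j sees X plus the true values of targets 0..j-1
manual_models = []
for j in range(y.shape[1]):
    X_aug = np.hstack([X, y[:, :j]])
    manual_models.append(LinearRegression().fit(X_aug, y[:, j]))

# predict by hand: model j sees X plus the predictions of models 0..j-1
manual_pred = np.zeros_like(y)
for j, m in enumerate(manual_models):
    X_aug = np.hstack([X, manual_pred[:, :j]])
    manual_pred[:, j] = m.predict(X_aug)

# compare the hand-built chain with the built-in one
matches = np.allclose(manual_pred, chain.predict(X))
print('Manual chain matches RegressorChain: %s' % matches)
```

The comparison only holds for the default configuration assumed here; a custom order or a cv setting changes what each model in the chain is trained on.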

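The key hyperparameters of RegressorChain include base_estimator (the underlying regressor, cloned once per target; recent scikit-learn releases deprecate this name in favor of estimator), order (the order in which the targets are chained; by default the column order of y), and cv (whether the features passed down the chain during fitting are cross-validated predictions; by default, with cv=None, the true target values are used).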
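As a rough illustration of these hyperparameters, the sketch below sets an explicit chain order and asks for cross-validated predictions during fitting. The specific order [2, 0, 1] and cv=5 are arbitrary choices for demonstration, not recommendations, and the estimator is passed positionally so the call does not depend on the keyword name.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# generate synthetic multi-output regression dataset
X, y = make_regression(n_samples=100, n_features=5, n_targets=3, noise=0.1, random_state=1)

# chain the targets as column 2, then 0, then 1, and pass 5-fold
# cross-validated predictions down the chain while fitting
custom_chain = RegressorChain(LinearRegression(), order=[2, 0, 1], cv=5)
custom_chain.fit(X, y)

# order_ records the chain order that was used
print('Chain order: %s' % list(custom_chain.order_))

# predictions are returned in the original column order of y
pred = custom_chain.predict(X[:3])
print('Prediction shape: %s' % (pred.shape,))
```

Setting order='random' instead draws a random permutation of the targets, which can be made reproducible with the random_state parameter.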

The algorithm is appropriate for multi-output regression problems.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# generate synthetic multi-output regression dataset
X, y = make_regression(n_samples=100, n_features=5, n_targets=3, noise=0.1, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create base model
base_model = LinearRegression()

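# create the RegressorChain model (estimator passed positionally so the call
# does not depend on the keyword name, which differs between scikit-learn versions)
model = RegressorChain(base_model)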

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)

# make a prediction
row = [[0.5, -1.5, 1.0, 0.3, -0.7]]
yhat = model.predict(row)
print('Predicted: %s' % yhat)

Running the example gives an output like:

Mean Squared Error: 0.010
Predicted: [[-29.06057077 -26.77411026  -6.70856859]]

The steps are as follows:

  1. First, a synthetic multi-output regression dataset is generated using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and target variables (n_targets), with some added noise for realism. The dataset is split into training and test sets using train_test_split().

  2. Next, a LinearRegression model is instantiated as the base model for the RegressorChain.

  3. A RegressorChain model is then created using the base model. The fit() method is used to train the model on the training data.

  4. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the mean squared error, which by default is averaged over all target variables.

  5. A single prediction can be made by passing a new data sample to the predict() method.
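As a complement to step 4, the single error figure can hide differences between targets. The sketch below, which repeats the setup of the example so that it runs on its own, uses the multioutput='raw_values' option of mean_squared_error to report one error per target. The variable names overall and per_target are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# same data, split, and model as in the main example
X, y = make_regression(n_samples=100, n_features=5, n_targets=3, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = RegressorChain(LinearRegression()).fit(X_train, y_train)
yhat = model.predict(X_test)

# default: one number, the uniform average over all targets
overall = mean_squared_error(y_test, yhat)

# raw_values: one error per target column
per_target = mean_squared_error(y_test, yhat, multioutput='raw_values')

print('Overall MSE: %.3f' % overall)
for i, err in enumerate(per_target):
    print('Target %d MSE: %.3f' % (i, err))
```

A target with a noticeably larger error than the others may be a candidate for a different position in the chain or a different base regressor.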

This example shows how to set up a RegressorChain for a multi-output regression task: wrap a base regressor, fit the chain on the training data, evaluate it on held-out data, and predict the targets for a new sample. Chaining is most useful when the target variables are interdependent, because later models in the chain can use the predictions for earlier targets as inputs.


