Scikit-Learn PLSRegression Model

PLSRegression (Partial Least Squares Regression) is useful for regression problems where the predictor variables are many and highly collinear. It extracts a set of orthogonal factors (components) that maximize the covariance between the predictors and the response.

The key hyperparameters of PLSRegression are n_components (the number of PLS components to extract) and scale (whether to scale X and y to unit variance before fitting the model).

The algorithm is appropriate for regression problems, including those with more than one target variable.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

# generate regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
model = PLSRegression(n_components=2)

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)

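# make a prediction for a single new row
row = [[-0.332, 0.029, 0.436, 0.147, 1.206, -0.374, 0.555, 0.098, -0.238, -1.677]]
yhat = model.predict(row)
# predict() returns a 1D or 2D array depending on the scikit-learn version, so flatten first
print('Predicted: %.3f' % yhat.ravel()[0])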

Running the example gives an output like:

Mean Squared Error: 157.686
Predicted: -61.811

The steps are as follows:

A synthetic regression dataset is generated using the make_regression() function, which takes the number of samples (n_samples), the number of features (n_features), the noise level (noise), and a fixed random seed (random_state) for reproducibility. The dataset is then split into training and test sets using train_test_split().
Next, a PLSRegression model is instantiated with n_components=2. The model is then fit on the training data using the fit() method.
The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the Mean Squared Error (MSE).
A single prediction can be made by passing a new data sample to the predict() method.

This example demonstrates how to set up and use a PLSRegression model for regression tasks. The synthetic features generated here are not collinear, but the same workflow applies to datasets with many collinear predictor variables, which is where PLS is most useful.

See Also