Scikit-Learn Gaussian Process with "Product" Kernel

A Gaussian Process (GP) is a probabilistic model used for regression and classification. It is particularly useful for small datasets or when predictions need an accompanying measure of uncertainty.

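The Product kernel is a composite covariance function that multiplies two component kernels, k1 and k2: k(x, x') = k1(x, x') * k2(x, x'). The resulting covariance is high only where both components are high, so the kernel suits problems where the target depends on effects that act jointly rather than additively, such as combinations of input features. Note that in scikit-learn both components are evaluated on the full input vector, and that the product of two isotropic RBF kernels is mathematically another RBF kernel, so a Product kernel is most expressive when its components differ in type.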
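To make the definition concrete, the short sketch below checks that a Product kernel's covariance matrix equals the elementwise product of its components' matrices, and that the `*` operator on two kernels builds the same kind of object. The sample data, the variable names, and the second length scale of 2.0 are illustrative assumptions, not part of the main example.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Product

# Small illustrative input (assumed values, unrelated to the main example)
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(6, 3))

# Two component kernels and their product
k1 = RBF(length_scale=1.0)
k2 = RBF(length_scale=2.0)
product_kernel = Product(k1, k2)

# The product kernel's matrix is the elementwise product of the components' matrices
K_product = product_kernel(X)
K_manual = k1(X) * k2(X)
matches_manual = np.allclose(K_product, K_manual)

# Multiplying two kernels with * also yields a Product kernel
operator_kernel = k1 * k2
is_product = isinstance(operator_kernel, Product)

print(matches_manual, is_product)
```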

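The Product kernel has no length_scale of its own. Its two arguments, k1 and k2, are the component kernels, and its tunable hyperparameters are those of the components. With RBF components, as used here, these are each component's length_scale and length_scale_bounds. The length_scale controls the smoothness of the function (how quickly correlation decays with distance), while length_scale_bounds defines the range searched when this parameter is optimized during fitting; the default bounds are (1e-5, 1e5). Common values for length_scale are between 0.1 and 10.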
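As a quick illustration of where these hyperparameters live, the sketch below inspects a Product kernel and shows that the component settings appear under the prefixes `k1__` and `k2__`. The bounds of (0.1, 10.0) and the length scale of 2.0 are assumptions chosen for the illustration, not recommended settings.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Product

# Component kernels with explicit (assumed) length scales and bounds
kernel = Product(
    RBF(length_scale=1.0, length_scale_bounds=(0.1, 10.0)),
    RBF(length_scale=2.0, length_scale_bounds=(0.1, 10.0)),
)

# Component parameters are exposed with k1__ / k2__ prefixes
params = kernel.get_params()
k1_scale = params["k1__length_scale"]
k2_scale = params["k2__length_scale"]

# Names of the hyperparameters the optimizer can tune
hyperparameter_names = [h.name for h in kernel.hyperparameters]

# theta holds the tunable hyperparameters on a log scale
length_scales = np.exp(kernel.theta)

print(hyperparameter_names)
print(length_scales)
```

After fitting a GaussianProcessRegressor, the optimized values can be read the same way from the fitted kernel stored in the `kernel_` attribute.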

The Product kernel is appropriate for regression problems where the target is driven by effects that combine multiplicatively, such as significant interactions between input features.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Product
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Prepare a synthetic dataset
X = np.random.uniform(low=-5, high=5, size=(100, 3))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + np.random.normal(loc=0, scale=1, size=(100,))

# Create a GaussianProcessRegressor whose kernel is the product of two RBF kernels
# (RBF(1.0) * RBF(1.0) builds the same Product kernel)
kernel = Product(RBF(length_scale=1.0), RBF(length_scale=1.0))
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)

# Split the dataset into train and test portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the model on the training data
gp.fit(X_train, y_train)

# Evaluate the model's performance using mean squared error
y_pred = gp.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Make a prediction using the fitted model on a test sample
test_sample = np.array([[1, 2, -1]])
pred = gp.predict(test_sample)
print(f"Predicted value for test sample: {pred[0]:.2f}")

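Running the example gives output like the following. The exact numbers vary between runs because the synthetic dataset is generated without a fixed NumPy seed: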

Mean Squared Error: 36.53
Predicted value for test sample: 1.42

The key steps in this code example are:

1. Dataset preparation: A synthetic dataset is generated where the target variable includes interactions between input features and some noise.
2. Model instantiation and configuration: An instance of GaussianProcessRegressor is created with the Product kernel, composed of two RBF kernels.
3. Model training: The dataset is split into train and test portions, and the model is fitted on the training data.
4. Model evaluation: The model’s performance is evaluated using mean squared error on the test set.
5. Inference on test sample(s): A prediction is made using the fitted model on one test sample, demonstrating how the model can be used for inference on new data.
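Because a GP is often chosen for its uncertainty estimates, the inference step can be extended to return a standard deviation alongside the prediction. The sketch below is a minimal, self-contained variant of the example under a few assumptions of its own: a seeded NumPy generator, a smaller dataset of 60 samples, and alpha=1.0 to reflect the unit-variance noise added to the synthetic targets. None of these settings come from the example above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Product

# Seeded synthetic data with the same structure as the main example (assumed sizes)
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(60, 3))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + rng.normal(0, 1, size=60)

# alpha=1.0 is an assumption: it matches the variance of the noise added above
gp = GaussianProcessRegressor(
    kernel=Product(RBF(length_scale=1.0), RBF(length_scale=1.0)),
    alpha=1.0,
    random_state=0,
)
gp.fit(X, y)

# return_std=True returns the predictive standard deviation with the mean
test_sample = np.array([[1.0, 2.0, -1.0]])
mean, std = gp.predict(test_sample, return_std=True)

# The kernel with optimized hyperparameters is stored in kernel_
fitted_kernel = gp.kernel_

print(f"Predicted mean: {mean[0]:.2f}")
print(f"Predictive standard deviation: {std[0]:.2f}")
print(f"Fitted kernel: {fitted_kernel}")
```

A large standard deviation signals that the test sample lies far from the training data or that the kernel explains the targets poorly there.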

See Also