Scikit-Learn Gaussian Process with "DotProduct" Kernel

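A Gaussian process (GP) is a non-parametric probabilistic model used for regression and classification. It works well on small datasets, because exact inference scales cubically with the number of training samples. It is also useful when predictions need an uncertainty estimate, because a GP returns a predictive distribution rather than a single value.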

The DotProduct kernel is a covariance function that computes k(x_i, x_j) = sigma_0^2 + x_i · x_j, which is the dot product of two input vectors plus a constant. A GP with this kernel is equivalent to Bayesian linear regression with N(0, 1) priors on the feature coefficients and an N(0, sigma_0^2) prior on the bias. This makes it a good choice for problems where the target variable is (approximately) a linear combination of the input features.
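You can check the formula directly. Evaluate a DotProduct kernel on a few points and compare the result with sigma_0^2 + X @ X.T computed in NumPy. The points and the value sigma_0 = 2.0 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.gaussian_process.kernels import DotProduct

# Three arbitrary 2-D points
X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.0]])
kernel = DotProduct(sigma_0=2.0)

K = kernel(X)                    # 3 x 3 covariance (Gram) matrix from scikit-learn
K_manual = 2.0**2 + X @ X.T      # sigma_0^2 + <x_i, x_j> for every pair of points

print(np.allclose(K, K_manual))  # True
print(K[0, 1])                   # 4 + (1*3 + 2*(-1)) = 5.0
```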

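The DotProduct kernel has a single hyperparameter, sigma_0 (default 1.0), which controls the inhomogeneity of the kernel. Setting sigma_0 = 0 gives a homogeneous linear kernel through the origin. Larger values correspond to a wider prior on the intercept.

Its search range is set by sigma_0_bounds (default (1e-5, 1e5)). During fit, GaussianProcessRegressor tunes sigma_0 within that range by maximizing the log-marginal likelihood, so the value passed to the constructor is only a starting point.

The kernel has no variance parameter of its own. To scale the overall magnitude of the covariance, multiply it by a ConstantKernel. Likewise, sigma_0 is not a jitter term added to the diagonal. Numerical stability and observation noise are handled by the alpha parameter of GaussianProcessRegressor (default 1e-10) or by adding a WhiteKernel.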
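The sketch below shows how these hyperparameters appear in scikit-learn. On its own, DotProduct exposes only sigma_0 and sigma_0_bounds. Composing it with ConstantKernel and WhiteKernel adds a scale hyperparameter and a noise hyperparameter. The starting values here are arbitrary, chosen only for illustration.

```python
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, WhiteKernel

# DotProduct on its own: sigma_0 plus the bounds the optimizer searches within
linear = DotProduct(sigma_0=1e-5)
print(linear.get_params())  # {'sigma_0': 1e-05, 'sigma_0_bounds': (1e-05, 100000.0)}

# Overall scale via ConstantKernel, observation noise via WhiteKernel
composite = ConstantKernel(constant_value=2.0) * DotProduct(sigma_0=1.0) + WhiteKernel(noise_level=0.1)
names = [hp.name for hp in composite.hyperparameters]
print(names)  # ['k1__k1__constant_value', 'k1__k2__sigma_0', 'k2__noise_level']

# Keep sigma_0 fixed during fitting instead of letting the optimizer tune it
fixed = DotProduct(sigma_0=1.0, sigma_0_bounds="fixed")
print(fixed.hyperparameter_sigma_0.fixed)  # True
```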

The example below applies the DotProduct kernel to a regression problem in which the target is a noisy linear function of three input features.
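The Bayesian linear regression connection can be checked numerically. In this sketch, sigma_0 is held fixed (optimizer=None) and alpha acts as the noise variance. Under those assumptions, the GP posterior mean should match ridge regression (no intercept, penalty alpha) on the features augmented with a constant column equal to sigma_0. The reason is that K = Z Z^T for the augmented matrix Z. The values sigma_0 = 1.0, alpha = 1.0 and the random data are arbitrary illustration choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(50, 3))
y = 2 * X[:, 0] - 3 * X[:, 1] + X[:, 2] + rng.normal(0, 1, size=50)
X_new = rng.uniform(-5, 5, size=(5, 3))

sigma_0, alpha = 1.0, 1.0

# GP with a fixed DotProduct kernel; alpha is added to the diagonal of K during fitting
gp = GaussianProcessRegressor(kernel=DotProduct(sigma_0=sigma_0), alpha=alpha, optimizer=None)
gp.fit(X, y)

def augment(A):
    """Append a constant column equal to sigma_0; its weight plays the role of the bias."""
    return np.hstack([A, np.full((A.shape[0], 1), sigma_0)])

# Ridge minimizes ||y - Zw||^2 + alpha * ||w||^2 on the augmented features Z
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(augment(X), y)

gp_mean = gp.predict(X_new)
ridge_pred = ridge.predict(augment(X_new))
print(np.max(np.abs(gp_mean - ridge_pred)))  # differs only by floating-point error
```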

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Prepare a synthetic dataset
X = np.random.uniform(low=-5, high=5, size=(100, 3))
y = 2 * X[:, 0] - 3 * X[:, 1] + X[:, 2] + np.random.normal(loc=0, scale=1, size=(100,))

# Create an instance of GaussianProcessRegressor with DotProduct kernel
# (sigma_0 is only the starting value; fit() tunes it by maximizing the log-marginal likelihood)
kernel = DotProduct(sigma_0=1e-5)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)

# Split the dataset into train and test portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the model on the training data
gp.fit(X_train, y_train)

# Evaluate the model's performance using mean squared error
y_pred = gp.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Make a prediction using the fitted model on a new sample
test_sample = np.array([[1, -2, 3]])
pred = gp.predict(test_sample)
print(f"Predicted value for test sample: {pred[0]:.2f}")
```

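Running the example gives output similar to the following. Exact values vary between runs because the data are generated without a fixed random seed.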

```
Mean Squared Error: 1.28
Predicted value for test sample: 11.16
```

The key steps in this code example are:

Dataset preparation: A synthetic dataset of 100 samples with 3 features is generated. The target is a linear combination of the features plus Gaussian noise with standard deviation 1.
Model instantiation and configuration: A GaussianProcessRegressor is created with a DotProduct kernel whose sigma_0 starts at 1e-5.
Model training: The dataset is split into 80% training and 20% test portions, and the model is fitted on the training data, which also tunes sigma_0.
Model evaluation: The model's performance is measured by the mean squared error on the test set.
Inference on a new sample: The fitted model predicts the target for the sample [1, -2, 3], whose noise-free value is 2(1) - 3(-2) + 3 = 11. This demonstrates how the model can be used for inference on new data.
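The example above relies on the default alpha=1e-10, so the GP treats the targets as almost noise-free even though the synthetic data has noise with standard deviation 1. One common alternative is to let the GP estimate the noise level itself. The sketch below adds a WhiteKernel, scales the DotProduct term with a ConstantKernel, and requests the predictive standard deviation with return_std=True. The seeded generator and the starting hyperparameter values are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, DotProduct, WhiteKernel

# Same kind of synthetic data as above, but with a seeded generator for repeatability
rng = np.random.default_rng(0)
X = rng.uniform(low=-5, high=5, size=(100, 3))
y = 2 * X[:, 0] - 3 * X[:, 1] + X[:, 2] + rng.normal(loc=0, scale=1, size=100)

# Scaled linear kernel plus a learned white-noise term (initial values are only starting points)
kernel = ConstantKernel(1.0) * DotProduct(sigma_0=1.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# Hyperparameters after log-marginal-likelihood optimization
print(gp.kernel_)
noise = gp.kernel_.k2.noise_level  # estimated noise variance (the data was generated with 1.0)

# Predictive mean and standard deviation (the std includes the learned noise term)
mean, std = gp.predict(np.array([[1, -2, 3]]), return_std=True)
print(f"Prediction: {mean[0]:.2f} +/- {std[0]:.2f}")
```

Because the WhiteKernel term contributes to the kernel's diagonal, the reported standard deviation covers both the uncertainty in the fitted linear function and the observation noise.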

See Also