
Scikit-Learn Gaussian Process with "Exponentiation" Kernel

A Gaussian Process (GP) is a probabilistic model used for regression and classification tasks. It is particularly useful for small datasets, or when predictions need a measure of uncertainty: for regression, a GP returns a Gaussian predictive distribution, with a mean and a standard deviation, at each input.
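As a minimal illustration of that uncertainty estimate, here is a sketch on a hypothetical 1-D toy problem (eight evenly spaced, noise-free samples of sin(x) on [0, 5]; the data and the plain RBF kernel are assumptions for illustration). Calling predict(..., return_std=True) returns a predictive standard deviation alongside the mean, which should be small between training points and grow far away from them.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical toy data: 8 evenly spaced, noise-free samples of sin(x)
X_train = np.linspace(0, 5, 8).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), random_state=0)
gp.fit(X_train, y_train)

# One point inside the training range, one far outside it
X_new = np.array([[2.5], [10.0]])

# return_std=True returns the predictive standard deviation with the mean
mean, std = gp.predict(X_new, return_std=True)
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"x={x:5.1f}  mean={m:6.3f}  std={s:.3f}")
```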

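The Exponentiation kernel is a covariance function that raises the values of another (base) kernel to a fixed power p: k_exp(x, x') = k(x, x')^p. In this example the base kernel is the Radial Basis Function (RBF) kernel, exp(-||x - x'||^2 / (2 l^2)). Raising it to the power p gives another RBF kernel with length scale l / sqrt(p), so with an RBF base the exponent makes the modeled function vary more quickly rather than adding a new kind of structure. With other base kernels the effect differs; for example, exponentiating a DotProduct kernel yields a polynomial kernel.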
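A minimal numerical check of two facts, using arbitrary random points and p = 2 (both illustrative assumptions): Exponentiation(base, exponent=p) evaluated on X equals the base kernel's Gram matrix raised elementwise to p, and for an RBF base with length_scale=1.0 it also equals an RBF Gram matrix with length scale 1/sqrt(p).

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Exponentiation

# Arbitrary points in the same range and dimension as the example data
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(6, 3))

base = RBF(length_scale=1.0)
p = 2
exp_kernel = Exponentiation(base, exponent=p)

K_exp = exp_kernel(X)                          # Gram matrix of the exponentiated kernel
K_pow = base(X) ** p                           # elementwise power of the base Gram matrix
K_rbf = RBF(length_scale=1.0 / np.sqrt(p))(X)  # RBF with the rescaled length scale

print(np.allclose(K_exp, K_pow))  # True
print(np.allclose(K_exp, K_rbf))  # True

# Kernels also support the ** operator, which builds an Exponentiation object
print(isinstance(base ** 2, Exponentiation))  # True
```

The last line shows that `base ** 2` is shorthand for `Exponentiation(base, exponent=2)`.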

The Exponentiation kernel takes two arguments: the base kernel and the exponent. The base kernel determines the basic structure, such as the smoothness of the function, and the exponent reshapes that structure. In scikit-learn the exponent is a fixed number: during fitting, only the base kernel's hyperparameters (here, the RBF length scale) are optimized, so different exponents have to be compared by refitting. Small integers such as 2 or 3 are a natural starting point.
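A short sketch of what the optimizer can and cannot change, using a hypothetical 1-D sin dataset for the fit: the kernel's `hyperparameters` list contains only the base kernel's length scale (prefixed with `kernel__`), while `exponent` is an ordinary constructor parameter that stays the same in the fitted kernel `gp.kernel_`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Exponentiation

exp_kernel = Exponentiation(RBF(length_scale=1.0), exponent=2)

# Only the base kernel's hyperparameters are exposed to the optimizer
print([h.name for h in exp_kernel.hyperparameters])  # ['kernel__length_scale']

# The exponent is a plain constructor parameter, not an optimized hyperparameter
print(exp_kernel.get_params()["exponent"])  # 2

# Hypothetical data, only to have something to fit
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(30, 1))
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=exp_kernel, random_state=0).fit(X, y)

# gp.kernel_ holds the optimized length scale; the exponent is unchanged
print(gp.kernel_)
print(gp.kernel_.exponent)  # 2
```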

The Exponentiation kernel is appropriate for regression problems where a flexible model capable of capturing nonlinear relationships is needed.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Exponentiation
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

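# Prepare a synthetic dataset (seeded so the results are repeatable)
# Note: tan() has poles at +/-pi/2 and +/-3*pi/2 inside [-5, 5], so a few targets can be very large
np.random.seed(0)
X = np.random.uniform(low=-5, high=5, size=(100, 3))
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + np.tan(X[:, 2]) + np.random.normal(loc=0, scale=0.1, size=(100,))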

# Create an instance of GaussianProcessRegressor with Exponentiation kernel
base_kernel = RBF(length_scale=1.0)
exp_kernel = Exponentiation(base_kernel, exponent=2)
gp = GaussianProcessRegressor(kernel=exp_kernel, random_state=0)

# Split the dataset into train and test portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the model on the training data
gp.fit(X_train, y_train)

# Evaluate the model's performance using mean squared error
y_pred = gp.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Predict for a single new input point with the fitted model
test_sample = np.array([[1, -2, 3]])
pred = gp.predict(test_sample)
print(f"Predicted value for test sample: {pred[0]:.2f}")

Running the example gives an output like:

Mean Squared Error: 8.46
Predicted value for test sample: 0.00

The key steps in this code example are:

  1. Dataset preparation: A synthetic dataset is generated where the target variable is a nonlinear combination of the input features, plus some random noise.

  2. Model instantiation and configuration: An instance of GaussianProcessRegressor is created with an Exponentiation kernel that raises an RBF kernel (initial length scale 1.0) to the power 2.

  3. Model training: The dataset is split into train and test portions (80/20), and the model is fitted on the training data. Fitting also tunes the RBF length scale by maximizing the log-marginal likelihood.

  4. Model evaluation: The model’s performance is evaluated using mean squared error on the test set.

  5. Inference on new data: A prediction is made for a single new input, [1, -2, 3], demonstrating how the fitted model can be used for inference on new data.
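The printed prediction of 0.00 for [1, -2, 3] is worth a second look: the noise-free target there is sin(1) + cos(-2) + tan(3), about 0.28. With the defaults (a zero prior mean because normalize_y=False, and only alpha=1e-10 added to the diagonal), the GP tries to interpolate the noisy targets exactly, and a prediction at the prior mean of 0 suggests the point is effectively uncorrelated with the training data, for example because the fitted length scale became very short. Printing gp.kernel_ after fitting shows the fitted value. Below is a sketch of one common adjustment, offered as an alternative to the example's configuration rather than part of it: add a WhiteKernel so the noise level is learned, and set normalize_y=True. The seeded default_rng generator is also an addition for repeatability. The tan term still produces very large targets near its poles, so the error is likely to stay sensitive to those points.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Exponentiation, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same data-generating process as the example, with a seeded generator
rng = np.random.default_rng(0)
X = rng.uniform(low=-5, high=5, size=(100, 3))
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + np.tan(X[:, 2]) + rng.normal(loc=0, scale=0.1, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Exponentiated RBF plus a learned noise term; normalize_y centers and scales y
kernel = Exponentiation(RBF(length_scale=1.0), exponent=2) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# Inspect the fitted length scale and noise level
print("Fitted kernel:", gp.kernel_)

y_pred, y_std = gp.predict(X_test, return_std=True)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred):.2f}")

# Prediction for the same new point, now with its predictive standard deviation
pred, pred_std = gp.predict(np.array([[1, -2, 3]]), return_std=True)
print(f"Prediction for [1, -2, 3]: {pred[0]:.2f} +/- {pred_std[0]:.2f} (1 std)")
```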



See Also