Scikit-Learn Gaussian Process with "ExpSineSquared" Kernel

A Gaussian Process (GP) is a versatile probabilistic model commonly used for regression, particularly when the dataset is small or when predictions need to come with uncertainty estimates.

The ExpSineSquared kernel (also known as the periodic kernel) is a covariance function for GPs that is well suited to modeling functions that repeat themselves. It applies an exponential to a sine-squared term of the distance between two inputs, so points that lie a whole number of periods apart are treated as strongly correlated.
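To make the combination of the exponential and sine-squared terms concrete, the sketch below evaluates the kernel with scikit-learn and compares it against the formula k(x, x') = exp(-2 sin^2(pi |x - x'| / p) / l^2) from the scikit-learn documentation, where l is length_scale and p is periodicity. The sample points and the helper name manual_exp_sine_squared are illustrative assumptions, not part of the main example.

```python
import numpy as np
from sklearn.gaussian_process.kernels import ExpSineSquared


def manual_exp_sine_squared(X, length_scale=1.0, periodicity=1.0):
    """Hypothetical helper: evaluate the periodic kernel formula directly."""
    # Pairwise absolute distances between the 1-D inputs
    dists = np.abs(X - X.T)
    return np.exp(-2.0 * np.sin(np.pi * dists / periodicity) ** 2 / length_scale**2)


# A few 1-D points, some of them exactly one or two periods apart
X_demo = np.array([[0.0], [0.25], [0.5], [1.0], [2.0]])
kernel_demo = ExpSineSquared(length_scale=1.0, periodicity=1.0)

K_sklearn = kernel_demo(X_demo)            # kernel matrix from scikit-learn
K_manual = manual_exp_sine_squared(X_demo)  # same matrix from the formula

print(np.allclose(K_sklearn, K_manual))
print(np.round(K_sklearn[0], 3))
```

With a periodicity of 1.0, the point 0.0 has covariance 1 with the points 1.0 and 2.0, which sit whole periods away, and covariance exp(-2) with the point 0.5, which sits half a period away.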

The key hyperparameters for the ExpSineSquared kernel are length_scale and periodicity, each with a matching bounds parameter (length_scale_bounds and periodicity_bounds). The length_scale controls how smooth the function is within one period, while periodicity sets the distance between repetitions of the pattern. Each bounds parameter is a (lower, upper) tuple that limits the values the optimizer may try during fitting; the default is (1e-5, 1e5), and passing "fixed" keeps the hyperparameter at its initial value. Good starting values depend on the dataset: periodicity should be set near the expected period of the data, expressed in the same units as the inputs.
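As a rough illustration of how these settings appear on the kernel object, the sketch below builds a kernel with explicit bounds and inspects its theta and bounds attributes, which scikit-learn stores on a log scale. The bound values and the variable names kernel_bounded and kernel_fixed_period are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.gaussian_process.kernels import ExpSineSquared

# Both hyperparameters get an explicit search range (illustrative values)
kernel_bounded = ExpSineSquared(
    length_scale=1.0,
    periodicity=1.0,
    length_scale_bounds=(1e-2, 1e2),
    periodicity_bounds=(0.5, 2.0),
)

# theta holds the log-transformed tunable hyperparameters,
# ordered as (length_scale, periodicity)
print(kernel_bounded.theta)

# bounds holds the log-transformed (lower, upper) pairs, one row per
# hyperparameter; exponentiate to recover the original scale
print(np.exp(kernel_bounded.bounds))

# Passing "fixed" removes a hyperparameter from optimization,
# so only length_scale remains tunable here
kernel_fixed_period = ExpSineSquared(periodicity=1.0, periodicity_bounds="fixed")
print(kernel_fixed_period.theta.shape)
```

Narrowing periodicity_bounds around a known period, or fixing the period outright, can help when the period is known in advance, since the optimizer then has less room to settle on an unrelated value.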

The ExpSineSquared kernel is appropriate for regression problems where the target variable exhibits periodic patterns or oscillations.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Prepare a synthetic dataset with periodic patterns
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + np.random.normal(loc=0, scale=0.1, size=(100,))

# Create an instance of GaussianProcessRegressor with ExpSineSquared kernel
kernel = ExpSineSquared(length_scale=1.0, periodicity=1.0, length_scale_bounds=(1e-3, 1e3))
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)

# Split the dataset into train and test portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the model on the training data
gp.fit(X_train, y_train)

# Evaluate the model's performance using mean squared error
y_pred = gp.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

# Make a prediction using the fitted model on a test sample
test_sample = np.array([[1.5]])
pred = gp.predict(test_sample)
print(f"Predicted value for test sample: {pred[0]:.2f}")

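Running the example gives output similar to the following. The exact numbers vary between runs because the noise added to the targets is drawn without a fixed seed: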

Mean Squared Error: 0.0385
Predicted value for test sample: -0.09

The key steps in this code example are:

1. Dataset preparation: A synthetic dataset with periodic patterns is generated using a sine function plus some random noise.
2. Model instantiation and configuration: An instance of GaussianProcessRegressor is created with the ExpSineSquared kernel, and relevant hyperparameters are set.
3. Model training: The dataset is split into train and test portions, and the model is fitted on the training data.
4. Model evaluation: The model’s performance is evaluated using mean squared error on the test set.
5. Inference on test sample(s): A prediction is made using the fitted model on one test sample, demonstrating how the model can be used for inference on new data.
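The inference step above returns only the predicted mean. Since uncertainty estimates are one of the main reasons to use a GP, the following sketch shows one way to request them with return_std=True and to inspect the optimized kernel through the kernel_ attribute. It rebuilds the same kind of synthetic data with a seeded generator. The alpha value (the assumed noise variance, 0.1 squared) and the query points are assumptions added for this illustration, not settings from the example above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared

# Seeded generator so the sketch is repeatable (assumption, not in the main example)
rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0.0, 0.1, size=100)

kernel = ExpSineSquared(length_scale=1.0, periodicity=1.0)
# alpha is added to the diagonal of the kernel matrix; here it is set to the
# assumed noise variance so the model does not have to interpolate the noise
gp = GaussianProcessRegressor(kernel=kernel, alpha=0.1**2, random_state=0)
gp.fit(X, y)

# kernel_ is the kernel with hyperparameters optimized during fit
print(f"Optimized kernel: {gp.kernel_}")

# Query one point inside the training range and one just outside it
X_new = np.array([[1.5], [5.25]])
mean, std = gp.predict(X_new, return_std=True)
for x_val, m, s in zip(X_new[:, 0], mean, std):
    print(f"x = {x_val:.2f}: mean = {m:.2f}, std = {s:.2f}")
```

The standard deviation gives a per-point measure of how confident the model is, which can be reported alongside the mean or used to draw a confidence band around the predictions.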
