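GaussianProcessClassifier is a probabilistic classification model based on Gaussian processes. Because the kernel determines which functions the model can represent, it can capture complex, non-linear relationships in the data, and it returns class probabilities as well as labels.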
The key hyperparameters of GaussianProcessClassifier include the kernel (defines the covariance function of the process), optimizer (method for optimizing the log-marginal likelihood), and n_restarts_optimizer (number of times the optimizer is restarted).
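This classifier is appropriate for binary and multi-class classification problems. Multi-class problems are handled by combining binary classifiers in a one-vs-rest (the default) or one-vs-one scheme, selected with the multi_class parameter.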
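The three hyperparameters named above can be set explicitly in the constructor. The following is a minimal sketch, not a tuned configuration: the dataset size and the value n_restarts_optimizer=2 are assumptions chosen only to keep it quick to run. After fitting, the kernel_ and log_marginal_likelihood_value_ attributes show what the optimizer found.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# small synthetic dataset (size chosen only for illustration)
X, y = make_classification(n_samples=60, n_features=5, n_classes=2, random_state=1)

# kernel: the initial covariance function; its hyperparameters are tuned during fit
kernel = 1.0 * RBF(length_scale=1.0)

# optimizer: 'fmin_l_bfgs_b' is the default choice
# n_restarts_optimizer: extra optimizer runs from randomly sampled starting points
model = GaussianProcessClassifier(
    kernel=kernel,
    optimizer='fmin_l_bfgs_b',
    n_restarts_optimizer=2,  # illustrative value, not a recommendation
    random_state=1,
)
model.fit(X, y)

# kernel with the hyperparameters selected during fitting
print(model.kernel_)
# log-marginal likelihood reached for those hyperparameters
print('Log-marginal likelihood: %.3f' % model.log_marginal_likelihood_value_)
```

Raising n_restarts_optimizer can help the optimizer escape a poor local optimum, at the cost of a proportionally longer fit.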
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# define the model with RBF kernel
kernel = 1.0 * RBF(length_scale=1.0)
model = GaussianProcessClassifier(kernel=kernel, random_state=1)
# fit the model
model.fit(X_train, y_train)
# evaluate the model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.950
Predicted: 0
The steps are as follows:
1. First, a synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), classes (n_classes), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().
2. Next, a GaussianProcessClassifier model is instantiated with a Radial Basis Function (RBF) kernel. The kernel function defines the covariance of the process and is a key component of Gaussian process models.
3. The model is then fit on the training data using the fit() method.
4. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.
5. A single prediction can be made by passing a new data sample to the predict() method.
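Because the model is probabilistic, the single prediction can also be expressed as class probabilities through predict_proba(). The sketch below repeats the setup from the example above so that it runs on its own. The printed probabilities depend on the fitted kernel, so no expected values are shown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# same data, split, and model as in the main example
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=1)
model.fit(X_train, y_train)

# class probabilities for one new sample
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
proba = model.predict_proba(row)

# one column per class, ordered as in model.classes_
for label, p in zip(model.classes_, proba[0]):
    print('P(class=%d) = %.3f' % (label, p))
```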
This example demonstrates how to set up and use a GaussianProcessClassifier model for binary classification tasks. It showcases the flexibility and effectiveness of Gaussian process models in scikit-learn, particularly for handling complex data relationships.
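The same estimator also covers the multi-class case mentioned at the start. The following is a minimal sketch under assumed settings: the three-class dataset and its n_informative, n_redundant, and n_clusters_per_class values are illustrative choices, and multi_class='one_vs_rest' is simply the default written out explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# three-class synthetic dataset (parameter values are illustrative assumptions)
X, y = make_classification(
    n_samples=90,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=1,
)

# one binary Gaussian process classifier is fit per class (one-vs-rest)
model = GaussianProcessClassifier(
    kernel=1.0 * RBF(length_scale=1.0),
    multi_class='one_vs_rest',
    random_state=1,
)
model.fit(X, y)

# class labels seen during fit, then predictions for the first five rows
print(model.classes_)
print(model.predict(X[:5]))
```

Setting multi_class='one_vs_one' fits one classifier per pair of classes instead. In that mode predict_proba() is not available, so one-vs-rest is the safer choice when probabilities are needed.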