RBFSampler approximates the feature map of the Radial Basis Function (RBF) kernel using random Fourier features. It maps the data into a new, typically higher-dimensional feature space, making it possible to apply linear algorithms to non-linear problems.
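The primary hyperparameter of RBFSampler is gamma, the RBF kernel coefficient that controls the kernel's bandwidth: larger values give a narrower kernel. The n_components parameter sets the number of random features, and therefore the dimensionality of the output. This technique is useful as a preprocessing step in both classification and regression tasks where non-linear relationships are present.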
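To make the idea of "approximating the kernel" concrete, the sketch below compares the exact RBF kernel matrix with the inner products of the features produced by RBFSampler. The random data, the gamma value of 0.1, and the list of n_components values are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

# small random dataset (illustrative assumption)
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5))
gamma = 0.1  # illustrative value

# exact RBF kernel matrix: exp(-gamma * ||x - y||^2)
K_exact = rbf_kernel(X, gamma=gamma)

errors = {}
for n_components in (10, 100, 1000):
    sampler = RBFSampler(gamma=gamma, n_components=n_components, random_state=0)
    Z = sampler.fit_transform(X)
    # inner products of the random features approximate the kernel values
    K_approx = Z @ Z.T
    errors[n_components] = float(np.abs(K_exact - K_approx).mean())
    print(n_components, Z.shape, 'mean abs error: %.4f' % errors[n_components])
```

The approximation is a Monte Carlo estimate, so the error should generally shrink as n_components grows, at the cost of a wider feature matrix for the downstream linear model.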
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
# generate a synthetic dataset
X, y = make_classification(n_samples=100, n_features=20, random_state=1)
# split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create an RBFSampler with a specific gamma value
rbf_sampler = RBFSampler(gamma=1.0, random_state=1)
# create a linear classifier pipeline with RBFSampler
model = make_pipeline(rbf_sampler, SGDClassifier(random_state=1))
# fit the model on the training data
model.fit(X_train, y_train)
# evaluate the model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [X_test[0]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.500
Predicted: 0
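An accuracy of 0.500 is about what random guessing would achieve on a roughly balanced binary problem. One likely reason is that gamma=1.0 is too large for 20 unscaled features: squared distances between samples tend to be in the tens, so the kernel value is close to zero for almost every pair of points. The sketch below reruns the same pipeline over a few gamma values; the candidate values are illustrative assumptions, and no particular accuracy is claimed for any of them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# same data and split as the example above
X, y = make_classification(n_samples=100, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# candidate gamma values (illustrative assumption)
results = {}
for gamma in (0.001, 0.01, 0.1, 1.0):
    model = make_pipeline(RBFSampler(gamma=gamma, random_state=1),
                          SGDClassifier(random_state=1))
    model.fit(X_train, y_train)
    results[gamma] = accuracy_score(y_test, model.predict(X_test))
    print('gamma=%g accuracy=%.3f' % (gamma, results[gamma]))
```

With only 20 test samples the scores are noisy, so a cross-validated search over gamma would be a more reliable way to choose a value in practice.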
1. Generate a synthetic classification dataset using make_classification(). This creates a dataset with a specified number of samples (n_samples) and features (n_features), with a fixed random seed (random_state) for reproducibility. Split the dataset into training and test sets using train_test_split().
2. Create an RBFSampler instance with a specified gamma value. This approximates the RBF kernel by mapping the input data into a higher-dimensional space.
3. Create a pipeline using make_pipeline(), combining RBFSampler and SGDClassifier, which is a linear classifier.
4. Fit the pipeline on the training data using the fit() method.
5. Evaluate the model by predicting on the test set and calculating the accuracy score with accuracy_score().
6. Make a single prediction using the trained model by passing a sample from the test set to the predict() method.
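As a small follow-up to the steps above, the fitted pipeline can be inspected to see the feature mapping directly. This sketch rebuilds the same pipeline and relies on make_pipeline() naming each step after its lowercased class name; with the default n_components=100, the 20 input features become 100 random features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# same data, split, and pipeline as the example above
X, y = make_classification(n_samples=100, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = make_pipeline(RBFSampler(gamma=1.0, random_state=1),
                      SGDClassifier(random_state=1))
model.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
sampler = model.named_steps['rbfsampler']
Z_test = sampler.transform(X_test)
print(X_test.shape, '->', Z_test.shape)  # (20, 20) -> (20, 100)

# for a classifier pipeline, score() reports mean accuracy on the given data
print('Accuracy: %.3f' % model.score(X_test, y_test))
```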
This example illustrates how to use RBFSampler for kernel approximation in scikit-learn, enabling the use of linear classifiers on non-linear data by mapping it to a higher-dimensional space.