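AdditiveChi2Sampler is a kernel approximation method in scikit-learn. It maps the input features into a higher-dimensional space that approximates the additive chi-squared kernel, so that a linear classifier trained on the transformed data behaves much like a chi-squared kernel model.
Key parameters include sample_steps (the number of sampling points used in the approximation) and sample_interval (the spacing between those sampling points).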
The algorithm expects non-negative input and is appropriate for classification problems with histogram-like features, such as the count or frequency representations common in text and image data.
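As a rough illustration of what sample_steps does, the sketch below transforms a small array with three different settings and prints the resulting shapes. The array X_demo is hypothetical toy data, not part of the main example. In scikit-learn, the transformed output has n_features * (2 * sample_steps - 1) columns, so larger values give a finer approximation at the cost of more features.

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler

# hypothetical non-negative toy data: 5 samples, 20 features
rng = np.random.default_rng(0)
X_demo = rng.random((5, 20))

# record the output shape for each sample_steps setting
shapes = {}
for steps in (1, 2, 3):
    sampler = AdditiveChi2Sampler(sample_steps=steps)
    shapes[steps] = sampler.fit_transform(X_demo).shape
    print(steps, shapes[steps])
```

With 20 input features, this should print 20, 60, and 100 columns respectively.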
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
import numpy as np
# generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=20, random_state=1)
# make all features non-negative, as AdditiveChi2Sampler requires
X = np.abs(X)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create AdditiveChi2Sampler
chi2_feature = AdditiveChi2Sampler(sample_steps=2)
# create a pipeline with AdditiveChi2Sampler and LogisticRegression
model = make_pipeline(chi2_feature, LogisticRegression())
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)
# make a prediction
row = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
Accuracy: 0.600
Predicted: 0
The steps are as follows:
1. Generate a synthetic dataset using make_classification() with 100 samples and 20 features, then take absolute values so that every feature is non-negative. This simulates a classification problem with random data.
2. Split the dataset into training and test sets using train_test_split().
3. Create an AdditiveChi2Sampler instance with sample_steps=2 to approximate the chi-squared kernel.
4. Use make_pipeline() to create a pipeline that includes the AdditiveChi2Sampler and a LogisticRegression model.
5. Fit the pipeline model on the training data.
6. Evaluate the model by predicting the test data and calculating the accuracy score.
7. Make a prediction with the fitted model using a new data sample.
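The example calls np.abs(X) before splitting the data because the chi-squared kernel is only defined for non-negative values. The following minimal sketch, using a hypothetical two-row array X_bad, shows what happens when a negative value is passed in and how taking absolute values avoids the error.

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler

# hypothetical data containing one negative value
X_bad = np.array([[0.5, -0.1], [0.2, 0.3]])
sampler = AdditiveChi2Sampler(sample_steps=2)

# negative input is rejected with a ValueError
try:
    sampler.fit_transform(X_bad)
    raised = False
except ValueError as err:
    raised = True
    print('Rejected:', err)

# the same data works once it is made non-negative
X_ok = np.abs(X_bad)
Z = sampler.fit_transform(X_ok)
print(Z.shape)
```

Taking absolute values is a shortcut that suits synthetic data. For real data, rescaling features into a non-negative range (for instance with MinMaxScaler) may be a more natural choice.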
This example demonstrates how to use AdditiveChi2Sampler to transform data for efficient processing with a linear classifier, such as LogisticRegression. The pipeline simplifies the workflow, integrating both feature transformation and model training.
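To make the role of the pipeline concrete, the sketch below fits the same model twice on the same synthetic data: once through make_pipeline() and once by calling the sampler and classifier by hand. It is only an illustration of what the pipeline does internally. The variable names Z_train, Z_test, pipe_pred and manual_pred are introduced here for the comparison.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# same synthetic data as in the main example
X, y = make_classification(n_samples=100, n_features=20, random_state=1)
X = np.abs(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# pipeline version: transformation and model in one object
pipe = make_pipeline(AdditiveChi2Sampler(sample_steps=2), LogisticRegression())
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# manual version: transform first, then fit the classifier
sampler = AdditiveChi2Sampler(sample_steps=2)
Z_train = sampler.fit_transform(X_train)
Z_test = sampler.transform(X_test)  # reuse the fitted sampler on the test data
clf = LogisticRegression()
clf.fit(Z_train, y_train)
manual_pred = clf.predict(Z_test)

# both approaches should produce the same predictions
same = np.array_equal(pipe_pred, manual_pred)
print(same)
```

The manual version needs the transformation applied consistently to training data, test data, and any new rows. The pipeline handles that bookkeeping automatically.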