
Scikit-Learn OneClassSVM Model

OneClassSVM is an algorithm for anomaly detection, typically used in unsupervised settings to identify outliers or unusual data points. It learns a decision function from a single class of training data and then classifies new data as either similar to or different from the training set.

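The key hyperparameters of OneClassSVM include the kernel (e.g., ‘rbf’), nu (an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, in the range (0, 1]), and gamma (kernel coefficient for the ‘rbf’, ‘poly’, and ‘sigmoid’ kernels).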
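To make the role of nu more concrete, here is a minimal sketch that fits the model with a few nu values and reports the fraction of training points flagged as outliers and the fraction kept as support vectors. The particular nu values and the choice to fit on the full dataset are assumptions made for illustration; because of solver tolerance, the flagged fraction tracks nu only approximately.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

# single-cluster dataset, same settings as the main example
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=0.60, random_state=42)

results = {}
for nu in (0.05, 0.1, 0.2):  # illustrative values, not recommendations
    model = OneClassSVM(kernel='rbf', gamma=0.1, nu=nu).fit(X)
    # predict() returns -1 for outliers and 1 for inliers
    outlier_fraction = float(np.mean(model.predict(X) == -1))
    # support_ holds the indices of the support vectors
    sv_fraction = len(model.support_) / len(X)
    results[nu] = (outlier_fraction, sv_fraction)
    print('nu=%.2f  outliers: %.3f  support vectors: %.3f' % (nu, outlier_fraction, sv_fraction))
```

Raising nu should flag more training points and keep more support vectors, which is one way to sanity-check a chosen value against the expected share of anomalies.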

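The algorithm is best suited to novelty detection, where the training data is mostly free of anomalies. It can also be used for outlier detection on contaminated data, but it is sensitive to outliers in the training set, so nu usually needs tuning.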

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM
from sklearn.metrics import accuracy_score

# generate a synthetic dataset
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=0.60, random_state=42)

# split into train and test sets
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

# create the OneClassSVM model
model = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1)

# fit the model
model.fit(X_train)

# make predictions
yhat_train = model.predict(X_train)
yhat_test = model.predict(X_test)

# convert predictions to binary output: 1 for inliers, 0 for outliers
yhat_train = [1 if x == 1 else 0 for x in yhat_train]
yhat_test = [1 if x == 1 else 0 for x in yhat_test]

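# evaluate the model (mock-up for illustration): every sample comes from the single
# cluster, so all are treated as true inliers (label 1); the resulting "accuracy" is
# simply the fraction of samples the model predicts as inliers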
acc_train = accuracy_score([1]*len(yhat_train), yhat_train)
acc_test = accuracy_score([1]*len(yhat_test), yhat_test)
print('Train Accuracy: %.3f' % acc_train)
print('Test Accuracy: %.3f' % acc_test)

# make a prediction on a new sample
row = [[0.5, 0.5]]
yhat = model.predict(row)
print('Predicted: %d' % (1 if yhat[0] == 1 else 0))

Running the example gives an output like:

Train Accuracy: 0.896
Test Accuracy: 0.883
Predicted: 0
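The label printed for the new sample comes from thresholding a continuous score. As a minimal sketch, the snippet below refits the same model and uses decision_function() to compare the score of the new sample with that of the training mean. Using the training mean as a reference point is an assumption added here for illustration; positive scores correspond to inliers and negative scores to outliers.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

# rebuild the data and model from the main example
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=0.60, random_state=42)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
model = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1).fit(X_train)

# reference point: mean of the training data (assumption for illustration)
center = X_train.mean(axis=0).reshape(1, -1)
row = [[0.5, 0.5]]

# signed score: larger values mean "more like the training data"
score_center = model.decision_function(center)[0]
score_row = model.decision_function(row)[0]
print('Score at training mean: %.3f' % score_center)
print('Score at new sample: %.3f' % score_row)

# predictions on the test set follow the sign of the score
scores_test = model.decision_function(X_test)
labels_test = model.predict(X_test)
```

The raw score is useful when a different threshold is wanted than the one implied by nu, for example to rank samples by how anomalous they look.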

The steps are as follows:

  1. A synthetic dataset with one cluster of points is generated using make_blobs(), creating a scenario where anomalies can be detected as points far from the cluster center. The dataset is split into training and test sets using train_test_split().

  2. A OneClassSVM model is instantiated with the rbf kernel, a gamma value of 0.1, and nu set to 0.1. The model is fit on the training data.

  3. Predictions are made on both the training and test datasets. Predictions are converted to binary values: 1 for inliers (normal data) and 0 for outliers (anomalies).

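  4. The model’s performance is evaluated with a mock-up accuracy score that compares the predictions against labels of all 1s, so it simply reports the fraction of samples predicted as inliers. With nu set to 0.1, roughly 10% of the training points are expected to be flagged as outliers, which is why the scores sit near 0.9. In practice, when labeled anomalies are available, metrics like precision and recall are more appropriate for anomaly detection.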

  5. Finally, a single new sample is predicted using the fitted model to determine if it is an inlier or an outlier.
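Step 4 mentions precision and recall. The sketch below shows one way to compute them, assuming labeled anomalies are available. Here the anomalies are hypothetical: 30 points drawn uniformly from a box around the training mean, added purely for illustration and not part of the original example. The anomaly class (-1) is treated as the positive label.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

# rebuild the data and model from the main example
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=0.60, random_state=42)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
model = OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1).fit(X_train)

# hypothetical anomalies: uniform points in a box around the training mean
# (an assumption for illustration; real anomalies would come from labeled data)
n_out = 30
rng = np.random.RandomState(7)
X_out = X_train.mean(axis=0) + rng.uniform(low=-6, high=6, size=(n_out, 2))

# ground truth uses the model's own convention: 1 = inlier, -1 = outlier
X_eval = np.vstack([X_test, X_out])
y_true = np.concatenate([np.ones(len(X_test), dtype=int), -np.ones(n_out, dtype=int)])
y_pred = model.predict(X_eval)

# score the anomaly class (-1) as the positive class
precision = precision_score(y_true, y_pred, pos_label=-1)
recall = recall_score(y_true, y_pred, pos_label=-1)
print('Precision (outliers): %.3f' % precision)
print('Recall (outliers): %.3f' % recall)
```

Unlike the mock-up accuracy, these scores separate the two kinds of error: normal points wrongly flagged lower precision, while missed anomalies lower recall.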

This example demonstrates how to set up and use a OneClassSVM model for anomaly detection in scikit-learn: the model is fit on unlabeled data, and predict() then labels each sample as an inlier (1) or an outlier (-1).


