Elliptic Envelope is an algorithm used for identifying outliers in data. It fits a robust covariance estimate to the dataset, determining the shape of the data distribution and identifying points that deviate significantly.
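To make "deviate significantly" concrete, the following minimal sketch fits the model on hypothetical toy data (the arrays and the random seed are assumptions made for illustration, not part of the original example) and inspects the squared Mahalanobis distance of each point to the fitted center. Points far from the robust center get large distances and are the ones labeled as outliers.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# hypothetical toy data: a Gaussian cluster plus a few far-away points
rng = np.random.RandomState(0)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_far = rng.normal(loc=8.0, scale=0.5, size=(10, 2))
X_demo = np.vstack([X_inliers, X_far])

# fit the robust location and covariance estimate
envelope = EllipticEnvelope(contamination=0.05, random_state=0)
envelope.fit(X_demo)

# squared Mahalanobis distance of each point to the fitted center
distances = envelope.mahalanobis(X_demo)

# predict() returns 1 for inliers and -1 for outliers
labels = envelope.predict(X_demo)

print("mean squared distance, cluster points:", distances[:200].mean())
print("mean squared distance, far points:", distances[200:].mean())
print("number flagged as outliers:", int((labels == -1).sum()))
```

The far-away points should show a much larger mean distance than the cluster points, which is what drives their outlier label.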
The key hyperparameters of EllipticEnvelope include contamination (the proportion of outliers in the data) and support_fraction (the proportion of points to be included in the support of the raw MCD estimate).
The algorithm is appropriate for outlier and anomaly detection problems, and works best when the inliers are roughly Gaussian and form a single elliptical cluster.
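As a rough illustration of how these two hyperparameters behave, the sketch below fits the model several times on hypothetical toy data (the data, the seeds, and the chosen parameter values are assumptions for illustration). It prints the share of training points labeled as outliers for each contamination value, and the number of points in the raw MCD support for one support_fraction value.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# hypothetical toy data, used only for this illustration
rng = np.random.RandomState(1)
X_demo = rng.normal(size=(500, 2))

# contamination controls the share of training points labeled -1
flagged_share = {}
for contamination in (0.01, 0.05, 0.2):
    model = EllipticEnvelope(contamination=contamination, random_state=0)
    labels = model.fit(X_demo).predict(X_demo)
    flagged_share[contamination] = float(np.mean(labels == -1))
    print("contamination:", contamination, "flagged share:", flagged_share[contamination])

# support_fraction controls how many points the raw MCD estimate is computed from
model_sf = EllipticEnvelope(contamination=0.05, support_fraction=0.9, random_state=0)
model_sf.fit(X_demo)
n_raw_support = int(model_sf.raw_support_.sum())
print("points in raw MCD support:", n_raw_support)
```

On the training data, the flagged share should track the contamination value closely, since the decision threshold is set from the training distances.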
from sklearn.datasets import make_blobs
from sklearn.covariance import EllipticEnvelope
import matplotlib.pyplot as plt
import numpy as np
# generate synthetic dataset with outliers
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=1.0, random_state=42)
X_outliers = X[::10] + 5 # introduce some outliers
X = np.concatenate([X, X_outliers], axis=0)
# create EllipticEnvelope model
model = EllipticEnvelope(contamination=0.1)
# fit model
model.fit(X)
# predict outliers
yhat = model.predict(X)
# plot the data and the outliers
plt.scatter(X[:, 0], X[:, 1], c=yhat, cmap='coolwarm', edgecolor='k', s=20)
plt.title('Elliptic Envelope Outlier Detection')
plt.show()
Running the example displays a scatter plot in which each point is colored by its predicted label (1 for inliers, -1 for outliers).
The steps are as follows:
1. First, a synthetic dataset is generated using make_blobs() with additional outliers added manually. This creates a dataset with a single cluster and several outliers. The make_blobs() function is used to create a dataset with a specified number of samples (n_samples) and a fixed random seed (random_state) for reproducibility.
2. The dataset is then used to fit an EllipticEnvelope model with a contamination parameter of 0.1, indicating that 10% of the data points are expected to be outliers. The model is fit to the data using the fit() method.
3. Outliers are predicted using the predict() method, and the results are visualized using a scatter plot to show inliers and outliers.
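The predictions can also be checked numerically instead of visually. The sketch below rebuilds the same dataset and counts how many points are flagged, and how many of the manually added outliers are among them. The random_state passed to EllipticEnvelope and the variable names outlier_mask, n_flagged, and n_injected_flagged are additions for this illustration and do not appear in the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.covariance import EllipticEnvelope

# rebuild the dataset from the example: 300 cluster points plus 30 shifted copies
X, _ = make_blobs(n_samples=300, centers=1, cluster_std=1.0, random_state=42)
X_outliers = X[::10] + 5
X = np.concatenate([X, X_outliers], axis=0)

# random_state is an addition here, to make the robust fit repeatable
model = EllipticEnvelope(contamination=0.1, random_state=42)
yhat = model.fit(X).predict(X)

# predict() returns 1 for inliers and -1 for outliers
outlier_mask = yhat == -1
n_flagged = int(outlier_mask.sum())

# the manually added outliers occupy the last 30 rows of X
n_injected_flagged = int(outlier_mask[300:].sum())

print("total points:", X.shape[0])
print("flagged as outliers:", n_flagged)
print("injected outliers flagged:", n_injected_flagged, "of", X_outliers.shape[0])
```

Because contamination is 0.1 and the dataset has 330 points, roughly 33 points should be flagged, so a few ordinary cluster points may be labeled as outliers alongside the injected ones.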
This example demonstrates how to quickly set up and use an EllipticEnvelope model for outlier detection tasks, showcasing the algorithm’s ability to identify anomalies in a dataset effectively.