Scikit-Learn SpectralClustering Model

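Spectral Clustering is a clustering algorithm that builds an affinity (similarity) graph over the samples, embeds them in a low-dimensional space using eigenvectors of the graph Laplacian, and then clusters the embedded points (with k-means by default). This makes it particularly useful for data whose clusters are non-convex or otherwise not well-separated in Euclidean space.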
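
To make the idea concrete, the following is a simplified sketch of the main steps (affinity graph, normalized Laplacian, eigenvector embedding, k-means). It is not scikit-learn's implementation, which differs in details such as the eigensolver and normalization; the function name spectral_labels_sketch and its defaults are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph


def spectral_labels_sketch(X, n_clusters=3, n_neighbors=10, seed=1):
    """Illustrative spectral clustering; hypothetical helper, not the library code."""
    # 1. Build a k-nearest-neighbors connectivity graph and symmetrize it.
    A = kneighbors_graph(X, n_neighbors=n_neighbors, include_self=True).toarray()
    A = 0.5 * (A + A.T)
    # 2. Form the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # 3. Use the eigenvectors of the smallest eigenvalues as the embedding
    #    (np.linalg.eigh returns eigenvalues in ascending order).
    _, eigenvectors = np.linalg.eigh(L)
    embedding = eigenvectors[:, :n_clusters]
    # 4. Cluster the embedded points with k-means.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(embedding)


X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=1)
labels = spectral_labels_sketch(X)
print('Cluster sizes:', np.bincount(labels))
```

The dense eigendecomposition used here is only practical for small datasets; it is chosen to keep the sketch short.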

The key hyperparameters of SpectralClustering include n_clusters (number of clusters), affinity (how the affinity matrix is constructed, e.g., 'nearest_neighbors' or 'rbf'), and n_neighbors (number of neighbors used to construct the affinity matrix; it only applies when affinity='nearest_neighbors').
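
As a rough illustration of how the affinity choice matters, the sketch below fits two configurations on a two-moons dataset and compares each against the known generating labels. The dataset, the gamma value for the 'rbf' affinity, and the use of adjusted_rand_score are assumptions chosen for the illustration, not recommendations.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# two interleaving half-circles; y_true is used only for evaluation
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=1)

# candidate settings (illustrative values, not tuned)
configs = {
    'nearest_neighbors': dict(affinity='nearest_neighbors', n_neighbors=10),
    'rbf': dict(affinity='rbf', gamma=1.0),
}

scores = {}
for name, params in configs.items():
    model = SpectralClustering(n_clusters=2, random_state=1, **params)
    labels = model.fit_predict(X)
    # agreement with the generating labels, independent of label numbering
    scores[name] = adjusted_rand_score(y_true, labels)
    print('%s: ARI = %.3f' % (name, scores[name]))
```

On real data without ground-truth labels, an internal measure such as the silhouette score would take the place of the adjusted Rand index.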

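The algorithm is appropriate for unsupervised clustering problems, where samples must be grouped without labels. Because it builds an affinity matrix over the samples and computes an eigendecomposition, it tends to be best suited to small and medium-sized datasets.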

from sklearn.datasets import make_blobs
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score
import numpy as np

# generate synthetic dataset
X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=1)

# create model
model = SpectralClustering(n_clusters=3, affinity='nearest_neighbors', n_neighbors=10)

# fit and predict clusters
yhat = model.fit_predict(X)

# evaluate model
score = silhouette_score(X, yhat)
print('Silhouette Score: %.3f' % score)

# assign clusters to new samples (SpectralClustering has no predict(), so refit on the combined data)
new_samples = np.array([[0, 2], [5, 5], [-3, -2]])
new_clusters = model.fit_predict(np.vstack([X, new_samples]))
print('Predicted clusters for new samples:', new_clusters[-3:])

Running the example gives an output like:

Silhouette Score: 0.770
Predicted clusters for new samples: [0 0 0]

The steps are as follows:

First, a synthetic dataset with three clusters is generated using the make_blobs() function. This creates a dataset with a specified number of samples (n_samples), cluster centers (centers), and features (n_features), with a fixed random seed (random_state) for reproducibility.
Next, a SpectralClustering model is instantiated with 3 clusters, using the 'nearest_neighbors' affinity and specifying the number of neighbors for the affinity matrix.
The model is then fit on the dataset and cluster assignments are predicted using the fit_predict() method.
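The performance of the model is evaluated by calculating the silhouette score, which measures how similar each sample is to its own cluster compared to the nearest other cluster. It ranges from -1 to 1, with higher values indicating better-separated clusters.
Cluster assignments for new samples are obtained by refitting the model on the combined original and new data and reading off the labels of the new rows, because SpectralClustering has no predict() method. Refitting reclusters every sample, so the label numbering and even the assignments of the original points can change.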
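
If refitting for every new batch is undesirable, one possible workaround is to train a separate classifier on the cluster labels and use it to assign new samples. The sketch below swaps in a k-nearest-neighbors classifier for this purpose; this is a substitute technique and not part of SpectralClustering, and the choice of KNeighborsClassifier with n_neighbors=5 is an assumption.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# same synthetic dataset as in the main example
X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=1)

# cluster the original data once
model = SpectralClustering(n_clusters=3, affinity='nearest_neighbors',
                           n_neighbors=10, random_state=1)
labels = model.fit_predict(X)

# swapped-in technique: learn the cluster labels with a k-NN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, labels)

# assign new samples without refitting the clustering model
new_samples = np.array([[0, 2], [5, 5], [-3, -2]])
new_clusters = knn.predict(new_samples)
print('Assigned clusters for new samples:', new_clusters)
```

With this approach the labels of the original points stay fixed, at the cost of assigning new points by proximity in the input space instead of in the spectral embedding.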

This example demonstrates how to set up and use a SpectralClustering model for clustering tasks. The blobs used here are simple, but the same setup applies to more complex cluster structures that are not well-separated in Euclidean space.
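
To see the kind of structure where this matters, the sketch below compares KMeans and SpectralClustering on two concentric circles, a shape that centroid-based methods generally cannot separate. The dataset parameters and the use of adjusted_rand_score against the generating labels are assumptions made for the illustration.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# two concentric rings; y_true is used only for evaluation
X, y_true = make_circles(n_samples=300, factor=0.4, noise=0.04, random_state=1)

# centroid-based baseline
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)

# graph-based clustering on a nearest-neighbors affinity
spectral = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                              n_neighbors=10, random_state=1)
spectral_labels = spectral.fit_predict(X)

kmeans_ari = adjusted_rand_score(y_true, kmeans_labels)
spectral_ari = adjusted_rand_score(y_true, spectral_labels)
print('KMeans ARI: %.3f' % kmeans_ari)
print('SpectralClustering ARI: %.3f' % spectral_ari)
```

One would typically expect the spectral model to recover the rings far better than KMeans here, though the exact scores depend on the noise level and the n_neighbors setting.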

See Also