The silhouette score measures the quality of clusters by calculating the mean silhouette coefficient for all samples.
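For each sample, the coefficient is (b - a) / max(a, b), where a is the mean intra-cluster distance (the mean distance to the other samples in the same cluster) and b is the mean nearest-cluster distance (the mean distance to the samples in the closest cluster the sample does not belong to).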
The score ranges from -1 to +1. Values close to +1 indicate well-separated clusters, values close to 0 indicate overlapping clusters, and negative values suggest that samples have been assigned to the wrong cluster.
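To make the definition concrete, here is a minimal sketch that computes the score by hand and compares it with `silhouette_score()`. Here a is the mean distance from a sample to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The helper name `manual_silhouette`, the Euclidean metric, the smaller dataset size, and the explicit `n_init=10` are choices made for this illustration, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score


def manual_silhouette(X, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all samples (Euclidean distance)."""
    dist = pairwise_distances(X)
    unique_labels = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            # A sample alone in its cluster gets a coefficient of 0 by convention.
            continue
        # a: mean distance to the other members of the sample's own cluster
        # (the distance to itself is 0, so divide by n_own - 1).
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == k].mean() for k in unique_labels if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

manual = manual_silhouette(X, labels)
reference = silhouette_score(X, labels)
print(f"manual: {manual:.4f}, sklearn: {reference:.4f}")
```

The two numbers should agree up to floating-point error. The loop is written for readability rather than speed.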
This metric is typically used for clustering problems, especially for comparing different clustering algorithms or different numbers of clusters. However, it has limitations: it relies on pairwise distances between samples, so it can be slow and memory-hungry on large datasets, and it can be hard to interpret for non-convex clusters, where a low score does not necessarily mean a poor clustering.
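As a rough sketch of the "comparing the number of clusters" use case, the loop below scores KMeans for several values of k on the same kind of synthetic data. The range of k, the `sample_size=500` subsample, and the explicit `n_init=10` are assumptions chosen for illustration. `sample_size` is an existing parameter of `silhouette_score()` that scores a random subset of the samples, which is one way to limit the cost on large datasets.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=1000, centers=4, random_state=42)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Score a random subset of 500 samples instead of all 1000 to reduce
    # the pairwise-distance cost (an illustrative choice, not a requirement).
    scores[k] = silhouette_score(X, labels, sample_size=500, random_state=42)
    print(f"k={k}: silhouette={scores[k]:.3f}")

# The k with the highest score is a candidate, not a guaranteed answer.
best_k = max(scores, key=scores.get)
print(f"Highest silhouette score at k={best_k}")
```

Because the subsample is random, the scores are estimates. Fixing `random_state` keeps them reproducible between runs.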
```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Generate synthetic dataset
X, _ = make_blobs(n_samples=1000, centers=4, random_state=42)

# Apply KMeans clustering
kmeans = KMeans(n_clusters=4, random_state=42)
cluster_labels = kmeans.fit_predict(X)

# Calculate silhouette score
score = silhouette_score(X, cluster_labels)
print(f"Silhouette Score: {score:.2f}")
```
Running the example gives an output like:
Silhouette Score: 0.79
- Generate a synthetic dataset using `make_blobs()` to create a dataset with 4 centers (clusters).
- Apply the KMeans clustering algorithm to the dataset with 4 clusters.
- Use the `fit_predict()` method of `KMeans` to assign cluster labels to each sample.
- Calculate the silhouette score using the `silhouette_score()` function, which evaluates the quality of clustering based on the mean silhouette coefficient for all samples.
- Print the silhouette score to assess the performance of the clustering algorithm.
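
The single score is a mean, so it can hide individual clusters or samples that fit poorly. As a possible follow-up to the steps above, this sketch uses `silhouette_samples()` from `sklearn.metrics` to break the result down per cluster and count negative coefficients. The per-cluster summary format and the explicit `n_init=10` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=1000, centers=4, random_state=42)
cluster_labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# One silhouette coefficient per sample, each in [-1, 1].
sample_values = silhouette_samples(X, cluster_labels)

for cluster in np.unique(cluster_labels):
    values = sample_values[cluster_labels == cluster]
    # Negative coefficients flag samples that sit closer to another cluster.
    print(
        f"cluster {cluster}: n={len(values)}, "
        f"mean={values.mean():.2f}, negative={(values < 0).sum()}"
    )

# The mean of the per-sample values is the overall silhouette score.
print(f"Overall mean: {sample_values.mean():.2f}")
```

A cluster whose mean is well below the others, or that contains many negative values, is a natural place to look when the overall score is lower than expected.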