Scikit-Learn davies_bouldin_score() Metric

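Davies-Bouldin Score is a metric used to evaluate clustering algorithms. It measures the average similarity between each cluster and the cluster most similar to it, where similarity compares within-cluster scatter to between-cluster separation. Lower values indicate better clustering, and the minimum possible score is 0.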

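The davies_bouldin_score() function in scikit-learn calculates this score in two steps. For each cluster, it finds the largest ratio of combined within-cluster scatter (the mean distance of each cluster's points to its centroid, summed over the pair of clusters) to the distance between the two centroids. It then averages these worst-case ratios over all clusters. The function takes the feature data and predicted cluster labels as input and returns a float, with lower scores representing better clustering performance.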
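To make the computation concrete, here is a minimal sketch that recomputes the score with NumPy and checks it against scikit-learn. The helper name manual_davies_bouldin() is hypothetical and exists only for illustration; it assumes Euclidean distances and distinct cluster centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score


def manual_davies_bouldin(X, labels):
    """Illustrative re-implementation (hypothetical helper, not part of scikit-learn)."""
    cluster_ids = np.unique(labels)
    # Centroid of each cluster
    centroids = np.array([X[labels == k].mean(axis=0) for k in cluster_ids])
    # Scatter: mean Euclidean distance from each point to its own centroid
    scatter = np.array([
        np.linalg.norm(X[labels == k] - centroids[i], axis=1).mean()
        for i, k in enumerate(cluster_ids)
    ])
    # Separation: pairwise Euclidean distances between centroids
    separation = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    # Exclude self-comparisons so a cluster is never matched with itself
    np.fill_diagonal(separation, np.inf)
    # Similarity ratio for every pair of clusters
    ratios = (scatter[:, None] + scatter[None, :]) / separation
    # Worst-case (largest) ratio per cluster, averaged over clusters
    return ratios.max(axis=1).mean()


X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

manual = manual_davies_bouldin(X, labels)
library = davies_bouldin_score(X, labels)
print(f"Manual:  {manual:.6f}")
print(f"Library: {library:.6f}")
```

The two printed values should agree up to floating-point error, which suggests the description above matches what the library computes for this data.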

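Davies-Bouldin Score is useful for comparing the performance of different clustering algorithms. Because it needs only the feature data and the predicted cluster labels, it can be used when no ground-truth labels are available. However, it is not suitable for evaluating the performance of classification algorithms.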

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Generate synthetic dataset
X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

# Fit KMeans clustering (n_init is set explicitly because its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
y_pred = kmeans.fit_predict(X)

# Calculate Davies-Bouldin Score
db_score = davies_bouldin_score(X, y_pred)
print(f"Davies-Bouldin Score: {db_score:.2f}")

Running the example gives an output like:

Davies-Bouldin Score: 0.45

The steps are as follows:

1. Generate a synthetic clustering dataset using make_blobs().
2. Fit the KMeans clustering algorithm to the dataset.
3. Predict cluster labels using fit_predict().
4. Calculate the Davies-Bouldin Score with davies_bouldin_score() using the features and predicted labels.

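First, we generate a synthetic clustering dataset using the make_blobs() function from scikit-learn. This function creates a dataset with 1000 samples and 5 centers, simulating a clustering problem without using real-world data. The true blob labels that make_blobs() also returns are discarded (assigned to _), because the metric does not need them.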
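As a quick check of what make_blobs() returns, the sketch below inspects the arrays it produces. The variable name y_true is an assumption used here for illustration; the main example discards this second return value.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the main example, but keeping the generating labels
X, y_true = make_blobs(n_samples=1000, centers=5, random_state=42)

print(X.shape)               # (1000, 2): n_features defaults to 2
print(y_true.shape)          # (1000,)
print(np.bincount(y_true))   # samples are split evenly across the 5 centers
```

With an integer n_samples, the samples are divided equally among the centers, so each of the 5 blobs contains 200 points.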

Next, we fit a KMeans clustering model to the dataset using the KMeans class from scikit-learn. We specify 5 clusters to match the dataset and set a random state for reproducibility. The fit_predict() method is called on the clustering object, passing in the feature data (X) to assign each sample to a cluster.
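The following sketch illustrates what fit_predict() returns. It assumes the same data as the main example and sets n_init explicitly, which is a choice made here rather than something the original example specifies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
y_pred = kmeans.fit_predict(X)

# fit_predict() fits the model and returns one integer label per sample;
# the same labels are stored on the fitted estimator as labels_
same_labels = np.array_equal(y_pred, kmeans.labels_)
n_found = len(np.unique(y_pred))

print(y_pred.shape)
print(same_labels)
print(n_found)
```

The label values themselves are arbitrary cluster indices, so they carry no meaning beyond grouping samples together.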

Finally, we evaluate the clustering performance using the davies_bouldin_score() function. This function takes the feature data (X) and the predicted cluster labels (y_pred) as input and calculates the Davies-Bouldin Score, where a lower score indicates better clustering quality.

This example demonstrates how to use the davies_bouldin_score() function from scikit-learn to evaluate the performance of a clustering model.
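As noted earlier, the score is useful for comparing clustering algorithms. The sketch below applies it to two algorithms on the same data; the choice of AgglomerativeClustering as the second model and the dictionary layout are assumptions made for illustration, not a recommended benchmark.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

# Candidate clustering models, all asked for the same number of clusters
models = {
    "KMeans": KMeans(n_clusters=5, n_init=10, random_state=42),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=5),
}

# Score each model's labels on the same feature data
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = davies_bouldin_score(X, labels)
    print(f"{name}: {scores[name]:.3f}")

# Lower is better, so the preferred model has the smallest score
best = min(scores, key=scores.get)
print(f"Lowest Davies-Bouldin Score: {best}")
```

The scores are only comparable when computed on the same feature data, so each model is evaluated against the same X.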

See Also