The Davies-Bouldin Score is a metric used to evaluate clustering results. It averages, over all clusters, the similarity of each cluster to the cluster it most resembles, so lower values indicate better clustering.
The davies_bouldin_score() function in scikit-learn calculates this score in two stages. For each cluster, it finds the largest ratio of combined within-cluster scatter to between-centroid distance against any other cluster; it then averages those ratios over all clusters. It takes the feature data and predicted cluster labels as input and returns a float value. The minimum score is 0, and lower scores represent better clustering performance.
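To make the definition concrete, here is a minimal sketch that computes the score by hand with NumPy and compares it against scikit-learn. The helper name manual_davies_bouldin, the dataset size, and the n_init=10 setting are illustrative assumptions, not part of the library or the original example; the sketch also assumes Euclidean distances and distinct centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score


def manual_davies_bouldin(X, labels):
    """Hypothetical helper: Davies-Bouldin score computed from its definition."""
    cluster_ids = np.unique(labels)
    # Centroid of each cluster
    centroids = np.array([X[labels == k].mean(axis=0) for k in cluster_ids])
    # Scatter: mean Euclidean distance from a cluster's points to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == k] - centroids[i], axis=1).mean()
        for i, k in enumerate(cluster_ids)
    ])
    n = len(cluster_ids)
    worst_ratios = np.zeros(n)
    for i in range(n):
        # Similarity of cluster i to every other cluster j
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(n) if j != i
        ]
        # Keep only the most similar (worst-separated) neighbour
        worst_ratios[i] = max(ratios)
    # Average the worst-case ratios over all clusters
    return worst_ratios.mean()


# Small synthetic dataset (sizes chosen only for illustration)
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

manual = manual_davies_bouldin(X, labels)
reference = davies_bouldin_score(X, labels)
print(f"manual: {manual:.6f}, scikit-learn: {reference:.6f}")
```

The two printed values should agree up to floating-point error, which is a quick way to check one's understanding of the metric.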
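The Davies-Bouldin Score is useful for comparing clusterings, for example those produced by different clustering algorithms, because it needs only the feature data and the predicted cluster labels. However, it is not suitable for evaluating classification algorithms, which are scored against known labels instead.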
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score
# Generate synthetic dataset
X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)
# Fit KMeans clustering
kmeans = KMeans(n_clusters=5, random_state=42)
y_pred = kmeans.fit_predict(X)
# Calculate Davies-Bouldin Score
db_score = davies_bouldin_score(X, y_pred)
print(f"Davies-Bouldin Score: {db_score:.2f}")
Running the example gives an output like:
Davies-Bouldin Score: 0.45
The steps are as follows:
- Generate a synthetic clustering dataset using make_blobs().
- Fit the KMeans clustering algorithm to the dataset.
- Predict cluster labels using fit_predict().
- Calculate the Davies-Bouldin Score with davies_bouldin_score() using the features and predicted labels.
First, we generate a synthetic clustering dataset using the make_blobs() function from scikit-learn. This function creates a dataset with 1000 samples and 5 centers, simulating a clustering problem without using real-world data.
Next, we fit a KMeans clustering model to the dataset using the KMeans class from scikit-learn. We specify 5 clusters to match the dataset and set a random state for reproducibility. The fit_predict() method is called on the clustering object, passing in the feature data (X) to assign each sample to a cluster.
Finally, we evaluate the clustering performance using the davies_bouldin_score() function. This function takes the feature data (X) and the predicted cluster labels (y_pred) as input and calculates the Davies-Bouldin Score, where a lower score indicates better clustering quality.
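Because a lower score indicates better clustering, the score can also be used to compare candidate clusterings of the same data. The following is a rough sketch that scores KMeans for several cluster counts; the range of k values and the n_init=10 setting are assumptions chosen for illustration, and the lowest score will not necessarily land on the number of centers used to generate the data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Same synthetic dataset as in the main example
X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

# Score KMeans for a range of candidate cluster counts (range is an assumption)
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)
    print(f"k={k}: Davies-Bouldin Score = {scores[k]:.3f}")

# The candidate with the lowest score is the best separated by this metric
best_k = min(scores, key=scores.get)
print(f"Lowest score at k={best_k}")
```

The same loop works for comparing different algorithms: swap the KMeans call for any estimator that returns cluster labels, keeping X fixed so the scores are comparable.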
This example demonstrates how to use the davies_bouldin_score() function from scikit-learn to evaluate the performance of a clustering model.