
Scikit-Learn Birch Model

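BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is an unsupervised clustering algorithm designed for large datasets. It builds a Clustering Feature tree (CF tree) incrementally as samples arrive, compressing the data into compact sub-cluster summaries. The final clusters are then formed from those summaries rather than from every individual data point, which keeps partitioning efficient.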

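The key hyperparameters of Birch include n_clusters (the number of clusters to form in the final clustering step; default 3), threshold (the maximum radius a sub-cluster may have after absorbing a new sample; if merging the sample into its closest sub-cluster would exceed this radius, a new sub-cluster is started; default 0.5), and branching_factor (the maximum number of CF sub-clusters in each node; default 50).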
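To get a feel for how threshold shapes the CF tree, the following minimal sketch counts the leaf sub-clusters produced at a few threshold values. The specific values (0.3, 0.5, 2.0) and the subcluster_counts dictionary are illustrative choices, not recommendations. Setting n_clusters=None skips the final clustering step so the raw sub-clusters can be inspected.

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# small synthetic dataset (illustrative sizes)
X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)

# count the leaf sub-clusters produced at several threshold values
subcluster_counts = {}
for threshold in (0.3, 0.5, 2.0):
    # n_clusters=None returns the sub-clusters without a final clustering step
    model = Birch(n_clusters=None, threshold=threshold, branching_factor=50)
    model.fit(X)
    subcluster_counts[threshold] = len(model.subcluster_centers_)
    print('threshold=%.1f -> %d sub-clusters' % (threshold, subcluster_counts[threshold]))
```

A smaller threshold generally yields more, tighter sub-clusters, while a larger one merges samples more aggressively.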

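The algorithm is best suited to clustering tasks where the dataset is large, because the CF tree summarizes the data instead of storing every pairwise relationship. The example below applies it to a small synthetic dataset for illustration.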

from sklearn.datasets import make_blobs
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score

# generate synthetic dataset
X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)

# create the model
model = Birch(n_clusters=3)

# fit the model
model.fit(X)

# predict clusters
labels = model.predict(X)

# evaluate model
score = silhouette_score(X, labels)
print('Silhouette Score: %.3f' % score)

# make a prediction
new_samples = [[-6.5, -6.5], [0.0, 0.0], [5.5, 5.5]]
predictions = model.predict(new_samples)
print('Predicted clusters: %s' % predictions)

Running the example gives an output like:

Silhouette Score: 0.847
Predicted clusters: [0 1 1]

The steps are as follows:

  1. A synthetic dataset is generated using make_blobs(), creating 100 samples with 3 centers and 2 features. This simulates a typical clustering scenario with distinct groups of data points.

  2. A Birch model is instantiated with the number of clusters (n_clusters) set to 3. The model is then fit to the dataset using the fit() method.

  3. The clusters are predicted by calling predict() on the dataset, assigning each sample to a cluster.

  4. The performance of the clustering is evaluated using the silhouette score, which measures how similar a sample is to its own cluster compared to other clusters.

  5. Predictions are made on new samples to demonstrate how the fitted Birch model assigns clusters to new data points.
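To make the silhouette score in step 4 more concrete, here is a rough sketch that recomputes it by hand. For each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the sample's score is (b - a) / max(a, b). The variable names (D, manual_score) are illustrative, and the sketch uses Euclidean distance, which is the default metric of silhouette_score.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_score

X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)
labels = Birch(n_clusters=3).fit_predict(X)

# pairwise Euclidean distances between all samples
D = pairwise_distances(X)
unique_labels = np.unique(labels)

scores = []
for i in range(len(X)):
    own = labels == labels[i]
    n_own = own.sum()
    if n_own == 1:
        # a sample alone in its cluster gets a score of 0 by convention
        scores.append(0.0)
        continue
    # mean distance to the other members of the same cluster (D[i, i] is 0)
    a = D[i, own].sum() / (n_own - 1)
    # smallest mean distance to the members of any other cluster
    b = min(D[i, labels == k].mean() for k in unique_labels if k != labels[i])
    scores.append((b - a) / max(a, b))

manual_score = float(np.mean(scores))
print('Manual silhouette: %.3f' % manual_score)
print('sklearn silhouette: %.3f' % silhouette_score(X, labels))
```

Scores close to 1 indicate compact, well-separated clusters. Scores near 0 indicate overlapping clusters, and negative scores suggest samples assigned to the wrong cluster.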

This example shows how to set up and use the Birch clustering algorithm for efficient data partitioning. The dataset here is deliberately small; the same calls apply unchanged to larger datasets, which is where Birch's incremental CF tree is designed to help.
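When the data is too large to load at once, Birch also provides a partial_fit() method that updates the CF tree one batch at a time. The sketch below simulates this by splitting an in-memory array into batches. The dataset size (3000 samples) and the batch count (10) are arbitrary assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# a larger synthetic dataset, split into batches to mimic streamed data
X, _ = make_blobs(n_samples=3000, centers=3, n_features=2, random_state=42)

model = Birch(n_clusters=3)

# update the CF tree incrementally, one batch at a time
for batch in np.array_split(X, 10):
    model.partial_fit(batch)

# assign every sample to one of the final clusters
labels = model.predict(X)
print('Samples labeled: %d' % len(labels))
print('Clusters found: %d' % len(set(labels)))
```

In a real streaming setting, each batch would come from disk or a data source rather than from an array already held in memory.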


