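Spectral Co-Clustering clusters the rows and columns of a data matrix at the same time. It finds biclusters, which are subsets of rows and columns that are strongly associated with each other. Each row and each column is assigned to exactly one bicluster, so reordering the matrix by the assigned labels reveals a block-diagonal structure.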
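To make the idea concrete, here is a minimal NumPy sketch of the bipartite spectral partitioning approach (Dhillon, 2001) that scikit-learn's user guide describes for `SpectralCoclustering`. It scales the matrix by its row and column sums, keeps the singular vectors after the first pair, stacks the rescaled row and column embeddings, and runs k-means on them. The helper name `spectral_cocluster` and the toy matrix are hypothetical. The sketch uses a full dense SVD and skips the library's sparse-input handling and randomized SVD, so treat it as an illustration rather than the library's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_cocluster(A, n_clusters, random_state=0):
    """Hypothetical helper: co-cluster a nonnegative matrix A (rows x columns).

    Assumes every row and column has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    n_rows = A.shape[0]
    # R^{-1/2} and C^{-1/2}: inverse square roots of the row and column sums
    r_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    c_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=0))
    # Normalized matrix A_n = R^{-1/2} A C^{-1/2}
    A_n = r_inv_sqrt[:, None] * A * c_inv_sqrt[None, :]
    # Keep l = ceil(log2(k)) singular vectors, skipping the first (uninformative) pair
    n_vecs = int(np.ceil(np.log2(n_clusters)))
    U, _, Vt = np.linalg.svd(A_n)
    U = U[:, 1:1 + n_vecs]
    V = Vt[1:1 + n_vecs].T
    # Rescale and stack the row and column embeddings, then cluster them jointly
    Z = np.vstack([r_inv_sqrt[:, None] * U, c_inv_sqrt[:, None] * V])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(Z)
    # The first n_rows labels belong to rows, the rest to columns
    return labels[:n_rows], labels[n_rows:]


# Hypothetical toy matrix: two dense 3x3 blocks joined by weak identity links
block = np.array([[5, 4, 6], [4, 6, 5], [6, 5, 4]], dtype=float)
A = np.block([[block, np.eye(3)], [np.eye(3), block]])
row_labels, col_labels = spectral_cocluster(A, n_clusters=2)
print("row labels:", row_labels)
print("column labels:", col_labels)
```

Because rows and columns are clustered in one k-means run, a row group and a column group that share a label form one bicluster; here the first three rows should share a label with the first three columns.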
The key hyperparameters of `SpectralCoclustering` include `n_clusters` (the number of biclusters to find) and `random_state` (the seed for random number generation).
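The effect of `random_state` can be checked directly: two models fitted with the same seed on the same data should return identical labels, because the seed controls both the randomized SVD and the k-means initialization. This sketch assumes the same `make_biclusters` data as the main example. `svd_method` and `n_init` are other constructor parameters of `SpectralCoclustering`, shown here at their default values.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

data, _, _ = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)

# Same hyperparameters and seed, so the two fits should be reproducible
params = dict(n_clusters=5, svd_method="randomized", n_init=10, random_state=0)
labels_a = SpectralCoclustering(**params).fit(data).row_labels_
labels_b = SpectralCoclustering(**params).fit(data).row_labels_
print("identical row labels:", np.array_equal(labels_a, labels_b))

# n_clusters sets how many biclusters (and therefore label values) are returned
model_3 = SpectralCoclustering(n_clusters=3, random_state=0).fit(data)
print("label values with n_clusters=3:", np.unique(model_3.row_labels_))
```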
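The algorithm suits biclustering problems and is particularly useful in text mining (for example, co-clustering documents and words) and bioinformatics (for example, grouping genes together with experimental conditions).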
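As a small text-mining-style illustration, the sketch below co-clusters a hypothetical document-word count matrix. The vocabulary and counts are invented for the example: documents 0 to 2 mostly use the first three words, documents 3 to 5 mostly use the last three, and each document has one stray word from the other group. `get_indices(i)` is the accessor scikit-learn's bicluster models provide for the row and column indices of bicluster `i`.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Hypothetical vocabulary and counts (rows = documents, columns = words)
words = ["gene", "protein", "cell", "goal", "match", "team"]
topic_block = np.array([[3, 2, 1], [1, 3, 2], [2, 1, 3]])
counts = np.block([[topic_block, np.eye(3, dtype=int)],
                   [np.eye(3, dtype=int), topic_block]])

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(counts)

# Each bicluster pairs a group of documents with the words they share
for i in range(2):
    doc_idx, word_idx = model.get_indices(i)
    print(f"bicluster {i}: documents {doc_idx.tolist()}, "
          f"words {[words[j] for j in word_idx]}")
```

The expected grouping is documents 0 to 2 with "gene", "protein", "cell", and documents 3 to 5 with "goal", "match", "team"; which group gets label 0 is arbitrary.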
```python
from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering
from sklearn.metrics import consensus_score
import numpy as np

# generate synthetic bicluster dataset
data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)

# create model
model = SpectralCoclustering(n_clusters=5, random_state=0)

# fit model
model.fit(data)

# evaluate model
# reorder rows and columns by label so each bicluster forms a contiguous block
fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]
# compare the found biclusters with the true ones (1.0 is a perfect match)
score = consensus_score(model.biclusters_, (rows, columns))
print('Consensus score: %.3f' % score)

# make a prediction
# SpectralCoclustering has no predict(); labels are assigned during fit()
predicted_row_clusters = model.row_labels_
predicted_column_clusters = model.column_labels_
print('Predicted row clusters:', predicted_row_clusters)
print('Predicted column clusters:', predicted_column_clusters)
```
Running the example gives an output like:
```
Consensus score: 1.000
Predicted row clusters: [3 3 2 3 1 4 1 3 0 0 1 2 3 1 2 0 1 0 1 4 3 0 4 4 3 4 3 1 1 1 0 0 2 4 3 4 3
 3 1 2 3 0 0 2 0 4 1 0 2 2 0 3 4 0 0 1 4 2 1 4 4 4 0 1 1 3 1 4 2 2 4 1 1 2
 4 1 2 0 3 4 4 4 1 3 1 0 3 3 1 1 1 2 1 3 4 4 0 0 3 2 1 1 4 4 0 3 1 3 1 3 4
 4 2 0 1 0 4 1 0 2 2 4 2 0 0 0 4 3 4 2 0 4 0 0 4 3 1 1 2 1 2 0 3 2 4 4 2 2
 0 1 3 2 1 1 1 0 0 2 1 0 3 0 2 3 0 2 2 4 0 1 1 4 3 0 3 3 4 3 1 2 4 4 1 3 4
 4 1 0 3 0 3 4 2 3 4 2 0 1 0 1 1 1 0 0 0 0 1 4 3 2 1 4 4 0 3 0 3 4 1 1 3 2
 0 3 0 4 4 4 1 1 4 1 0 0 1 2 4 0 2 0 4 1 0 2 4 1 3 2 0 4 0 2 1 1 2 4 0 1 4
 2 3 4 4 1 3 2 2 4 3 1 1 4 1 3 0 3 1 4 1 0 2 4 1 4 1 2 3 0 0 0 1 1 3 2 3 1
 3 4 2 4]
Predicted column clusters: [4 0 0 2 2 1 0 3 3 3 4 4 3 4 4 4 4 3 3 2 3 0 4 3 3 1 2 0 0 0 3 4 2 2 2 4 2
 2 4 0 3 1 2 3 4 3 1 3 3 2 2 4 2 2 1 0 0 1 2 3 2 2 1 3 3 3 0 0 1 2 1 0 3 3
 2 2 2 1 1 2 2 2 3 3 3 2 0 4 4 3 4 0 3 0 3 2 4 3 3 2 4 1 4 1 1 1 2 0 4 0 4
 0 1 0 4 0 2 0 0 0 1 3 4 2 1 4 0 3 0 3 3 1 3 3 4 0 0 4 3 2 3 1 0 3 4 0 3 0
 3 0 4 0 4 2 4 1 1 0 0 4 0 0 2 3 2 2 0 2 3 4 1 1 1 0 4 3 2 0 0 3 0 4 2 4 3
 4 0 0 2 4 2 2 3 2 0 1 1 4 0 2 4 2 3 3 2 2 0 1 3 1 3 1 1 2 3 0 1 2 1 1 3 3
 4 0 4 0 2 1 2 2 0 3 1 3 3 2 4 2 4 4 3 2 0 2 1 1 2 3 4 1 2 0 2 4 0 4 2 4 3
 1 1 2 4 2 4 1 3 2 3 3 0 0 2 1 0 4 4 0 3 1 4 3 3 0 2 0 4 0 2 2 0 3 2 3 4 0
 0 2 1 3]
```
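The printed labels and the `biclusters_` attribute passed to `consensus_score` carry the same information. The check below rebuilds the example's model so it runs on its own. Each row of the first array in `model.biclusters_` is a boolean mask for one bicluster's rows (the second array does the same for columns), every data row belongs to exactly one bicluster, and `get_shape(i)` reports the number of rows and columns in bicluster `i`.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

data, _, _ = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)
model = SpectralCoclustering(n_clusters=5, random_state=0).fit(data)

row_masks, col_masks = model.biclusters_  # each has shape (5, 300)

# Mask i should mark exactly the rows/columns whose label is i
masks_match = all(
    np.array_equal(row_masks[i], model.row_labels_ == i)
    and np.array_equal(col_masks[i], model.column_labels_ == i)
    for i in range(5)
)
print("masks match labels:", masks_match)

# Every row belongs to exactly one bicluster
print("one bicluster per row:", np.all(row_masks.sum(axis=0) == 1))

# (n_rows, n_columns) of each bicluster found
for i in range(5):
    print(f"bicluster {i}: shape {model.get_shape(i)}")
```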
The steps are as follows:
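Generate synthetic bicluster dataset:
- The `make_biclusters()` function creates a dataset with a specified shape (`shape`), number of biclusters (`n_clusters`), and a fixed random seed (`random_state`). Along with the data, it returns boolean indicator arrays (`rows`, `columns`) that record the true bicluster membership.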
Create and fit model:
- Instantiate a `SpectralCoclustering` model with the specified hyperparameters.
- Fit the model on the synthetic dataset using the `fit()` method.
Evaluate model:
- Reorder the rows and columns of the data by their assigned labels (`fit_data`) so that each bicluster forms a contiguous block.
- Assess the model’s performance using `consensus_score`, which measures how well the predicted biclusters match the true biclusters. A score of 1.0 indicates a perfect match, so the score of 1.000 in the output means the true biclusters were recovered.
Make a prediction:
- Retrieve and display the row and column cluster labels using the model’s `row_labels_` and `column_labels_` attributes. `SpectralCoclustering` has no `predict()` method; these labels are assigned to the training data during `fit()`, and the integer IDs themselves are arbitrary.
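Because the label IDs are arbitrary, `consensus_score` first matches found biclusters to true ones before averaging; scikit-learn documents this as a Jaccard similarity between biclusters combined with the Hungarian algorithm for the matching. The sketch below, reusing the example's data, checks two consequences: a set of biclusters scored against itself gets 1.0, and reordering the found biclusters (that is, relabelling them) leaves the score unchanged. The permutation `order` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)
model = SpectralCoclustering(n_clusters=5, random_state=0).fit(data)
found_rows, found_cols = model.biclusters_

# Scoring the true biclusters against themselves gives the maximum score of 1.0
self_score = consensus_score((rows, columns), (rows, columns))

# Reordering the found biclusters does not change the score
order = np.array([4, 2, 0, 3, 1])  # arbitrary permutation of the 5 biclusters
score = consensus_score((found_rows, found_cols), (rows, columns))
permuted_score = consensus_score((found_rows[order], found_cols[order]), (rows, columns))

print("self score:", self_score)
print("score:", score, "permuted score:", permuted_score)
```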
This example demonstrates how to use `SpectralCoclustering` to recover biclusters from a data matrix and how to check the result against known ground truth with `consensus_score`.