Scikit-Learn SpectralCoclustering Model

Spectral Co-Clustering clusters the rows and columns of a data matrix at the same time, finding blocks (biclusters) in which a subset of rows and a subset of columns share similar values. Each row and each column is assigned to exactly one bicluster.
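To make the idea concrete, here is a minimal sketch of the usual spectral co-clustering recipe: normalize the matrix, take a few singular vectors, and run k-means on the stacked row and column embeddings. It is an illustration under stated assumptions (a dense, non-negative matrix with no all-zero rows or columns), not scikit-learn's actual implementation, and the function name spectral_cocluster_sketch is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_biclusters


def spectral_cocluster_sketch(A, n_clusters, random_state=0):
    """Illustrative co-clustering of a non-negative matrix A (not sklearn's code)."""
    # Scale rows and columns by the inverse square roots of their sums.
    row_scale = 1.0 / np.sqrt(A.sum(axis=1))
    col_scale = 1.0 / np.sqrt(A.sum(axis=0))
    A_norm = row_scale[:, None] * A * col_scale[None, :]

    # Keep ceil(log2(k)) singular vector pairs, skipping the leading pair.
    n_vecs = int(np.ceil(np.log2(n_clusters)))
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    U = U[:, 1:1 + n_vecs]
    V = Vt.T[:, 1:1 + n_vecs]

    # Embed rows and columns in one space and cluster them together.
    Z = np.vstack([row_scale[:, None] * U, col_scale[:, None] * V])
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(Z)
    return labels[:A.shape[0]], labels[A.shape[0]:]


data, rows, columns = make_biclusters(shape=(60, 40), n_clusters=3, random_state=0)
row_labels, column_labels = spectral_cocluster_sketch(data, n_clusters=3)
print(row_labels.shape, column_labels.shape)
```

Because rows and columns are clustered in a single k-means run, a row and a column that receive the same label belong to the same bicluster.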

The key hyperparameters of SpectralCoclustering include n_clusters (the number of biclusters to find) and random_state (the seed for random number generation, which makes results reproducible).
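As a quick way to see these hyperparameters alongside the others the estimator accepts, the sketch below prints the constructor arguments with get_params(). The alternative settings shown (svd_method="arpack", n_init=20) are illustrative choices, not recommendations.

```python
from sklearn.cluster import SpectralCoclustering

model = SpectralCoclustering(n_clusters=5, random_state=0)

# get_params() lists every constructor argument with its current value.
for name, value in sorted(model.get_params().items()):
    print(f"{name}: {value}")

# Other arguments are set the same way, for example the SVD solver
# and the number of k-means initializations.
alt_model = SpectralCoclustering(
    n_clusters=5, svd_method="arpack", n_init=20, random_state=0
)
```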

The algorithm is suited to clustering problems where rows and columns should be grouped together, such as document-term matrices in text mining and gene-expression matrices in bioinformatics.

from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering
from sklearn.metrics import consensus_score
import numpy as np

# generate synthetic bicluster dataset
data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)

# create model
model = SpectralCoclustering(n_clusters=5, random_state=0)

# fit model
model.fit(data)

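# reorder rows and columns by cluster label so biclusters form contiguous blocks
# (fit_data is not used by the score below; it is useful for plotting)
fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]

# evaluate model against the true biclusters
score = consensus_score(model.biclusters_, (rows, columns))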
print('Consensus score: %.3f' % score)

# retrieve the cluster label assigned to each row and column
predicted_row_clusters = model.row_labels_
predicted_column_clusters = model.column_labels_
print('Predicted row clusters:', predicted_row_clusters)
print('Predicted column clusters:', predicted_column_clusters)

Running the example gives an output like:

Consensus score: 1.000
Predicted row clusters: [3 3 2 3 1 4 1 3 0 0 1 2 3 1 2 0 1 0 1 4 3 0 4 4 3 4 3 1 1 1 0 0 2 4 3 4 3
 3 1 2 3 0 0 2 0 4 1 0 2 2 0 3 4 0 0 1 4 2 1 4 4 4 0 1 1 3 1 4 2 2 4 1 1 2
 4 1 2 0 3 4 4 4 1 3 1 0 3 3 1 1 1 2 1 3 4 4 0 0 3 2 1 1 4 4 0 3 1 3 1 3 4
 4 2 0 1 0 4 1 0 2 2 4 2 0 0 0 4 3 4 2 0 4 0 0 4 3 1 1 2 1 2 0 3 2 4 4 2 2
 0 1 3 2 1 1 1 0 0 2 1 0 3 0 2 3 0 2 2 4 0 1 1 4 3 0 3 3 4 3 1 2 4 4 1 3 4
 4 1 0 3 0 3 4 2 3 4 2 0 1 0 1 1 1 0 0 0 0 1 4 3 2 1 4 4 0 3 0 3 4 1 1 3 2
 0 3 0 4 4 4 1 1 4 1 0 0 1 2 4 0 2 0 4 1 0 2 4 1 3 2 0 4 0 2 1 1 2 4 0 1 4
 2 3 4 4 1 3 2 2 4 3 1 1 4 1 3 0 3 1 4 1 0 2 4 1 4 1 2 3 0 0 0 1 1 3 2 3 1
 3 4 2 4]
Predicted column clusters: [4 0 0 2 2 1 0 3 3 3 4 4 3 4 4 4 4 3 3 2 3 0 4 3 3 1 2 0 0 0 3 4 2 2 2 4 2
 2 4 0 3 1 2 3 4 3 1 3 3 2 2 4 2 2 1 0 0 1 2 3 2 2 1 3 3 3 0 0 1 2 1 0 3 3
 2 2 2 1 1 2 2 2 3 3 3 2 0 4 4 3 4 0 3 0 3 2 4 3 3 2 4 1 4 1 1 1 2 0 4 0 4
 0 1 0 4 0 2 0 0 0 1 3 4 2 1 4 0 3 0 3 3 1 3 3 4 0 0 4 3 2 3 1 0 3 4 0 3 0
 3 0 4 0 4 2 4 1 1 0 0 4 0 0 2 3 2 2 0 2 3 4 1 1 1 0 4 3 2 0 0 3 0 4 2 4 3
 4 0 0 2 4 2 2 3 2 0 1 1 4 0 2 4 2 3 3 2 2 0 1 3 1 3 1 1 2 3 0 1 2 1 1 3 3
 4 0 4 0 2 1 2 2 0 3 1 3 3 2 4 2 4 4 3 2 0 2 1 1 2 3 4 1 2 0 2 4 0 4 2 4 3
 1 1 2 4 2 4 1 3 2 3 3 0 0 2 1 0 4 4 0 3 1 4 3 3 0 2 0 4 0 2 2 0 3 2 3 4 0
 0 2 1 3]
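Printing all 300 labels is hard to read. As an alternative, the sketch below counts how many rows and columns fall into each cluster, using the same dataset and model settings as the example above.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)
model = SpectralCoclustering(n_clusters=5, random_state=0).fit(data)

# Count how many rows and columns landed in each of the 5 clusters.
row_counts = np.bincount(model.row_labels_, minlength=5)
column_counts = np.bincount(model.column_labels_, minlength=5)
print("Rows per cluster:", row_counts)
print("Columns per cluster:", column_counts)
```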

The steps are as follows:

Generate synthetic bicluster dataset:
- The make_biclusters() function creates a dataset with a specified shape (shape), number of clusters (n_clusters), and a fixed random seed (random_state).
Create and fit model:
- Instantiate a SpectralCoclustering model with specified hyperparameters.
- Fit the model on the synthetic dataset using the fit() method.
Evaluate model:
- Assess the model’s performance using consensus_score, which compares the predicted biclusters to the true biclusters.
- The score ranges from 0 to 1, where 1 means the identified biclusters match the actual biclusters exactly.
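Make a prediction:
- Retrieve and display the predicted row and column clusters using the model’s row_labels_ and column_labels_ attributes. SpectralCoclustering has no predict() method, so labels are available only for the data passed to fit().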
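Beyond the label arrays, a fitted model can also return individual biclusters. The sketch below, using the same dataset and settings as the steps above, pulls out the first bicluster with get_indices(), get_shape(), and get_submatrix(), and rebuilds the reordered matrix in which each bicluster forms a contiguous block.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, random_state=0)
model = SpectralCoclustering(n_clusters=5, random_state=0).fit(data)

# Row and column indices, shape, and values of the first bicluster.
row_idx, col_idx = model.get_indices(0)
n_rows, n_cols = model.get_shape(0)
block = model.get_submatrix(0, data)
print("Bicluster 0 shape:", (n_rows, n_cols))

# Sorting rows and columns by label groups each bicluster into a contiguous block.
row_order = np.argsort(model.row_labels_)
col_order = np.argsort(model.column_labels_)
reordered = data[row_order][:, col_order]
```

The reordered matrix is the natural input for a heatmap when checking the result visually.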

This example demonstrates the use of SpectralCoclustering to identify biclusters in a dataset, showcasing the algorithm’s utility in clustering applications.

See Also