We can represent a data set as a collection of object nodes and a collection of attribute nodes, where there is a link between each object and each attribute, and where the weight of that link is the value of the object for that attribute. For sparse data, if the value is 0, the link is omitted. Bipartite clustering attempts to partition this graph into disjoint clusters, where each cluster consists of a set of object nodes and a set of attribute nodes. The objective is to maximize the weight of links between the object and attribute nodes of a cluster, while minimizing the weight of links between object and attribute nodes in different clusters. This type of clustering is also known as co-clustering, since the objects and attributes are clustered at the same time.
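The graph representation and the objective described above can be illustrated with a minimal sketch. The data matrix, the cluster assignments, and the helper names `bipartite_edges` and `link_weights` are all hypothetical and chosen only for illustration; the sketch evaluates a given co-clustering and does not search for one.

```python
# Hypothetical toy data: rows are objects, columns are attributes.
data = [
    [2, 1, 0, 0],
    [3, 2, 0, 1],
    [0, 0, 4, 2],
]

def bipartite_edges(matrix):
    """Return (object, attribute, weight) links, omitting zero-valued entries."""
    return [
        (i, j, w)
        for i, row in enumerate(matrix)
        for j, w in enumerate(row)
        if w != 0
    ]

def link_weights(edges, object_cluster, attribute_cluster):
    """Split the total link weight into within-cluster and between-cluster parts."""
    within = sum(w for i, j, w in edges if object_cluster[i] == attribute_cluster[j])
    between = sum(w for i, j, w in edges if object_cluster[i] != attribute_cluster[j])
    return within, between

edges = bipartite_edges(data)

# Assumed co-clustering: objects 0 and 1 with attributes 0 and 1,
# object 2 with attributes 2 and 3.
object_cluster = [0, 0, 1]
attribute_cluster = [0, 0, 1, 1]

within, between = link_weights(edges, object_cluster, attribute_cluster)
print("links:", len(edges), "within:", within, "between:", between)
```

A good co-clustering makes `within` large and `between` small; here only the single link from object 1 to attribute 3 crosses clusters.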

(a) How is bipartite clustering (co-clustering) different from clustering the
sets of objects and attributes separately?
(b) Are there any cases in which these approaches yield the same clusters?
(c) What are the strengths and weaknesses of co-clustering as compared to
ordinary clustering?

(a) In regular clustering, only one set of constraints, related either to
objects or to attributes, is applied. In co-clustering, both sets of
constraints are applied simultaneously. Thus, partitioning the objects and
attributes independently of one another typically does not produce the
same results.
(b) Yes. For example, if a set of attributes is associated only with the
objects in one particular cluster, i.e., has 0 weight for objects in all other
clusters, and conversely, the set of objects in a cluster has 0 weight for
all other attributes, then the clusters found by co-clustering will match
those found by clustering the objects and attributes separately. To use
documents as an example, this would correspond to a document data
set that consists of groups of documents that only contain certain words
and groups of words that only appear in certain documents.
(c) Co-clustering automatically provides a description of a cluster of objects
in terms of attributes, which can be more useful than a description
of clusters as a partitioning of objects. However, the attributes that
distinguish different clusters of objects may overlap significantly, and
in such cases co-clustering will not work well.
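The special case in answer (b) can be checked on a small example. The sketch below is an illustration under strong assumptions: the matrix is a made-up block-structured data set with zero weight outside the blocks, "separate clustering" is stood in for by grouping rows (or columns) that share a nonzero entry, and "co-clustering" is stood in for by the connected components of the bipartite graph. None of these is a real co-clustering algorithm, and the helper `groups` is hypothetical.

```python
# Hypothetical block-structured data: objects 0-1 use only attributes 0-1,
# objects 2-3 use only attributes 2-4.
data = [
    [5, 3, 0, 0, 0],
    [4, 0, 0, 0, 0],
    [0, 0, 2, 6, 0],
    [0, 0, 0, 1, 7],
]
n_obj, n_attr = len(data), len(data[0])

def groups(n, linked):
    """Group indices 0..n-1 by the transitive closure of linked(a, b)."""
    label = list(range(n))
    for a in range(n):
        for b in range(a + 1, n):
            if linked(a, b) and label[a] != label[b]:
                old, new = label[b], label[a]
                label = [new if x == old else x for x in label]
    return sorted(
        [i for i in range(n) if label[i] == l] for l in set(label)
    )

# Separate clustering of objects: two rows are linked if they share a nonzero attribute.
object_groups = groups(
    n_obj,
    lambda a, b: any(data[a][j] != 0 and data[b][j] != 0 for j in range(n_attr)),
)

# Separate clustering of attributes: two columns are linked if they share a nonzero object.
attribute_groups = groups(
    n_attr,
    lambda a, b: any(data[i][a] != 0 and data[i][b] != 0 for i in range(n_obj)),
)

# Stand-in for co-clustering: connected components of the bipartite graph.
# Nodes 0..n_obj-1 are objects; nodes n_obj..n_obj+n_attr-1 are attributes.
components = groups(
    n_obj + n_attr,
    lambda a, b: a < n_obj <= b and data[a][b - n_obj] != 0,
)
co_objects = [[v for v in comp if v < n_obj] for comp in components]
co_attributes = [[v - n_obj for v in comp if v >= n_obj] for comp in components]

print(object_groups == co_objects, attribute_groups == co_attributes)
```

Adding a nonzero value outside the blocks (for example, setting `data[1][3]` to 1) merges everything into one component under these stand-ins, which is one way to see why overlapping attributes make the idealized case break down.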

Computer Science & Information Technology
