For the following sets of two-dimensional points, (1) provide a sketch of how they would be split into clusters by K-means for the given number of clusters and (2) indicate approximately where the resulting centroids would be. As- sume that we are using the squared error objective function. If you think that there is more than one possible solution, then please indicate whether each solution is a global or local minimum. Note that the label of each diagram in Figure 8.4 matches the corresponding part of this question, e.g., Figure 8.4(a) goes with part (a).



(a) K = 2. Assuming that the points are uniformly distributed in the circle,

how many possible ways are there (in theory) to partition the points

into two clusters? What can you say about the positions of the two

centroids? (Again, you don’t need to provide exact centroid locations,

just a qualitative description.)

(b) K = 3. The distance between the edges of the circles is slightly greater

than the radii of the circles.

(c) K = 3. The distance between the edges of the circles is much less than

the radii of the circles.

(d) K = 2.

(e) K = 3. Hint: Use the symmetry of the situation and remember that

we are looking for a rough sketch of what the result would be.

(a) In theory, there are an infinite number of ways to split the circle into
two clusters - just take any line that bisects the circle. This line can make any angle 0? ? ? ? 180? with the x axis. The centroids will lie
on the perpendicular bisector of the line that splits the circle into two
clusters and will be symmetrically positioned. All these solutions will
have the same, globally minimal, error.
(b) If you start with initial centroids that are real points, you will necessar-
ily get this solution because of the restriction that the circles are more

than one radius apart. Of course, the bisector could have any angle, as
above, and it could be the other circle that is split. All these solutions
have the same globally minimal error.
(c) The three boxes show the three clusters that will result in the realistic
case that the initial centroids are actual data points.
(d) In both case, the rectangles show the clusters. In the first case, the two
clusters are only a local minimum while in the second case the clusters
represent a globally minimal solution.
(e) For the solution shown in the top figure, the two top clusters are en-
closed in two boxes, while the third cluster is enclosed by the regions

defined by a triangle and a rectangle. (The two smaller clusters in the
drawing are supposed to be symmetrical.) I believe that the second

solution—suggested by a student—is also possible, although it is a lo-
cal minimum and might rarely be seen in practice for this configuration

of points. Note that while the two pie shaped cuts out of the larger cir-
cle are shown as meeting at a point, this is not necessarily the case—it

depends on the exact positions and sizes of the circles. There could be a
gap between the two pie shaped cuts which is filled by the third (larger)
cluster. (Imagine the small circles on opposite sides.) Or the boundary
between the two pie shaped cuts could actually be a line segment.

Computer Science & Information Technology

You might also like to view...

________ are the descriptive names assigned to figures in a document

A) Labels B) Captions C) Title D) References

Computer Science & Information Technology

The first stage in generating an RSA-PSS signature of a message M is to generate from M a fixed-length message digest, called an ______________.

Fill in the blank(s) with the appropriate word(s).

Computer Science & Information Technology