Consider the following set of two-dimensional records:
Also consider two different clustering schemes: (1) where Cluster 1 contains records {1, 2, 3} and Cluster 2 contains records {4, 5, 6} and (2) where Cluster 1 contains records {1, 6} and Cluster 2 contains records {2, 3, 4, 5}. Which scheme is better and why?
Compare the error of the two clustering schemes. The scheme with the
smallest error is better.
For SCHEME (1) we have C1 = {1,2,3} and C2 = {4,5,6}
M1 = ((8+5+2)/3, (4+4+4)/3) = (5,4)
2 2 2 2 2 2
C1_error = (8-5) + (4-4) + (5-5) (4-4) + (2-5) + (4-4)
= 18
For C2 we have
M2 = ((2+2+8)/3, (6+8+6)/3) = (4,6.66)
2 2 2 2 2 2
C2_error = (2-4) + (6-6.66) + (2-4) (8-6.66) + (8-4) + (6-6.66)
= 26.67
C1_error + C2_error = 44.67
For SCHEME (2) we have C1 = {1,6} and C2 = {2,3,4,5}
M1 = ((8+8)/2, (4+6)/2) = (8,5)
2 2 2 2
C1_error = (8-8) + (4-5) + (8-8) (6-5)
= 2
For C2 we have
M2 = ((5+2+2+2)/4, (4+4+6+8)/4) = (2.75,5.5)
C2_error =
2 2 2 2 2 2 2 2
(5-2.75) +(4-5.5) +(2-2.75) +(4-5.5) +(2-2.75) +(6-5.5) +(2-2.75) +(8-5.5)
= 17.74
C1_error + C2_error = 19.74
SCHEME 2 is better since the error associated with it is less than that
of SCHEME (1).
You might also like to view...
RGB stands for Red, Gray, and Beige.
Answer the following statement true (T) or false (F)
What Linux distribution is the most commonly used distribution within organizations today?
A. Mandrake B. SuSE C. Debian D. Red Hat