Given a set of points in Euclidean space, which are being clustered using the K-means algorithm with Euclidean distance, the triangle inequality can be used in the assignment step to avoid calculating all the distances of each point to each cluster centroid. Provide a general discussion of how this might work.
Charles Elkan presented the following theorem in his keynote speech at the
Workshop on Clustering High-Dimensional Data at SIAM 2004.
Lemma 1: Let x be a point, and let b and c be centers.
If d(b, c) ≥ 2d(x, b) then d(x, c) ≥ d(x, b).
Proof:
We know d(b, c) ≤ d(b, x) + d(x, c).
So d(b, c) − d(x, b) ≤ d(x, c).
Now d(b, c) − d(x, b) ≥ 2d(x, b) − d(x, b) = d(x, b).
So d(x, b) ≤ d(x, c).
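This theorem can be used to eliminate a large number of unnecessary distance
calculations. In the assignment step, suppose b is the closest center found so
far for a point x. If the distance between centers b and c is at least
2d(x, b), then c cannot be closer to x than b, so d(x, c) need not be
computed. The distances between centers are computed once per iteration and
shared by all points.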
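A minimal sketch of how the lemma might be applied in the assignment step, assuming NumPy arrays for the points and centers. The function name assign_with_pruning and the skipped counter are hypothetical, introduced here for illustration only. This is not Elkan's full accelerated algorithm, which also maintains distance bounds across iterations; it shows only the pruning test from Lemma 1.

```python
import numpy as np

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that Lemma 1 shows cannot change the result."""
    k = len(centers)
    # Center-to-center distances: k * k values, computed once per iteration.
    center_dists = np.linalg.norm(
        centers[:, None, :] - centers[None, :, :], axis=2
    )
    labels = np.empty(len(points), dtype=int)
    skipped = 0  # number of point-to-center distances avoided
    for i, x in enumerate(points):
        best = 0
        best_dist = np.linalg.norm(x - centers[0])
        for c in range(1, k):
            # Lemma 1: d(b, c) >= 2 d(x, b) implies d(x, c) >= d(x, b),
            # so center c cannot beat the current best center b.
            if center_dists[best, c] >= 2.0 * best_dist:
                skipped += 1
                continue
            dist = np.linalg.norm(x - centers[c])
            if dist < best_dist:
                best, best_dist = c, dist
        labels[i] = best
    return labels, skipped
```

The savings depend on the data: when clusters are well separated, most points lie much closer to their own center than half the distance to any other center, so most of the k − 1 remaining distances per point can be skipped.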