Consider a set of documents. Assume that all documents have been normalized to have unit length. What is the “shape” of a cluster that consists of all documents whose cosine similarity to a centroid is greater than or equal to some specified constant? In other words, cos(d, c) ≥ δ, where 0 < δ ≤ 1.

What will be an ideal response?

Once document vectors have been normalized, they lie on an n-dimensional
hypersphere. The constraint that all documents have a minimum cosine
similarity with respect to a centroid is a constraint that the document vectors
lie within a cone whose axis passes through the centroid. The intersection of
this cone with the sphere is a spherical cap, whose boundary (in three
dimensions) is a circle on the surface of the sphere.
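
The cone constraint can be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes three-dimensional vectors, a threshold of 0.8, and a centroid that has itself been scaled to unit length (cosine similarity does not depend on the centroid's length). The helper names normalize, cosine, in_cluster and euclidean_sq are hypothetical. For unit vectors, cos(d, c) is the dot product, so the cluster condition is the same as the angle between d and c being at most arccos(δ), and also the same as the squared Euclidean distance satisfying ||d - c||^2 = 2 - 2 cos(d, c) ≤ 2 - 2δ.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two unit vectors (just their dot product)."""
    return sum(a * b for a, b in zip(u, v))


def euclidean_sq(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def in_cluster(d, c, delta):
    """Cluster membership test: cos(d, c) >= delta."""
    return cosine(d, c) >= delta


random.seed(0)
delta = 0.8                       # assumed threshold, 0 < delta <= 1
c = normalize([1.0, 2.0, 3.0])    # assumed centroid direction, unit length

# Random unit vectors standing in for normalized documents.
docs = [normalize([random.gauss(0, 1) for _ in range(3)]) for _ in range(1000)]
members = [d for d in docs if in_cluster(d, c, delta)]

# Half-angle of the cone, and the equivalent squared-distance bound.
max_angle = math.acos(delta)
max_dist_sq = 2 - 2 * delta

for d in members:
    angle = math.acos(min(1.0, cosine(d, c)))   # clamp against rounding
    assert angle <= max_angle + 1e-9
    assert euclidean_sq(d, c) <= max_dist_sq + 1e-9
```

Every member lies within angle arccos(δ) of the centroid, which is the cone described above. The distance form shows that, on the unit sphere, a cosine threshold selects the same points as a Euclidean ball around the unit-length centroid.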

Computer Science & Information Technology
