Show that 1 minus the Jaccard similarity is a distance measure between two data objects, x and y, that satisfies the metric axioms given on page 70. Specifically, d(x, y)=1 ? J(x, y).

What will be an ideal response?

1(a). Because J(x, y) ? 1, d(x, y) ? 0.
1(b). Because J(x, x)=1, d(x, x)=0
2. Because J(x, y) = J(y, x), d(x, y) = d(y, x)
3. (Proof due to Jeffrey Ullman)
minhash(x) is the index of first nonzero entry of x

prob(minhash(x) = k) is the probability tha minhash(x) = k when x is ran-
domly permuted.

Note that prob(minhash(x) = minhash(y)) = J(x, y) (minhash lemma)
Therefore, d(x, y)=1?prob(minhash(x) = minhash(y)) = prob(minhash(x) =
minhash(y))
We have to show that,
prob(minhash(x) = minhash(z)) ? prob(minhash(x) = minhash(y)) +
prob(minhash(y) = minhash(z)
However, note that whenever minhash(x) = minhash(z), then at least one of
minhash(x) = minhash(y) and minhash(y) = minhash(z) must be true.

Computer Science & Information Technology

You might also like to view...

You can NOT move a task bar if it is ________

Fill in the blank(s) with correct word

Computer Science & Information Technology

The __________ represent(s) the printable area of a page.

a) PrintBounds b) Boundary c) MarginBounds d) Perimeter

Computer Science & Information Technology