For the Partition algorithm, prove that any frequent itemset in the database must appear as a local frequent itemset in at least one partition.
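We can prove this by contradiction.
Assume the database has M transactions and is split into N non-overlapping
partitions P1, ..., PN. For simplicity, assume each partition contains M/N
transactions (the same argument works for unequal sizes by replacing M/N with
the size of each partition). Let I be a frequent itemset with support S, so
S * M is the number of transactions containing I.
Since I is frequent, S >= min_support,
or equivalently, S * M >= min_support * M.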
Now assume that I is not locally frequent in any of the N partitions. That is,
for every partition Pi, the support Si of I within Pi satisfies Si < min_support,
or equivalently Si * M/N < min_support * M/N, where Si * M/N is the number of
transactions in Pi that contain I.
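Because the partitions are disjoint and together cover the whole database, each
transaction containing I lies in exactly one partition, so the local counts add
up to the global count S * M. Summing the N strict inequalities therefore gives:
```
S * M = (S1 * M/N) + (S2 * M/N) + ... + (SN * M/N)
      < N * (min_support * M/N)
      = min_support * M
```
So S * M < min_support * M, which contradicts the fact that I is frequent, i.e.
that the number of transactions containing I is >= min_support * M. Hence I must
be locally frequent in at least one partition.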
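The property can also be spot-checked empirically. The sketch below is a brute-force illustration, not the Partition algorithm itself (there is no second counting pass and no tidlists). The function names and the `max_len` bound on itemset size are hypothetical choices for this example. It uses exact `Fraction` thresholds so that floating-point rounding cannot distort the comparison. It also allows the last partition to be smaller than the others, which the argument above covers as well.

```python
from fractions import Fraction
from itertools import combinations
import random

def itemset_counts(transactions, max_len):
    # Count every itemset of size 1..max_len contained in each transaction.
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return counts

def frequent_itemsets(transactions, min_support, max_len):
    # Itemsets whose relative support is >= min_support (exact Fraction arithmetic).
    n = len(transactions)
    return {s for s, c in itemset_counts(transactions, max_len).items()
            if c >= min_support * n}

def local_candidates(transactions, num_partitions, min_support, max_len):
    # Union of locally frequent itemsets over contiguous, possibly unequal partitions.
    size = -(-len(transactions) // num_partitions)  # ceiling division
    candidates = set()
    for start in range(0, len(transactions), size):
        part = transactions[start:start + size]
        candidates |= frequent_itemsets(part, min_support, max_len)
    return candidates

if __name__ == "__main__":
    rng = random.Random(0)
    db = [frozenset(rng.sample("abcdef", rng.randint(1, 4))) for _ in range(203)]
    min_sup = Fraction(1, 5)
    global_freq = frequent_itemsets(db, min_sup, max_len=3)
    candidates = local_candidates(db, 4, min_sup, max_len=3)
    # Every globally frequent itemset must be locally frequent in some partition.
    assert global_freq <= candidates
    print(len(global_freq), "global frequent itemsets,", len(candidates), "local candidates")
```

The subset check `global_freq <= candidates` is exactly the claim proved above. The candidate set is usually larger, because an itemset can be locally frequent in one partition without being frequent overall; that is why the Partition algorithm needs a second pass over the database to count the candidates globally.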