Discuss why a document-term matrix is an example of a data set that has asymmetric discrete or asymmetric continuous features.
What will be an ideal response?
The (i, j)th entry of a document-term matrix is the number of times that term
j occurs in document i. Most documents contain only a small fraction of
all the possible terms, so most entries are zero, and a zero entry is not very
meaningful, either in describing a document or in comparing two documents:
sharing an absent term says little about similarity. Thus, a document-term matrix has
asymmetric discrete features. If we apply a TF-IDF normalization to terms
and normalize the documents to have an L2 norm of 1, then this creates a
document-term matrix with continuous features. However, the features are
still asymmetric because these transformations do not create non-zero entries
for any entries that were previously 0, and thus, zero entries are still not very
meaningful.
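
The argument above can be checked on a small example. The following is a minimal sketch, not a reference implementation: the three toy documents, the helper names document_term_matrix and tfidf_l2, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 are all assumptions chosen for illustration. It builds the count matrix (discrete features), applies TF-IDF weighting and L2 row normalization (continuous features), and leaves the matrices in counts and weights so the positions of the zeros can be compared.

```python
import math

def document_term_matrix(docs):
    """Return (vocab, counts), where counts[i][j] is how often term j occurs in document i."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    index = {term: j for j, term in enumerate(vocab)}
    counts = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for term in doc.split():
            counts[i][index[term]] += 1
    return vocab, counts

def tfidf_l2(counts):
    """Weight counts by a smoothed IDF (an assumed variant), then scale each row to L2 norm 1."""
    n_docs, n_terms = len(counts), len(counts[0])
    # Document frequency: number of documents that contain term j.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF is strictly positive, so non-zero counts stay non-zero.
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    weighted = [[row[j] * idf[j] for j in range(n_terms)] for row in counts]
    normalized = []
    for row in weighted:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm if norm > 0 else 0.0 for x in row])
    return normalized

# Hypothetical toy corpus, used only for illustration.
docs = [
    "data mining finds patterns in data",
    "text mining uses term counts",
    "patterns in text",
]
vocab, counts = document_term_matrix(docs)
weights = tfidf_l2(counts)

# True when every zero count is still zero after the transformation, and vice versa.
same_zero_pattern = all(
    (counts[i][j] == 0) == (weights[i][j] == 0.0)
    for i in range(len(docs))
    for j in range(len(vocab))
)
print(same_zero_pattern)
```

Both steps only multiply or divide each entry by a positive number, so a zero count can never become a non-zero weight. With an unsmoothed IDF of log(N / df), a term that appears in every document would get weight 0, which can turn non-zero entries into zeros but still never the reverse.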