Discuss why a document-term matrix is an example of a data set that has asymmetric discrete or asymmetric continuous features.

What will be an ideal response?

The ijth entry of a document-term matrix is the number of times that term
j occurs in document i. Most documents contain only a small fraction of
all the possible terms, and thus, zero entries are not very meaningful, either
in describing or comparing documents. Thus, a document-term matrix has
asymmetric discrete features. If we apply a TF-IDF normalization to terms
and normalize the documents to have an L2 norm of 1, then this creates a
document-term matrix with continuous features. However, the features are
still asymmetric because these transformations do not create non-zero entries
for any entries that were previously 0, and thus, zero entries are still not very
meaningful.
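
The argument above can be checked on a small example. The following is a minimal sketch in Python, assuming a hypothetical three-document corpus and one common smoothed TF-IDF variant (the answer does not specify which TF-IDF formula is intended). It builds the count matrix, applies TF-IDF and L2 normalization, and checks that the zero entries stay exactly where they were.

```python
import math

# Hypothetical toy corpus; the documents are illustrative only.
docs = [
    "data mining finds patterns in data",
    "text mining uses a document term matrix",
    "sparse matrix storage saves memory",
]

# Vocabulary: one column per distinct term, in sorted order.
vocab = sorted({term for doc in docs for term in doc.split()})

# Document-term matrix: counts[i][j] = occurrences of term j in document i.
counts = [[doc.split().count(term) for term in vocab] for doc in docs]


def tfidf_l2(counts):
    """Apply a smoothed TF-IDF weighting, then scale each row to unit L2 norm."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Assumed IDF variant: always positive, so non-zero counts stay non-zero.
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    weighted = [[row[j] * idf[j] for j in range(n_terms)] for row in counts]
    result = []
    for row in weighted:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm if norm > 0 else 0.0 for v in row])
    return result


weights = tfidf_l2(counts)

# Fraction of entries that are zero in the raw count matrix.
total = len(counts) * len(vocab)
zero_fraction = sum(v == 0 for row in counts for v in row) / total

# True if an entry is zero after the transformation exactly when it was zero before.
same_zero_pattern = all(
    (c == 0) == (w == 0.0)
    for c_row, w_row in zip(counts, weights)
    for c, w in zip(c_row, w_row)
)

print(f"zero fraction: {zero_fraction:.2f}")
print(f"zero pattern preserved: {same_zero_pattern}")
```

Even in this tiny corpus most entries are zero, and the weighting only rescales the non-zero counts, which is why the features remain asymmetric after they become continuous.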

Computer Science & Information Technology
