Which decision tree is better, according to the MDL principle?
Consider the decision trees shown in Figure 4.3. Assume they are generated
from a data set that contains 16 binary attributes and 3 classes, C1, C2, and
C3. Compute the total description length of each decision tree according to
the minimum description length principle.
• The total description length of a tree is given by:
Cost(tree, data) = Cost(tree) + Cost(data|tree).
• Each internal node of the tree is encoded by the ID of the splitting
attribute. If there are m attributes, the cost of encoding each attribute
is log2 m bits.
• Each leaf is encoded using the ID of the class it is associated with. If
there are k classes, the cost of encoding a class is log2 k bits.
• Cost(tree) is the cost of encoding all the nodes in the tree. To simplify
the computation, you can assume that the total cost of the tree is
obtained by adding up the costs of encoding each internal node and
each leaf node.
• Cost(data|tree) is encoded using the classification errors the tree com-
mits on the training set. Each error is encoded by log2 n bits, where n
is the total number of training instances.
Because there are 16 attributes, the cost for each internal node in the decision
tree is:
log2(m) = log2(16) = 4
Furthermore, because there are 3 classes, the cost for each leaf node is:
log2(k)
=
log2(3)
= 2
The cost for each misclassification error is log2(n).
The overall cost for the decision tree (a) is 2×4+3×2+7×log2 n = 14+7 log2 n
and the overall cost for the decision tree (b) is 4×4+5×2+4×5 = 26+4 log2 n.
According to the MDL principle, tree (a) is better than (b) if n < 16 and is
worse than (b) if n > 16.
You might also like to view...
You should save a picture file as a GIF image, because it produces a good quality compressed picture with a smaller file size than other picture file types
Indicate whether the statement is true or false
A media player is a common device connected to a thick workstation
Indicate whether the statement is true or false