For the data set with the attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis. Specifically, indicate for each attribute in the original data set
(a) How many binary attributes it would correspond to in the transaction
data set,
(b) How the values of the original attribute would be mapped to values of
the binary attributes, and
(c) If there is any hierarchical structure in the data values of an attribute
that could be useful for grouping the data into fewer binary attributes.
The following is a list of attributes for the data set along with their possible
values. Assume that all attributes are collected on a per-student basis:
Year : Freshman, Sophomore, Junior, Senior, Graduate:Masters,
Graduate:PhD, Professional
Zip code : zip code for the home address of a U.S. student, zip code
for the local address of a non-U.S. student
College : Agriculture, Architecture, Continuing Education, Education,
Liberal Arts, Engineering, Natural Sciences, Business, Law, Medical,
Dentistry, Pharmacy, Nursing, Veterinary Medicine
On Campus : 1 if the student lives on campus, 0 otherwise
Each of the following is a separate attribute that has a value of 1 if the
student speaks the language and a value of 0 otherwise.
– Arabic
– Bengali
– Chinese Mandarin
– English
– Portuguese
– Russian
– Spanish
Year:
(a) Each attribute value can be represented using an asymmetric binary
attribute. Therefore, there are altogether 7 binary attributes.
(b) There is a one-to-one mapping between the original attribute values
and the asymmetric binary attributes.
(c) We have a hierarchical structure involving the following high-level
concepts: Undergraduate (Freshman, Sophomore, Junior, Senior), Graduate
(Graduate:Masters, Graduate:PhD), and Professional.
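The mapping for Year can be sketched in Python as follows. This is a minimal illustration, not part of the original solution: the function names and the "Year=..." / "YearGroup=..." item labels are hypothetical, while the level and group names come from the exercise.

```python
# Levels of the Year attribute, as listed in the exercise.
YEAR_LEVELS = [
    "Freshman", "Sophomore", "Junior", "Senior",
    "Graduate:Masters", "Graduate:PhD", "Professional",
]

# High-level concepts from part (c); the dictionary itself is an assumption
# about how one might store the hierarchy.
YEAR_GROUP = {
    "Freshman": "Undergraduate",
    "Sophomore": "Undergraduate",
    "Junior": "Undergraduate",
    "Senior": "Undergraduate",
    "Graduate:Masters": "Graduate",
    "Graduate:PhD": "Graduate",
    "Professional": "Professional",
}
GROUPS = ["Undergraduate", "Graduate", "Professional"]


def encode_year(year):
    """One binary attribute per Year value; exactly one of them is 1."""
    if year not in YEAR_LEVELS:
        raise ValueError(f"unknown Year value: {year!r}")
    return {f"Year={level}": int(level == year) for level in YEAR_LEVELS}


def encode_year_group(year):
    """Coarser encoding: one binary attribute per high-level concept."""
    group = YEAR_GROUP[year]
    return {f"YearGroup={g}": int(g == group) for g in GROUPS}
```

Using the grouped encoding reduces the 7 binary attributes to 3, at the cost of no longer distinguishing, for example, a Junior from a Senior.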
Zip code:
(a) Each attribute value is represented by an asymmetric binary
attribute. Therefore, we have as many asymmetric binary attributes
as the number of distinct zip codes.
(b) There is a one-to-one mapping between the original attribute values
and the asymmetric binary attributes.
(c) We can have a hierarchical structure based on geographical regions
(e.g., zip codes can be grouped according to their corresponding
states).
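A rough sketch of the zip code grouping is given below. The solution suggests grouping by state, but the exercise supplies no zip-to-state table, so this example assumes a simpler stand-in: grouping on the leading digits of the zip code, which identify progressively larger postal areas. The function names, the `prefix_len` parameter, and the sample zip codes are all hypothetical.

```python
def region_of(zip_code, prefix_len=3):
    """Coarse region key: the first `prefix_len` digits of the zip code.

    Zip codes are kept as strings so that leading zeros are preserved.
    """
    return zip_code[:prefix_len]


def encode_zip_regions(zip_codes, prefix_len=3):
    """Return the region vocabulary and one binary row per student."""
    regions = sorted({region_of(z, prefix_len) for z in zip_codes})
    rows = [
        {f"ZipRegion={r}": int(region_of(z, prefix_len) == r) for r in regions}
        for z in zip_codes
    ]
    return regions, rows


# Illustrative input only; these values are not from the exercise.
zips = ["55455", "55414", "02139", "55455"]
regions, rows = encode_zip_regions(zips)
```

Here three distinct zip codes collapse into two region attributes. With a real zip-to-state lookup, `region_of` would be replaced by that lookup and the rest would stay the same.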
College:
(a) Each attribute value is represented by an asymmetric binary
attribute. Therefore, we have as many asymmetric binary attributes
as the number of distinct colleges.
(b) There is a one-to-one mapping between the original attribute values
and the asymmetric binary attributes.
(c) We can have a hierarchical structure based on the type of school.
For example, the Medical and Dentistry colleges might be grouped
together as Medical school, while Engineering and Natural Sciences
might be grouped together into the same school.
On Campus:
(a) This attribute can be mapped to one binary attribute.
(b) The attribute is already binary, so its values carry over unchanged.
(c) There is no hierarchical structure.
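Languages (Arabic, Bengali, Chinese Mandarin, English, Portuguese, Russian, Spanish):
(a) Each language attribute can be mapped to one binary attribute, giving
7 binary attributes in total.
(b) Each attribute is already binary, so its values carry over unchanged.
(c) There is no hierarchical structure.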
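Putting the pieces together, one student record could be turned into a transaction along the following lines. This is only a sketch under assumed names: the record layout (`year`, `zip`, `college`, `on_campus`, `languages`) and the item labels are hypothetical. Because the binary attributes are treated as asymmetric, only values of 1 produce an item, and a 0 simply leaves the item out.

```python
# Language attributes listed in the exercise.
LANGUAGES = [
    "Arabic", "Bengali", "Chinese Mandarin", "English",
    "Portuguese", "Russian", "Spanish",
]


def to_transaction(student):
    """Convert one student record (a dict) into a set of items."""
    # Categorical attributes: exactly one item per attribute.
    items = {
        f"Year={student['year']}",
        f"Zip={student['zip']}",
        f"College={student['college']}",
    }
    # Already-binary attributes: add an item only when the value is 1.
    if student["on_campus"] == 1:
        items.add("OnCampus")
    for lang in LANGUAGES:
        if student["languages"].get(lang, 0) == 1:
            items.add(f"Speaks={lang}")
    return items


# Illustrative record; the values are made up for the example.
student = {
    "year": "Junior",
    "zip": "55455",
    "college": "Engineering",
    "on_campus": 0,
    "languages": {"English": 1, "Spanish": 1, "Russian": 0},
}
transaction = to_transaction(student)
```

The resulting transaction contains five items: the three categorical ones plus `Speaks=English` and `Speaks=Spanish`. Nothing is emitted for `on_campus = 0` or for the languages the student does not speak.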