Consider the following schema, where the keys are underlined:
Company is an unnormalized relation that contains information on the states where companies
are registered (Name ! RegistrationState is an FD) and the industries where they belong
(company-industry is a many-to-many relationship). The City relation contains information
about the number of people living in each city. Different cities can have the same name
in different states, but city names within the same state are unique. Consider the following
query:
SELECT C.Name, T.Name
FROM Company C, City T
WHERE C.RegistrationState = T.State AND
T.Population > 9000000 AND
C.Industry = 'Apparel'
ORDER BY C.Name
Assume the following statistical information:
20,000 tuples in Company relation
5,000 tuples in City relation
50 tuples/page in each relation
6-page buer
The domain of Population consists of integers in the range of 1000 to 10,000,000
There are 50 industries represented in the database and 50 states.
Indices:
? Company relation:
On Industry: Clustered, hash
? City relation:
On State: Clustered, 2-level B+ tree
Question: Find the best execution plan for the above query, draw it, and estimate its cost.
Solution:
The index on Company cannot be used in a join, so it makes sense to push the selection and
the projection to Company. The size of Name;RegistrationState(Industry=Apparel(Company)) is:
20000/50 = 400 tuples (after selection) = 8 pages. Projection further reduces the size by 2/3
to 5.4 (i.e., 6) pages.
If we use the clustered index on City.State in the join, we would have to use the index 400 times
for each tuple in Name;RegistrationState(Industry=Apparel(Company)). But if we do it smartly only for the states that actually occur in that expression, we still would have to use the index 50times. This is cheaper than computing Population>9M(City), because this selection will need toscan City (100 pages).
Since City is clustered on State, this relation is ordered on the State attribute. Therefore,
we can pull Name;RegistrationState(Industry=Apparel(Company)) through the 6-page buer using just one page and join it with City using the index on City.State.
Thus, the total cost is: 8 + 50 = 58 I/Os.
The query plan is shown below.
You might also like to view...
What is the difference between the Windows and Word Start screens?
What will be an ideal response?
________ monitors and tracks the emails of a particular user. This kind of tracking is possible through digitally time-stamped records that reveal the time and date when the target receives and opens a specific ________.
A. Information tracking / data B. Email tracking / email C. System tracking / data D. Social networking tracking / cache files