Consider the following schema, where the keys are underlined:

Company is an unnormalized relation that contains information on the states where companies

are registered (Name ! RegistrationState is an FD) and the industries where they belong

(company-industry is a many-to-many relationship). The City relation contains information

about the number of people living in each city. Different cities can have the same name

in different states, but city names within the same state are unique. Consider the following

query:




SELECT C.Name, T.Name


FROM Company C, City T


WHERE C.RegistrationState = T.State AND


T.Population > 9000000 AND


C.Industry = 'Apparel'


ORDER BY C.Name

Assume the following statistical information:

20,000 tuples in Company relation

5,000 tuples in City relation

50 tuples/page in each relation

6-page buer

The domain of Population consists of integers in the range of 1000 to 10,000,000

There are 50 industries represented in the database and 50 states.

Indices:

? Company relation:

On Industry: Clustered, hash

? City relation:

On State: Clustered, 2-level B+ tree

Question: Find the best execution plan for the above query, draw it, and estimate its cost.

Solution:

The index on Company cannot be used in a join, so it makes sense to push the selection and

the projection to Company. The size of Name;RegistrationState(Industry=Apparel(Company)) is:

20000/50 = 400 tuples (after selection) = 8 pages. Projection further reduces the size by 2/3

to 5.4 (i.e., 6) pages.

If we use the clustered index on City.State in the join, we would have to use the index 400 times

for each tuple in Name;RegistrationState(Industry=Apparel(Company)). But if we do it smartly only for the states that actually occur in that expression, we still would have to use the index 50times. This is cheaper than computing Population>9M(City), because this selection will need toscan City (100 pages).

Since City is clustered on State, this relation is ordered on the State attribute. Therefore,

we can pull Name;RegistrationState(Industry=Apparel(Company)) through the 6-page buer using just one page and join it with City using the index on City.State.

Thus, the total cost is: 8 + 50 = 58 I/Os.

The query plan is shown below.

Computer Science & Information Technology

You might also like to view...

What is the difference between the Windows and Word Start screens?

What will be an ideal response?

View

Computer Science & Information Technology

________ monitors and tracks the emails of a particular user. This kind of tracking is possible through digitally time-stamped records that reveal the time and date when the target receives and opens a specific ________.

A. Information tracking / data B. Email tracking / email C. System tracking / data D. Social networking tracking / cache files

View

Computer Science & Information Technology