Consider the following schema, where the keys are underlined:
```
Employee(SSN, Name,Dept)
Project(SSN,PID,Name,Budget)
```
The SSN attribute in Project is the Id of the employee working on the project, and PID is the Id of the project. There can be several employees per project, but the functional dependency PID ? Name,Budget holds (so the relation is not normalized). Consider the query
```
SELECT P.Budget, P.Name, E.Name
FROM Employee E, Project P
WHERE E.SSN = P.SSN AND
P.Budget > 99 AND
E.Name = ’John’
ORDER BY P.Budget
```
Assume the following statistical information:
1. 10,000 tuples in Employee relation
2. 20,000 tuples in Project relation
3. 40 tuples per page in each relation
4. 10-page bu?er
5. 1000 di?erent values for E.Name
6. The domain of Budget consists of integers in the range of 1 to 100
7. Indices
– Employee relation
On Name: Unclustered, hash
On SSN: Clustered, 3-level B+ tree
– Project relation
On SSN: Unclustered, hash
On Budget: Clustered, 2-level B+ tree
a. Draw the fully pushed query tree.
b. Find the “best” execution plan and the second-best plan. What is the cost of each? Explain how you arrived at your costs.
a. ![15082|445x389](upload://2J1uMWLwiWwGxCbTWA7igdXiIjZ.png)
b. There are three reasonable plans:
– Select Employee on Name. Since there are about 10000/1000=10 tuples in the result, it would take 1.2*10=12 page transfers. The result contains 10 tuples, 1 page.
Using index-nested loops, join the 10 Employee tuples with Project. There are two projects per employee on the average, so it would require to bring in 1.2 * 10 * 2 = 24 pages, since we will be using an unclustered hash index on SSN in Project. The result will contain 20 tuples (twice as wide as the tuples in the original relations), 1 page. We do not need to write anything out on disk, since after extracting the joined tuples we will be discarding the pages of Project that we bring in for the join.
Select the result on Budget on the ?y, yielding 0.01 * 20 = 1 tuple (at no cost), then project. The total cost is 12 + 24 = 36 page transfers.
– Select Project on Budget. Because we can use a clustered 2-level index, it will take 2 I/Os to ?nd the address of the ?rst relevant page. Project has 20000/40 = 500 pages and we select 0.01 of them, i.e., 5 pages, 200 tuples, at the cost of 2 (index search) + 5 (reading the relevant pages) = 7 page transfers.
Use index-nested loops to join the result with Employee. The cost: at least 200 (project tuples) * 3 (to fetch an Employee page) = 600 I/Os. The resulting number of tuples is about 200, twice the original size. This is about 10 pages.
Apply selection and projection on the ?y, yielding one page (at no cost). Total: over 607 I/Os. Select Employee on Name,andProject on Budget. From before, we know that it takes 12 + 7 I/Os and yields 10 and 200 tuples (1 + 5 pages), respectively.
Use sort-merge-join, then project. We can sort everything in memory (we have 10 pages in the bu?er) and then merge, select and project in one pass (also in memory). Thus, the cost is 19 I/Os.
The best plan is therefore plan #3. The second best is plan #1.
You might also like to view...
The four principal approaches to multithreading are: interleaved (fine-grained), blocked (coarse-grained), simultaneous, and ________.
Fill in the blank(s) with the appropriate word(s).
Compare the throughput of FSCAN with that of SCAN.
What will be an ideal response?