Consider the following relations that represent part of a real estate database:
```
Agent(Id, AgentName)
House(Address, OwnerId, AgentId)
Amenity(Address, Feature)
```
The Agent relation keeps information on real estate agents, the House relation has information on who is selling the house and the agent involved, and the Amenity relation provides information on the features of each house. Each relation has its keys underlined. Consider the following query:
```
SELECT H.OwnerId, A.AgentName
FROM House H, Agent A, Amenity Y
WHERE H.Address=Y.Address AND A.Id = H.AgentId
AND Y.Feature = ’5BR’ AND H.AgentId = ’007’
```
Assume that the bu?er space available for this query has 5 pages and that the following statistics and indices are available:
1. Amenity
10,000 records on 1000 houses, 5 records per page
Clustered 2-level B+ tree index on Address
Unclustered hash index on Feature, 50 features
2. Agent
200 agents with 10 tuples per page
Unclustered hash index on Id
3. House
1000 houses with 4 records per page
Unclustered hash index on AgentId
Clustered 2-level B+ tree index on Address
Answer the following questions (and explain how you arrived at your solutions).
a. Draw a fully pushed query tree corresponding to the above query.
b. Find the best query plan to evaluate the above query and estimate its cost.
c. Find the next-best plan and estimate its cost.
a. ![15189|315x469](upload://qX2W4mTYmxyWQKCI2YgEoaFjKma.png)
b. We c ou l d j o i n House with Agent or Amenity, but in any case it is clear that we should ?rst select House on AgentId, because of the large reduction in size: There are 200 agents, 1000 houses, so agent 007 must be handling about 5 houses. At 4 records per page, this would occupy 2 pages.
Because the index on AgentId is unclustered, it would take 1.2 I/Os to ?nd the bucket and 5 I/Os to fetch the relevant pages: 6.2 I/Os in total.
Next we can join with Agent, which would take 1.2 page I/Os to search the index, since Agent has an unclustered index on Id. In addition, 1 I/O is required to fetch the data page. The result will still have 5 tuples, but the tuples will be
about 50% larger (Agent has 10 tuples/page, while House only 4). However, we can project out Id and AgentId, which will bring the tuple size to about the size of the tuples in House. So, the join will still occupy a little over a page. We keep
the result of the join in the main memory bu?er.
Finally, we join the result with Amenity. Since the statistics indicate that there are about 10 amenities per house, it does not make much sense to select Amenity on Feature: the size will go down by the factor of 10 at a very high price
(unclustered hash or scan of 2,000 blocks of the Amenity relation), and we will loose the index on Address — the attribute used in the join.
So, the best way to do the join is to use index-nested loops join using the clustered index on the attribute Address of Amenity. It would take 2 I/Os to search the B+ tree index for each of the 5 tuples in the result of the previous join
(i.e., 2*5; if we cache the top level of the B+ tree then this search would take 2+4=6 I/Os). The number of tuples retrieved would be 50 (10 features per house * 5 houses), which occupies 10 pages. Therefore, the total needed to retrieve the
matching tuples in Amenity is 16 I/Os.
Note that we still have enough room in the bu?er. The expected size of the join is 50 tuples (5*10), which is too large for our 5 page bu?er. However, we can also select on Feature=’5BR’ on the ?y, reducing the size to about 5 tuples, each
about twice the size of the tuples in House. We can also project )on the ?y) on OwnerId and AgentName further reducing the size. Thus, we will need 2 pages in the bu?er for the result of the join of House and Agent, one page for the input
bu?er that is needed for reading the matching tuples of Amenity,and two to store the ?nal result. This ?ts in the available 5 bu?er pages.
In total, thus, the query will take 6.2 + 1.2 + 1 + 16 = 24.4 I/Os.
c. Similar, except that what is left of House is ?rst joined with Amenity. The number of I/Os is going to be the same, so this is also a best plan. The next plan after that would be to do something silly, like joining House and Agent using nested loops. Since Agent occupies 20 blocks, this can be done in 20 I/Os. Then we could proceed to join with Amenity.
You might also like to view...
What is a tractable problem?
a. A problem solvable by a polynomial algorithm b. A problem solvable by any algorithm c. A problem for which there is a data structure d. A graph theory problem
A fundamental element when finding an information system vendor is the fit between the organization and the vendor
a. true b. false