University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

CMPSCI 445 Midterm Practice Questions

NAME:
LOGIN:

Write all of your answers directly on this paper. Be sure to clearly indicate your final answer for each question. Also, be sure to state any assumptions that you are making in your answers.

Problem                           Maximum Score
1. True/False Statements
2. Relational Algebra and SQL
3. B+ Tree Indexes
4. Query Evaluation
TOTAL

Question 1 [5 parts]: True/False Statements

State whether each of the following statements is TRUE or FALSE. Write your answers in the boxes below. No explanation is needed.

(a) Consider the relational operators σ and ∪. For any union-compatible relations R1 and R2 and any predicate p, σ_p(R1 ∪ R2) ≡ σ_p(R1) ∪ σ_p(R2), where ≡ means that the left and right expressions always return the same answer.

(b) Consider the relational operators π and −. For any union-compatible relations R1 and R2 and any attribute set S, π_S(R1 − R2) ≡ π_S(R1) − π_S(R2).

(c) On average, random accesses to N pages are faster than a sequential scan of N pages, because random I/Os tend to access different cylinders and therefore cause less contention.

(d) It is a good idea to create as many indexes as possible to expedite query processing, because there is no disadvantage to having many indexes.

(e) Using a clustered B+ tree index on age to retrieve records in sorted order of age is always faster than performing a two-pass external merge sort.

Question 2 [3 parts]: Relational Algebra and SQL

Consider the following relational schema:

Emp(eid, ename, age, salary)
Works(eid, did)
Dept(did, dname, budget, managerid)

An employee can work in more than one department, and managerid in Dept is a foreign key referencing eid in Emp.

(a) Print the name and age of each employee who works in both the Hardware department and the Software department. Write an SQL statement for this query.

(b) Write an expression in relational algebra for the query in part (a).
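Statements (a) and (b) of Question 1 can be probed on small instances. The sketch below is a minimal illustration, assuming relations are modeled as Python sets of tuples (set semantics) over a hypothetical (name, gpa) schema; the helpers `select` and `project` are illustrative names, not part of the exam. A finite check like this can only refute an identity, never prove it, but it shows σ agreeing with both ∪ and − while π fails to distribute over −.

```python
# Minimal sketch: relations as Python sets of tuples (set semantics).
# The (name, gpa) schema and the helper names are illustrative assumptions.


def select(pred, rel):
    """σ_pred(rel): keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}


def project(cols, rel):
    """π_cols(rel): keep the given column positions; the set removes duplicates."""
    return {tuple(t[i] for i in cols) for t in rel}


def p(t):
    """An arbitrary predicate on gpa."""
    return t[1] > 3.2


E1 = {("Sam", 4.0), ("Ann", 3.5)}
E2 = {("Sam", 3.0), ("Ann", 3.5)}

# Selection distributes over union and over difference.
sel_union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)
sel_diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2)

# Projection does not distribute over difference.
proj_lhs = project([0], E1 - E2)                # {('Sam',)}
proj_rhs = project([0], E1) - project([0], E2)  # set()

print(sel_union_ok, sel_diff_ok, proj_lhs, proj_rhs)
```

The two sides of the π identity differ because projection discards gpa before the difference is taken, so the two "Sam" tuples become indistinguishable.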
(c) For each employee who manages some department, print the employee's name and the sum of the budgets of all the departments that he or she manages. Write an SQL statement for this query.

Question 3 [1 part]: B+ Tree Indexes

(a) [12 points] Starting from an empty B+ tree in which each node can hold at most 3 keys and 4 pointers, insert the following keys in the order given:

1, 10, 2, 11, 3, 4, 8, 5, 7, 6

Show the final tree below.

Question 4 [2 parts]: Query Evaluation

(a) [10 points] Evaluation of Selection

Consider the following schema:

Employees(eid: integer, ename: string, sal: integer, title: string, age: integer)

Suppose that the following indexes, all using Alternative (2) for data entries, exist:
- An unclustered B+ tree index on sal.
- A clustered B+ tree index on ⟨age, sal⟩.

The Employees relation contains 10,000 pages and 200,000 data records. Each Employees record is 100 bytes long, and each index data entry is 20 bytes long. Consider the following selection condition:

sal > 200 ∧ title = 'VP'

Assume that the reduction factor (RF) is 1% for an equality predicate and 10% for an inequality predicate. Compute the cost of the most selective access method (among the file scan and the available index scans) for evaluating this selection condition.

(b) Evaluation of Join

Consider the join R ⋈_{R.a=S.b} S, given the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of writing out the result should be ignored throughout.

Relation   Num. tuples in total   Num. tuples per page
R          10,000                 10
S          4,000                  10

Attribute b of relation S (S.b) is the primary key of S, and there is a primary key index on S.b. Both relations are stored as simple heap files, and 52 buffer pages are available.

Between block nested loops join and index nested loops join, choose the more efficient evaluation method for this join. Make sure that you consider the lowest cost of each join method, and then compare the two costs to give your final answer.
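The insertion sequence in Question 3 can be replayed mechanically. Below is a minimal Python sketch, assuming the split conventions that the answer key's tree reflects: an overfull leaf (4 keys) splits 2/2 and copies the first key of the new right leaf up, while an overfull internal node splits 2/1 and pushes its third key up. The class and function names are hypothetical, leaf sibling pointers are omitted, and other split conventions (for example, pushing up the second key of an internal node) yield a different but still valid tree.

```python
from bisect import bisect_right, insort

MAX_KEYS = 3  # each node holds at most 3 keys (hence at most 4 child pointers)


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes; always empty for leaves


def _insert(node, key):
    """Insert key below node. Return (separator, new right sibling) if node split, else None."""
    if node.leaf:
        insort(node.keys, key)
    else:
        i = bisect_right(node.keys, key)  # keys >= a separator go to its right
        split = _insert(node.children[i], key)
        if split is not None:
            sep, right = split
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2  # an overfull node has 4 keys, so mid == 2
    right = Node(node.leaf)
    if node.leaf:
        # Leaf split: 2 keys stay, 2 move right; the right leaf's first key is copied up.
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        return right.keys[0], right
    # Internal split: the key at position mid is pushed up and kept in neither half.
    sep = node.keys[mid]
    right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return sep, right


class BPlusTree:
    def __init__(self):
        self.root = Node(leaf=True)

    def insert(self, key):
        split = _insert(self.root, key)
        if split is not None:  # the root split: grow the tree by one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def levels(self):
        """Keys of every node, level by level from the root down."""
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return out


tree = BPlusTree()
for k in [1, 10, 2, 11, 3, 4, 8, 5, 7, 6]:
    tree.insert(k)
for depth, nodes in enumerate(tree.levels()):
    print(depth, nodes)
# 0 [[7]]
# 1 [[3, 5], [10]]
# 2 [[1, 2], [3, 4], [5, 6], [7, 8], [10, 11]]
```

The last insertion (6) is the interesting one: it splits leaf (5, 7, 8) and copies 7 up, which overfills the root (3, 5, 7, 10), so 7 is pushed up into a new root.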
Answer to Question 1:

(a) True.

(b) False. Suppose E1 and E2 have the same schema (name, gpa), E1 has one tuple ("Sam", 4.0), and E2 has one tuple ("Sam", 3.0). Then π_name(E1 − E2) returns one tuple ("Sam"), but π_name(E1) − π_name(E2) returns no result.

(c) False. Each of the N random I/Os pays its own seek time and rotational delay, so N random accesses are slower than a sequential scan of N pages.

(d) False. Indexes consume a lot of space and can slow down insertions.

(e) True.

Answer to Question 2:

(a) SQL statement:

SELECT E.ename, E.age
FROM Emp E, Works W1, Works W2, Dept D1, Dept D2
WHERE E.eid = W1.eid AND W1.did = D1.did AND D1.dname = 'Hardware'
  AND E.eid = W2.eid AND W2.did = D2.did AND D2.dname = 'Software'

(b) Expression in relational algebra:

ρ(R1, π_eid(σ_{dname='Hardware'}(Dept) ⋈ Works))
ρ(R2, π_eid(σ_{dname='Software'}(Dept) ⋈ Works))
π_{ename, age}((R1 ∩ R2) ⋈ Emp)

(c) SQL statement:

SELECT E.ename, SUM(D.budget)
FROM Dept D, Emp E
WHERE D.managerid = E.eid
GROUP BY E.eid, E.ename

Answer to Question 3:

(a) The final B+ tree:

Root:    (7)
Level 1: (3, 5) (10)
Level 2: (1, 2) (3, 4) (5, 6) (7, 8) (10, 11)

Answer to Question 4:

(a) sal > 200 ∧ title = 'VP'

Option 1: A file scan, with a cost of 10,000 page I/Os.

Option 2: The unclustered B+ tree index on sal. A page holds 200,000 / 10,000 = 20 records of 100 bytes, and therefore 100 data entries of 20 bytes, so the index has 200,000 / 100 = 2,000 (that is, 10,000 / (100/20)) leaf pages. Scanning 10% of them costs 200 I/Os and yields 200,000 × 10% = 20,000 matching data entries. With one I/O per match, that adds 20,000 I/Os (about 20,200 in total). With the refinement of sorting the rids by page id, each data page is fetched at most once, but that can still be all 10,000 pages (plus the 200 leaf pages), which is no better than a file scan.

Option 3: The clustered B+ tree index on ⟨age, sal⟩ does not help, because sal is not a prefix of its search key.

In this case, the file scan is the best available method, with a cost of 10,000 I/Os.

(b) Number of pages in R: M = 1,000; number of pages in S: N = 400.

Let us consider block nested loops join first. We read the outer relation in blocks of B − 2 pages and, for each block, scan the inner relation for matching tuples. The outer relation is read once, and the inner relation is scanned once per outer block. Using the smaller relation S as the outer gives ⌈400 / (B − 2)⌉ = ⌈400 / 50⌉ = 8 blocks, so

TotalCost = N + ⌈N / (B − 2)⌉ × M = 400 + 8 × 1,000 = 8,400 I/Os.

We then consider index nested loops join.
For each of the 10,000 R tuples, we use the primary key index on S.b to fetch the matching S tuple (there is at most one, because S.b is a key). Given 400 S pages and 52 buffer pages, we can buffer at most the root and the next level of the B+ tree, so for each R tuple we pay 1 to 2 I/Os to fetch the matching S tuple through the index. This alone gives a cost of

M + (M × T_R × cost of finding the matching S tuple) = 1,000 + 10,000 × (1 to 2) = 11,000 to 21,000 I/Os,

where T_R = 10 is the number of R tuples per page. Since even the lower bound exceeds 8,400, block nested loops join is the better option.
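The arithmetic in the answers to Question 4 can be checked with a short script. This is a sketch under the assumptions the answers already make: a page holds exactly 20 records (2,000 bytes, no page overhead), leaf pages are fully packed with data entries, the cost of sorting rids is ignored, and an index probe on S.b costs 1 to 2 I/Os. Variable and function names are illustrative. Block nested loops is costed with both choices of outer relation, so the comparison uses the lowest cost of each method, as the question asks.

```python
import math

# ---------- Question 4(a): sal > 200 AND title = 'VP' ----------
PAGES, RECORDS = 10_000, 200_000     # Employees: data pages, data records
RECORD_BYTES, ENTRY_BYTES = 100, 20  # record size, Alternative (2) data-entry size
RF_SAL_PCT = 10                      # inequality on sal: 10% reduction factor
# title = 'VP' (RF 1%) matches no index, so it is only checked on the fly.

records_per_page = RECORDS // PAGES                                # 20
entries_per_page = records_per_page * RECORD_BYTES // ENTRY_BYTES  # 100
leaf_pages = RECORDS // entries_per_page                           # 2,000
leaf_scan = leaf_pages * RF_SAL_PCT // 100                         # 200
matches = RECORDS * RF_SAL_PCT // 100                              # 20,000

access_paths = {
    "file scan": PAGES,
    # Unclustered index on sal: one data-page I/O per matching data entry.
    "unclustered sal index": leaf_scan + matches,
    # Refinement: sort rids by page id so each data page is read at most once.
    # Upper bound; the cost of sorting the rids is ignored.
    "unclustered sal index, sorted rids": leaf_scan + min(matches, PAGES),
    # The clustered <age, sal> index does not apply: sal is not a prefix of its key.
}
best_path = min(access_paths, key=access_paths.get)

# ---------- Question 4(b): R join S on R.a = S.b ----------
M, N, B = 1_000, 400, 52  # pages of R, pages of S, buffer pages
T_R = 10                  # R tuples per page (10,000 tuples / 1,000 pages)


def bnl_cost(outer, inner, buffers):
    """Block nested loops: B - 2 pages per outer block, 1 input page for inner, 1 output page."""
    return outer + math.ceil(outer / (buffers - 2)) * inner


bnl = min(bnl_cost(N, M, B), bnl_cost(M, N, B))  # S as outer: 400 + 8 * 1,000
# Index nested loops with R as outer: 1 to 2 I/Os per R tuple to reach its
# (at most one) S match via the primary key index on S.b.
inl_low, inl_high = M + M * T_R * 1, M + M * T_R * 2

print(best_path, access_paths[best_path])            # file scan 10000
print("BNL:", bnl, "INL:", inl_low, "to", inl_high)  # BNL: 8400 INL: 11000 to 21000
```

With R as the outer, block nested loops would cost 1,000 + 20 × 400 = 9,000 I/Os, so choosing the smaller relation S as the outer is what brings the cost down to 8,400.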