A PERF solution for distributed query optimization
Proceedings of the ISCA 15th International Conference on Computers and Their Applications (CATA-2000), New Orleans, Louisiana, USA, March 29-31, 2000

Ramzi A. Haraty and Roula Fany
Lebanese American University
P.O. Box 13-5053, Beirut, Lebanon

Abstract

Query optimization techniques aim to minimize the cost of transferring data across networks. Many techniques and algorithms have been proposed to optimize queries. One of these algorithms is the W algorithm, which uses semi-joins. More recently, a new technique called PERF has been reported to improve on semi-joins [2]. PERF joins are two-way semi-joins that use a bit vector as their backward phase. Our research applies PERF joins to the W algorithm. Programs were designed to implement both the original and the enhanced algorithms. Several experiments were conducted, and the results showed a very considerable enhancement obtained by applying the PERF concept.

Keywords: Query Optimization, Semi-Joins, and PERF Joins

1. Introduction

Distributed query processing is the process of retrieving data from different sites. Accessing data from different sites involves transmission via communication links, which creates delays. The basic challenge is to design and develop efficient query processing techniques and strategies that minimize the communication cost. This is the main purpose of query optimization, which estimates the cost of alternative query plans in order to choose the best plan for answering complex and expensive queries quickly and efficiently [3].

The query optimization problem has been addressed many times, from different perspectives, and a lot of work has been done. Proposed algorithms and techniques can be categorized into two main approaches:

1- Minimize the cost of data transferred across the network by reducing the amount of transmitted information, and
2- Minimize the response time of the query by using parallel processing.

In this paper, we will mainly focus on the first approach. One of the most
important algorithms suggested for query optimization with minimum cost was algorithm GENERAL (total cost), presented by Apers, Hevner and Yao in 1983 [4]. The advent of AHY was a revolution in the query optimization domain because it introduced semi-joins as reducers in the query optimization process. In 1995, Todd Bealor from the University of Windsor, Canada, presented a new algorithm called the W algorithm as an enhancement over AHY. At the same time, a new technique called PERF (Partially Encoded Record Filter) was presented by Kenneth Ross [2]. This method adds another dimension to semi-joins: a backward phase that uses bit vectors to eliminate unnecessary redundant semi-joins.

In this paper we present an improvement over the W algorithm using PERF joins applied to W.

This paper is organized as follows: Section 2 presents the W algorithm. Section 3 discusses our contribution, the PERF W algorithm. Section 4 presents the experimental results. Section 5 concludes the paper.

2. The W Algorithm

The main aim of this algorithm is to minimize total time by using reducers to eliminate unnecessary data. This algorithm is characterized by two distinct phases:

Phase 1. Semi-join schedules for constructing each reducer are formed using a cost/benefit analysis based on estimated attribute selectivity and sizes of partial results.

Phase 2. The schedule is executed.

Algorithm W works as follows:

1. Establish schedules for the construction of reducers. For each join attribute j, construct a schedule for the reducer $d^*_{mj}$. It should be noted that at this level each schedule is considered independently. Hence, no semi-joins are executed yet. This is achieved in two phases:

   Phase 1. Sort attributes by increasing size such that: $S(d_{aj}) \le S(d_{bj}) \le \dots \le S(d_{mj})$

   Phase 2.
   Evaluate semi-joins in order, beginning with $d_{aj} \rightarrow d_{bj}$. Append the semi-join to the schedule if:

   a. It is profitable and marginally profitable: $P(d_{aj} \rightarrow d_{bj}) > 0$ and $MP(d_{aj} \rightarrow d_{bj}) > 0$, or
   b. It is gainful but not profitable: $P(d_{aj} \rightarrow d_{bj}) < 0$ but $G(d_{aj} \rightarrow d_{bj}) > 0$.

   If the semi-join is appended, then $d^*_{bj} \rightarrow d_{cj}$ is evaluated next; otherwise $d^*_{aj} \rightarrow d_{cj}$ is considered. Repeat this process until all semi-joins in the sequence are evaluated. The last attribute in the sequence is called the reducer.

2. Examine the effects of reducers. Consider the reduction effects of the reducers on all applicable relations by:

   a. Sorting reducers from smallest to largest.
   b. Estimating the cost and benefit of a semi-join with each admissible relation and for each reducer. Profitable semi-joins are appended to the schedule.

3. Review unused semi-joins. For non-profitable reducers, reexamine the possibility of having profitable semi-joins for that particular join attribute. This phase is done using the following sub-steps:

   a. Sort attributes by increasing size.
   b. Evaluate each semi-join and append profitable semi-joins to the final schedule. Note that marginal profit is not considered in this step.

4. Execute the schedule. During this phase, reducers are constructed and shipped to designated sites to reduce the corresponding relations. Then, the reduced relations are shipped to the assembly site.

This heuristic is simple and efficient. It aims to construct, in the cheapest possible way, reducers that are highly selective. Those reducers are then used to eliminate tuples from participating relations prior to shipment to the query site (assembly site).

It should be noted that algorithm W improves the choice of join attributes and their order but does not eliminate redundant transmissions, because schedules are still treated separately.

3. The PERF W
Algorithm

When applying PERF to the W algorithm, the same concept is preserved, but semi-joins are replaced by PERF joins. Our enhancement consists of the following two phases, which were added to the schedule construction:

a. Build a PERF list where $PERF_{i,i+1,j}$ is set to 1 when a transmission was done from $R_i$ to $R_{i+1}$ on join attribute j.
b. When calculating the transmission cost:

   If $PERF_{i,i+1,j} = 1$ then
      Cost $= 0$
   Else
      Cost $= C_0 + C_1 \cdot b_{ik} + (b_{ik} \cdot \rho_{(i+1)k}) / 8$

where $C_0 + C_1 \cdot b_{ik}$ is the linear transmission cost function: a fixed start-up cost ($C_0$) plus the cost per byte transmitted ($C_1$) multiplied by the size in bytes of the projected join attribute ($b_{ik}$). This is the usual cost of a semi-join, known as the forward cost. The term $(b_{ik} \cdot \rho_{(i+1)k}) / 8$ is the backward cost, that is, the cost of transmitting back to $R_i$ the bit vector consisting of only the matching values of the corresponding attribute. To keep this equation simple, we consider attribute k to have a width of 1 byte.

As can be seen, the PERF version of algorithm W does not eliminate redundant transmissions from the schedules, but it makes their cost zero when they occur. This is made possible by adding a small overhead to the transmission cost, namely the backward cost. As a consequence, if a transmission was done from site A to site B using a join attribute j, then every other transmission from A to B using j has zero cost, and every transmission from B to A using j also has zero cost. From this point of view, a PERF join can be seen as a non-redundant symmetric function. This fundamental property allowed us to improve on the W algorithm.

4. Experimental Results

Different scenarios were conceived in order to evaluate the performance of the different algorithms, and for each scenario the programs were run 1500 times. Different kinds of results were collected, including the comparison of all algorithms against the unoptimized method.

Note that all programs were developed using
Kisual C 0 under indo"s 549periments "ere conducted on a Pentium KPC "ith L 'B A' In the first test scenario the attriute "idthis ta/en as 1 ,te for all attriutesT?P4P46P46  Proceedings of the ISCA 15th International Conference on Computers and Their Applications (CATA-2000) !e" #rleans$ %ouisiana$ &SA 'arch 2  *1$ 2000D2-22M**2 *52-**@@M@ 112-5L1@L0L* 5*-2*0L**2L 20*-*1LM*5 2L*-52*L55*2 2L-2152*1 0@L-*M1@L 150-55*55M12 1MM5-251M51 0255-*5L*55*M 0M5-L00@L11 105T#T<M0M2* 215 =raphicall,$ the results are represented asfollo"s< comparing P46 to < "e noticethat P46 outperforms  in all cases 020406080         2    -        2        2    -        4        3    -        3        4    -        2        4    -        4       5    -        3 In the second test scenario the attriute "idthis ta/en as 5 ,tes for all attriutes T?P4P46P46D2-22M5L*112*5L2-*2*1L1102-55255M*-22@L2*0M212*-*0L**2M2L*-521555102-20*511M0@1-*55M1*15-555L@01*@55-250@5511L0*15-*551155@M0ML5-L1LL2@102T#T<L2*@121@  =raphicall,$ the results are represented asfollo"s< comparing P46 to < "e noticethat P46 outperforms  in all cases 020406080         2    -        2        2    -        4        3    -        3        4    -        2        4    -        4       5    -        3 In the third test scenario the attriute "idth ista/en as 50 ,tes for all attriutes T?P4P46P46D2-22@2*1@1*5M2-*2M*LM5022-5M2*L1L0*M*-22@M@*0@520M*-*1LM22M5*-520*522-20@M1L@0@2-*L10M5L15-5ML5LL11@55-251@51ML02@5-*52552*0@15-L0LL1L100T#T<LL0@ML21L  =raphicall,$ the results are represented asfollo"s< comparing P46 to < "e noticealso that P46 outperforms  in all cases 0204060802-22-32-43-23-33-44-24-34-45-25-35-4 e used man, different scenarios in order 
to study the performance of the mentioned algorithms from different perspectives. For each scenario, we compared the performance of the algorithms with respect to each other. Using different scenarios allowed us to better study the behavior of all algorithms under a variety of circumstances. We noted that PERF W has the best performance for a field width of 50 bytes. This result was expected because of the overhead added by PERF in the backward phase. Recall that PERF returns to the original site a bit vector representing the matching tuples. This overhead is relatively more significant when the original field width is less than or equal to 1 byte, because it might sometimes be more profitable not to send this data back. With a width of 50 bytes, however, the backward cost becomes negligible compared to the forward cost.

Finally, we can conclude that the results of our experiments met our expectations and demonstrated the power of PERF joins and their advantage in optimizing the total time of distributed queries.

5. Conclusion

In this paper, a PERF join algorithm has been presented as our contribution to the query optimization problem using semi-joins. We have presented the concepts of both semi-joins and PERF joins, and then taken an optimization algorithm using semi-joins (W) and enhanced it by applying PERF joins (PERF W).

6. References

[1] Todd Bealor, "Semi-join Strategies For Total Cost Minimization In Distributed Query Processing", Master's Thesis, University of Windsor, Canada, 1995.
[2] Zhe Li, K. A. Ross, "PERF Join: An Alternative to Two-Way Semi-Join and Bloomjoin", Columbia University, New York, 1995.
[3] D. Barbara, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross and K. C. Sevcik, "The New Jersey Data Reduction Report", Bulletin Of The
Technical Committee On Data Engineering, Pages: 3-45, December 1997.
[4] Peter M. G. Apers, Alan R. Hevner and S. Bing Yao, "Optimization Algorithms For Distributed Queries", IEEE Transactions On Software Engineering, Vol. SE-9, No. 1, Pages: 57-68, January 1983.
[5] Roula Fany, "PERF Solutions for Distributed Query Optimization", Master's Thesis, Lebanese American University, September 1999.
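
As an illustration of the transmission-cost rule described in Section 3, the following Python sketch shows one way the PERF list and the zero-cost rule for repeated transmissions might be coded. It is not the authors' implementation (which was written in C++). The class name `PerfCostModel`, its method and parameter names, and the sample values are all hypothetical. Reading the second factor of the backward cost term as a selectivity between 0 and 1 is an assumption. Keying the PERF list on an unordered site pair is one way to express the symmetric property stated in Section 3.

```python
from dataclasses import dataclass, field


@dataclass
class PerfCostModel:
    """Hypothetical sketch of the Section 3 cost rule (names are assumptions)."""

    c0: float  # fixed start-up cost of one transmission (illustrative value)
    c1: float  # cost per byte transmitted (illustrative value)
    # PERF list: (unordered site pair, join attribute) already transmitted.
    sent: set = field(default_factory=set)

    def cost(self, src, dst, attr, size_bytes, selectivity):
        """Return the cost of sending `attr` from `src` to `dst`.

        `selectivity` is assumed to be the fraction of values that match
        at the destination; attribute width is assumed to be 1 byte.
        """
        # The unordered pair makes A -> B and B -> A share one entry.
        key = (frozenset((src, dst)), attr)
        if key in self.sent:
            return 0.0  # redundant transmission: PERF entry already set
        self.sent.add(key)
        forward = self.c0 + self.c1 * size_bytes  # usual semi-join cost
        backward = (size_bytes * selectivity) / 8  # bit vector sent back
        return forward + backward


model = PerfCostModel(c0=20.0, c1=1.0)
print(model.cost("A", "B", "j", 800, 0.5))  # first transmission: 870.0
print(model.cost("A", "B", "j", 800, 0.5))  # repeated A -> B on j: 0.0
print(model.cost("B", "A", "j", 800, 0.5))  # reverse direction on j: 0.0
```

With these sample values the first transmission costs 20 + 800 for the forward phase plus 400 / 8 = 50 for the backward phase, and both later transmissions on the same attribute cost nothing, which is the behavior Section 3 describes.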
