
Proceedings of the ISCA 15th International Conference on Computers and Their Applications (CATA-2000), New Orleans, Louisiana, USA, March 29-31, 2000
A PERF Solution for Distributed Query Optimization

Ramzi A. Haraty and Roula Fany
Lebanese American University
P.O. Box 13-5053, Beirut, Lebanon
Abstract
Query optimization techniques aim to minimize the cost of transferring data across networks. Many techniques and algorithms have been proposed to optimize queries. One of these is algorithm W, which uses semi-joins. Recently, a new technique called PERF seems to bring some improvement over semi-joins [2]. PERF joins are two-way semi-joins that use a bit vector as their backward phase. Our research applies PERF joins to the W algorithm. Programs were designed to implement both the original and the enhanced algorithms. Several experiments were conducted, and the results showed a considerable improvement obtained by applying the PERF concept.
Keywords
Query Optimization, Semi-Joins, and PERF Joins
1. Introduction
Distributed query processing is the process of retrieving data from different sites. Accessing data from different sites involves transmission via communication links, which creates delays. The basic challenge is to design and develop efficient query processing techniques and strategies that minimize the communication cost. This is the main purpose of query optimization, which estimates the cost of alternative query plans in order to choose the best plan for answering complex and expensive queries quickly and efficiently [3].

The query optimization problem has been addressed many times, from different perspectives, and a lot of work has been done. Proposed algorithms and techniques can be categorized into two main approaches:
1- Minimize the cost of data transferred across the network by reducing the amount of transmitted information, and
2- Minimize the response time of the query by using parallel processing.

In this paper, we mainly focus on the first approach. One of the most important algorithms suggested for query optimization with minimum cost was algorithm GENERAL (total cost), presented by Apers, Hevner and Yao in 1983 [4]. The advent of AHY was a revolution in the query optimization domain because it introduced semi-joins as reducers in the query optimization process. In 1995, Todd Bealor from the University of Windsor, Canada, presented a new algorithm called algorithm W as an enhancement over AHY. At the same time, a new technique called PERF (Partially Encoded Record Filter) was presented by Kenneth Ross [2]. This method adds another dimension to semi-joins: the backward phase, which uses bit vectors to eliminate unnecessary, redundant semi-joins.

In this paper, we present an improvement over algorithm W obtained by applying PERF joins to it. The paper is organized as follows: Section 2 presents the W algorithm. Section 3 discusses our contribution, the PERF W algorithm. Section 4 presents the experimental results. Section 5 concludes the paper.
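To make the two reduction techniques mentioned above more concrete, here is a minimal in-memory Python sketch. It assumes that each relation is a list of dictionaries joined on a single attribute. The names `semi_join` and `perf_join` are hypothetical. The sketch illustrates the general idea of a semi-join reducer and of a two-way reduction that uses a bit vector as its backward phase; it is not the authors' implementation.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]  # one tuple of a relation, keyed by attribute name


def semi_join(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    """Reduce s by r on attr: keep only the tuples of s whose value occurs in r."""
    r_values = {row[attr] for row in r}  # projection that r's site would ship
    return [row for row in s if row[attr] in r_values]


def perf_join(r: List[Row], s: List[Row], attr: str) -> Tuple[List[Row], List[Row], List[bool]]:
    """Two-way reduction with a bit vector as the backward phase (sketch).

    Forward phase: r's site ships its distinct attr values in a fixed order.
    Backward phase: s's site replies with one bit per shipped value, in the
    same order, set when the value has a match, instead of shipping values back.
    """
    forward = list(dict.fromkeys(row[attr] for row in r))  # keeps first-seen order
    forward_set = set(forward)
    s_values = {row[attr] for row in s}
    bits = [value in s_values for value in forward]  # backward bit vector

    reduced_s = [row for row in s if row[attr] in forward_set]
    # r's site decodes the bits by position to learn which of its values matched.
    matched = {value for value, bit in zip(forward, bits) if bit}
    reduced_r = [row for row in r if row[attr] in matched]
    return reduced_r, reduced_s, bits


if __name__ == "__main__":
    r = [{"id": 1}, {"id": 2}, {"id": 3}]
    s = [{"id": 2, "x": "a"}, {"id": 3, "x": "b"}, {"id": 5, "x": "c"}]
    print(semi_join(r, s, "id"))
    print(perf_join(r, s, "id"))
```

With these toy relations, `perf_join` returns the bits `[False, True, True]`. Value 1 has no match in `s`, so `r`'s site can drop that tuple without receiving any values back, only one bit per value.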
2. The W Algorithm
The main aim of this algorithm is to minimize total time by using reducers to eliminate unnecessary data. The algorithm is characterized by two distinct phases:
Phase 1.
Semi-join schedules for constructing each reducer are formed using a cost/benefit analysis based on estimated attribute selectivity and the sizes of partial results.
Phase 2.
The schedule is executed.

Algorithm W works as follows:

1. Establish schedules for the construction of reducers. For each join attribute $j$, construct the schedule for the reducer $d'_{mj}$. Note that at this level each schedule is considered independently; hence, no semi-joins are executed yet. This is achieved in two phases:
Phase 1.
Sort attributes by increasing size such that $S(d_{aj}) \le S(d_{bj}) \le \cdots \le S(d_{mj})$.
Phase 2.
Evaluate semi-joins in order, beginning with $d_{aj} \to d_{bj}$. Append a semi-join to the schedule if:
a. It is profitable and marginally profitable: $P(d_{aj} \to d_{bj}) > 0$ and $MP(d_{aj} \to d_{bj}) > 0$; or
b. It is gainful but not profitable: $P(d_{aj} \to d_{bj}) < 0$ but $G(d_{aj} \to d_{bj}) > 0$.

If the semi-join is appended, then $d'_{bj} \to d_{cj}$ is evaluated next; otherwise, $d'_{aj} \to d_{cj}$ is considered. This process is repeated until all semi-joins in the sequence have been evaluated. The last attribute in the sequence is called the reducer.

2. Examine the effects of reducers. Consider the reduction effects of the reducers on all applicable relations by:
a. Sorting reducers from smallest to largest.
b. Estimating the cost and benefit of a semi-join with each admissible relation and for each reducer. Profitable semi-joins are appended to the schedule.

3. Review unused semi-joins. For non-profitable reducers, reexamine the possibility of having profitable semi-joins for that particular join attribute. This phase uses the following sub-steps:
a. Sort attributes by increasing size.
b. Evaluate each semi-join and append profitable semi-joins to the final schedule. Note that marginal profit is not considered in this step.

4. Execute the schedule. During this phase, reducers are constructed and shipped to designated sites to reduce the corresponding relations. Then, the reduced relations are shipped to the assembly site.

This heuristic is simple and efficient. It aims to construct, in the cheapest possible way, reducers that are highly selective. Those reducers are then used to eliminate tuples from participating relations prior to shipment to the query site (assembly site).

Note that algorithm W improves the choice of join attributes and their order, but it does not eliminate redundant transmissions, because schedules are still treated separately.
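Step 1 above can be sketched in Python for a single join attribute: sort the components by size, then walk the sequence and append the semi-joins that pass rule a or rule b. The text does not give formulas for P, MP and G, so this sketch assumes the caller supplies them as functions. The component representation and the name `build_schedule` are hypothetical, and size estimates are not updated after a reduction.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of a component d_xj: (name, estimated size S(d_xj)).
Component = Tuple[str, float]
# Estimate of P, MP or G for the semi-join (reducer -> target).
Estimate = Callable[[Component, Component], float]


def build_schedule(
    components: Sequence[Component],
    profit: Estimate,
    marginal_profit: Estimate,
    gain: Estimate,
) -> Tuple[List[Tuple[str, str]], str]:
    """Sketch of step 1 of algorithm W for one join attribute j.

    Returns the appended semi-joins as (source, target) name pairs and the
    name of the component that ends the sequence (the reducer).
    """
    if not components:
        raise ValueError("at least one component is required")

    # Phase 1: sort by increasing size, S(d_aj) <= S(d_bj) <= ... <= S(d_mj).
    ordered = sorted(components, key=lambda c: c[1])

    # Phase 2: evaluate the semi-joins in order, starting with d_aj -> d_bj.
    schedule: List[Tuple[str, str]] = []
    current = ordered[0]
    for target in ordered[1:]:
        p = profit(current, target)
        profitable = p > 0 and marginal_profit(current, target) > 0  # rule a
        gainful = p < 0 and gain(current, target) > 0                  # rule b
        if profitable or gainful:
            schedule.append((current[0], target[0]))
            current = target  # the reduced target reduces the next component
        # Otherwise the same component is tried against the next one.
    return schedule, current[0]


if __name__ == "__main__":
    comps = [("R3.j", 300.0), ("R1.j", 100.0), ("R2.j", 200.0)]
    # Toy estimates, for illustration only.
    sched, reducer = build_schedule(
        comps,
        profit=lambda a, b: b[1] - 2 * a[1],
        marginal_profit=lambda a, b: 1.0,
        gain=lambda a, b: 0.0,
    )
    print(sched, reducer)
```

With these toy estimates, R1.j → R2.j has zero profit and is skipped. R1.j is therefore tried against R3.j, which passes rule a and is appended, so R3.j ends the sequence.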
3. The PERF W Algorithm
When PERF is applied to the W algorithm, the same concept is preserved, but semi-joins are replaced by PERF joins. Our enhancement consists of the following two phases, which were added to the schedule construction:
a. Build a PERF list in which $\mathrm{PERF}_{i,i+1,j}$ is set to 1 when a transmission was done from $R_i$ to $R_{i+1}$ on join attribute $j$.
b. When calculating the transmission cost:

$$\text{Cost} = \begin{cases} 0, & \text{if } \mathrm{PERF}_{i,i+1,j} = 1,\\ C_0 + C_1 s_{ik} + \dfrac{s_{ik}\,\rho_{(i+1)k}}{8}, & \text{otherwise,} \end{cases}$$

where $C_0 + C_1 s_{ik}$ is the linear transmission-cost function: a fixed start-up cost $C_0$ plus the cost per byte transmitted ($C_1$) multiplied by the size in bytes $s_{ik}$ of the projected join attribute. This is the usual cost of a semi-join, known as the forward cost. The term $s_{ik}\,\rho_{(i+1)k}/8$ is the backward cost: the cost of transmitting back to $R_i$ the bit vector consisting of only the matching values of the corresponding attribute, one bit per value, where $\rho_{(i+1)k}$ is the selectivity of attribute $k$ in $R_{i+1}$. To keep the equation simple, we take attribute $k$ to have a width of 1 byte.

As can be seen, the PERF version of algorithm W does not eliminate redundant transmissions from the schedules, but it makes their cost zero when they occur. This is made possible by adding a small overhead, the backward cost, to the transmission cost. Consequently, if a transmission was done from site A to site B using a join attribute j, then every other transmission from A to B using j has zero cost, and every transmission from B to A using j also has zero cost. From this point of view, a PERF join can be seen as a non-redundant symmetric function. This fundamental property allowed us to improve on the W algorithm.
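The zero-cost rule for repeated transmissions can be sketched in Python as follows. This assumes the PERF list is stored as a set keyed by an unordered pair of sites plus the join attribute, which encodes the symmetry described above (A to B and B to A on j share one entry). `PerfList` and `transmission_cost` are hypothetical names. The backward term follows the text's simplification of a 1-byte attribute with one bit per matching value and no per-byte multiplier.

```python
from typing import FrozenSet, Set, Tuple

# Key: the unordered pair of sites plus the join attribute. Using a frozenset
# makes A -> B and B -> A on the same attribute map to the same entry.
PerfKey = Tuple[FrozenSet[str], str]


class PerfList:
    """Hypothetical PERF list: records which (site pair, attribute) were used."""

    def __init__(self) -> None:
        self._entries: Set[PerfKey] = set()

    @staticmethod
    def _key(src: str, dst: str, attr: str) -> PerfKey:
        return (frozenset((src, dst)), attr)

    def mark(self, src: str, dst: str, attr: str) -> None:
        self._entries.add(self._key(src, dst, attr))

    def is_set(self, src: str, dst: str, attr: str) -> bool:
        return self._key(src, dst, attr) in self._entries


def transmission_cost(
    perf: PerfList,
    src: str,
    dst: str,
    attr: str,
    projected_bytes: float,  # size in bytes of the projected join attribute
    matching_values: float,  # number of matching values (1-byte attribute)
    c0: float,               # fixed start-up cost per transmission
    c1: float,               # cost per byte transmitted
) -> float:
    """Cost of a PERF join from src to dst on attr; zero if already done."""
    if perf.is_set(src, dst, attr):
        return 0.0
    forward = c0 + c1 * projected_bytes  # usual semi-join (forward) cost
    backward = matching_values / 8       # one bit per matching value
    return forward + backward


if __name__ == "__main__":
    perf = PerfList()
    print(transmission_cost(perf, "A", "B", "j", 100, 40, 10.0, 1.0))
    perf.mark("A", "B", "j")
    print(transmission_cost(perf, "B", "A", "j", 100, 40, 10.0, 1.0))
```

The first call costs 10 + 100 + 40/8 = 115. Once the A-to-B transmission on j is marked, the reverse B-to-A transmission on j costs 0, while a transmission on a different attribute is still charged in full.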
4. Experimental Results
Different scenarios were conceived to evaluate the performance of the different algorithms, and for each scenario the programs were run 1500 times. Different kinds of results were collected, including a comparison of all algorithms against the unoptimized method. All programs were developed using Visual C++ 4.0 under Windows 95. Experiments were conducted on a Pentium PC with 64 MB of RAM.

In the first test scenario, the attribute width is taken as 1 byte for all attributes.

[Table: results by test type (TYPE) and total (TOT), attribute width 1 byte; numeric values not recoverable from the source]

Graphically, the results are represented as follows, comparing PERF W to W. We notice that PERF W outperforms W in all cases.
[Figure: PERF W compared with W by test type, attribute width 1 byte]
In the second test scenario, the attribute width is taken as 5 bytes for all attributes.
[Table: results by test type (TYPE) and total (TOT), attribute width 5 bytes; numeric values not recoverable from the source]
Graphically, the results are represented as follows, comparing PERF W to W. We notice that PERF W outperforms W in all cases.
[Figure: PERF W compared with W by test type, attribute width 5 bytes]
In the third test scenario, the attribute width is taken as 50 bytes for all attributes.
[Table: results by test type (TYPE) and total (TOT), attribute width 50 bytes; numeric values not recoverable from the source]
Graphically, the results are represented as follows, comparing PERF W to W. We notice again that PERF W outperforms W in all cases.
[Figure: PERF W compared with W by test type (2-2 to 5-4), attribute width 50 bytes]
We used many different scenarios to study the performance of the algorithms from different perspectives. For each scenario, we compared the performance of the algorithms with one another. Using different scenarios allowed us to study the behavior of all algorithms under a variety of circumstances. We observed that PERF W has its best performance for a field width of 50 bytes. This result was expected because of the overhead that PERF adds in the backward phase. Recall that PERF returns to the original site a bit vector representing the matching tuples. This overhead is more significant when the original field width is 1 byte or less, because it may sometimes be more profitable not to send this data back. With a width of 50 bytes, however, the backward cost becomes negligible compared to the forward cost.

Finally, we conclude that the results of our experiments met our expectations and demonstrated the power of PERF joins and their advantage in optimizing the total time of distributed queries.
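The width argument can be checked with simple arithmetic. Consider a simplified byte-count model that ignores the start-up cost $C_0$ and assumes the bit vector carries at most one bit for each of the $n$ values shipped forward. Under that model, the backward-to-forward ratio is $(n/8)/(n w) = 1/(8w)$ for an attribute width of $w$ bytes. This gives 12.5% for 1 byte, 2.5% for 5 bytes and 0.25% for 50 bytes. The helper below, with the hypothetical name `backward_overhead`, makes the calculation explicit.

```python
def backward_overhead(width_bytes: float, n_values: int = 1000) -> float:
    """Ratio of backward (bit vector) bytes to forward (projection) bytes.

    Simplified model: n_values values of width_bytes each are shipped forward,
    and at most one bit per value is shipped back; C0 is ignored.
    """
    if width_bytes <= 0 or n_values <= 0:
        raise ValueError("width and value count must be positive")
    forward_bytes = n_values * width_bytes
    backward_bytes = n_values / 8
    return backward_bytes / forward_bytes


if __name__ == "__main__":
    for width in (1, 5, 50):  # the three attribute widths used in the experiments
        print(f"{width:>2} bytes: backward/forward = {backward_overhead(width):.2%}")
```

Because $n$ cancels, the ratio depends only on the width. This is consistent with the observation that the backward phase costs relatively little when fields are wide.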
5. Conclusion
In this paper, a PERF join algorithm has been presented as our contribution to the query optimization problem using semi-joins. We have fully described the concepts of both semi-joins and PERF joins. We then took an optimization algorithm that uses semi-joins (W) and enhanced it by applying PERF joins (PERF W).
6. References
[1] Todd Bealor, "Semi-join Strategies for Total Cost Minimization in Distributed Query Processing", Master's Thesis, University of Windsor, Canada, 1995.
[2] Zhe Li and K. A. Ross, "PERF Join: An Alternative to Two-Way Semi-Join and Bloomjoin", Columbia University, New York, 1995.
[3] D. Barbara, W. DuMouchel, C. Faloutsos, P. J. Haas, J. M. Hellerstein, Y. Ioannidis, H. V. Jagadish, T. Johnson, R. Ng, V. Poosala, K. A. Ross, and K. C. Sevcik, "The New Jersey Data Reduction Report", Bulletin of the Technical Committee on Data Engineering, pages 3-45, December 1997.
[4] Peter M. G. Apers, Alan R. Hevner, and S. Bing Yao, "Optimization Algorithms for Distributed Queries", IEEE Transactions on Software Engineering, Vol. SE-9, No. 1, pages 57-68, January 1983.
[5] Roula Fany, "PERF Solutions for Distributed Query Optimization", Master's Thesis, Lebanese American University, September 1999.
