A Statistical Theory of Chord under Churn ∗

Supriya Krishnamurthy¹, Sameh El-Ansary¹, Erik Aurell¹,² and Seif Haridi¹,³

¹ Swedish Institute of Computer Science (SICS), Sweden
² Department of Physics, KTH-Royal Institute of Technology, Sweden
³ IMIT, KTH-Royal Institute of Technology, Sweden
{supriya,sameh,eaurell,seif}@sics.se

Abstract. Most earlier studies of DHTs under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different 'types' and the implications this will have for the functioning of DHTs in general.

1 Introduction

Theoretical studies of asymptotic performance bounds of DHTs under churn have been conducted in works like [6, 2]. However, within these bounds, performance can vary substantially as a function of different design decisions and configuration parameters. Hence simulation-based studies such as [5, 8, 3] often provide more realistic insights into the performance of DHTs. Relying on an understanding based on simulations alone is however not satisfactory either, since in this case, the DHT is treated as a black box and is only empirically evaluated, under certain operation conditions.
In this paper we present an alternative theoretical approach to analyzing and understanding DHTs, which aims for an accurate prediction of performance, rather than placing asymptotic performance bounds. Simulations are then used to verify all theoretical predictions.

Our approach is based on constructing and working with master equations, a widely used tool wherever the mathematical theory of stochastic processes is applied to real-world phenomena [7]. We demonstrate the applicability of this approach to one specific DHT: Chord [9]. For Chord, it is natural to define the state of the system as the state of all its nodes, where the state of an alive node is specified by the states of all its pointers. These pointers (either fingers or successors) are then in one of three states: alive and correct, alive and incorrect, or failed. A master equation for this system is simply an equation for the time evolution of the probability that the system is in a particular state. Writing such an equation involves keeping track of all the gain/loss terms which add to or detract from this probability, given the details of the dynamics. This approach is applicable to any P2P system (or indeed any system with a discrete set of states).

∗ This work is funded by the Swedish VINNOVA AMRAM and PPC projects, the European IST-FET PEPITO and 6th FP EVERGROW projects.

Our main result is that, for every outgoing pointer of a Chord node, we systematically compute the probability that it is in any one of the three possible states, by computing all the gain and loss terms that arise from the details of the Chord protocol under churn. This probability is different for each of the successor and finger pointers. We then use this information to predict both lookup consistency (number of failed lookups) as well as lookup performance (latency) as a function of the parameters involved.
All our results are verified by simulations.

The main novelty of our analysis is that it is carried out entirely from first principles, i.e. all quantities are predicted solely as a function of the parameters of the problem: the churn rate, the stabilization rate and the number of nodes in the system. It thus differs from earlier related theoretical studies where quantities similar to those we predict were either assumed to be given [10], or measured numerically [1].

Closest in spirit to our work is the informal derivation in the original Chord paper [9] of the average number of timeouts encountered by a lookup. This quantity was approximated there by the product of the average number of fingers used in a lookup and the probability that a given finger points to a departed node. Our methodology not only allows us to derive the latter quantity rigorously but also demonstrates how this probability depends on which finger (or successor) is involved. Further, we are able to derive an exact relation relating this probability to lookup performance and consistency accurately at any value of the system parameters.

2 Assumptions & Definitions

Basic Notation. In what follows, we assume that the reader is familiar with Chord. However, we introduce the notation used below. We use $K$ to mean the size of the Chord key space and $N$ the number of nodes. Let $M = \log_2 K$ be the number of fingers of a node and $S$ the length of the immediate successor list, usually set to a value $= O(\log N)$. We refer to nodes by their keys, so a node $n$ implies a node with key $n \in \{0, \dots, K-1\}$. We use $p$ to refer to the predecessor, $s$ for referring to the successor list as a whole, and $s_i$ for the $i$th successor. Data structures of different nodes are distinguished by prefixing them with a node key, e.g. $n'.s_1$, etc. Let $fin_i.start$ denote the start of the $i$th finger (where for a node $n$, $\forall i \in 1..M$, $n.fin_i.start = n + 2^{i-1}$) and $fin_i.node$ denote the actual node pointed to by that finger.

Steady State Assumption. $\lambda_j$ is the rate of joins per node, $\lambda_f$ the rate of failures per node and $\lambda_s$ the rate of stabilizations per node. We carry out our analysis for the general case when the rate of doing successor stabilizations, $\alpha\lambda_s$, is not necessarily the same as the rate at which finger stabilizations, $(1-\alpha)\lambda_s$, are performed. In all that follows, we impose the steady state condition $\lambda_j = \lambda_f$. Further, it is useful to define $r \equiv \lambda_s/\lambda_f$, which is the relevant ratio on which all the quantities we are interested in will depend; e.g. $r = 50$ means that a join/fail event takes place every half an hour for a stabilization which takes place once every 36 seconds.

Parameters. The parameters of the problem are hence: $K$, $N$, $\alpha$ and $r$. All relevant measurable quantities should be entirely expressible in terms of these parameters.

Chord Simulation. We use our own discrete event simulation environment implemented in Java, which can be retrieved from [4]. We assume the familiarity of the reader with Chord; however, an exact analysis necessitates the provision of a few details. Successor stabilizations performed by a node $n$ on $n.s_1$ accomplish two main goals: i) retrieving the predecessor and successor list of $n.s_1$ and reconciling with $n$'s state; ii) informing $n.s_1$ that $n$ is alive/newly joined. A finger stabilization picks one finger at random and looks up its start. Lookups do not use the optimization of checking the successor list before using the fingers. However, the successor list is used as a last resort if fingers could not provide progress. Lookups are assumed not to change the state of a node. For joins, a new node $u$ finds its successor $v$ through some initial random contact and performs successor stabilization on that successor.
All fingers of $u$ that have $v$ as an acceptable finger node are set to $v$. The rest of the fingers are computed as best estimates from $v$'s routing table. All failures are ungraceful. We make the simplifying assumption that communication delays due to a limited number of hops are much smaller than the average time interval between joins, failures or stabilization events. However, we do not expect that the results will change much even if this were not satisfied.

Averaging. Since we are collecting statistics like the probability of a particular finger pointer to be wrong, we need to repeat each experiment 100 times before obtaining well-averaged results. The total sequential real time of the simulations behind the results of this paper was about 1800 hours, parallelized on a cluster of 14 nodes, where we had $N = 1000$, $K = 2^{20}$, $S = 6$, $200 \le r \le 2000$ and $0.25 \le \alpha \le 0.75$.

3 The Analysis

3.1 Distribution of Inter-Node Distances

During churn, the inter-node distance (the difference between the keys of two consecutive nodes) is a fluctuating variable. An important quantity used throughout the analysis is the pdf of inter-node distances. We define this quantity below and state a theorem giving its functional form. We then mention three properties of this distribution which are needed in the ensuing analysis. Due to space limitations, we omit the proof of this theorem and the properties here and provide them in [4].

Definition 3.1 Let $Int(x)$ be the number of intervals of length $x$, i.e. the number of pairs of consecutive nodes which are separated by a distance of $x$ keys on the ring.
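
Definition 3.1 can be made concrete with a small sketch that counts $Int(x)$ for a given set of node keys. The function name `interval_counts` and the example ring of 16 keys are illustrative assumptions, not part of the paper's simulator.

```python
from collections import Counter


def interval_counts(node_keys, key_space):
    """Return Int(x) as a Counter: interval length x -> number of pairs of
    consecutive nodes that are x keys apart on the ring."""
    keys = sorted(set(node_keys))
    counts = Counter()
    for i, key in enumerate(keys):
        successor = keys[(i + 1) % len(keys)]   # wrap around the ring
        x = (successor - key) % key_space
        if x == 0:
            # A single node is its own successor: the interval is the whole ring.
            x = key_space
        counts[x] += 1
    return counts


# Hypothetical example: 4 nodes on a ring of 16 keys.
int_x = interval_counts([1, 4, 6, 14], key_space=16)
for length in sorted(int_x):
    print(f"Int({length}) = {int_x[length]}")
```

Every node contributes exactly one interval, so the counts add up to $N$ and the interval lengths add up to the size of the key space.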
Theorem 3.1 For a process in which nodes join or leave with equal rates (and the number of nodes in the network is almost constant) independently of each other and uniformly on the ring, the probability $P(x) \equiv Int(x)/N$ of finding an interval of length $x$ is:

$$P(x) = \rho^{x-1}(1-\rho), \quad \text{where } \rho = \frac{K-N}{K} \text{ and } 1-\rho = \frac{N}{K}.$$

The derivation of the distribution $P(x)$ is independent of any details of the Chord implementation and depends solely on the join and leave process. It is hence applicable to any DHT that deploys a ring.

Property 3.1 For any two keys $u$ and $v$, where $v = u + x$, let $b_i$ be the probability that the first node encountered in between these two keys is at $u + i$ (where $0 \le i < x - 1$). Then $b_i \equiv \rho^i(1-\rho)$. The probability that there is definitely at least one node between $u$ and $v$ is $a(x) \equiv 1 - \rho^x$. Hence the conditional probability that the first node is at a distance $i$, given that there is at least one node in the interval, is $bc(i,x) \equiv b_i/a(x)$.

Property 3.2 The probability that a node and at least one of its immediate predecessors share the same $k$th finger is $p_1(k) \equiv \frac{\rho}{1+\rho}\,(1 - \rho^{2^k-2})$. This is $\sim 1/2$ for $K \gg 1$ and $N \ll K$. Clearly $p_1 = 0$ for $k = 1$. It is straightforward (though tedious) to derive similar expressions for $p_2(k)$, the probability that a node and at least two of its immediate predecessors share the same $k$th finger, $p_3(k)$, and so on.

Property 3.3 We can similarly assess the probability that the join protocol (see previous section) results in further replication of the $k$th pointer. That is, the probability that a newly joined node will choose the $k$th entry of its successor's finger table as its own $k$th entry is

$$p_{join}(k) \sim \rho\,(1 - \rho^{2^{k-2}-2}) + (1-\rho)(1 - \rho^{2^{k-2}-2}) - (1-\rho)\,\rho\,(2^{k-2}-2)\,\rho^{2^{k-2}-3}.$$

The function $p_{join}(k) = 0$ for small $k$ and $1$ for large $k$.
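
The closed forms of Theorem 3.1 and Properties 3.1 and 3.2 are easy to evaluate numerically. The following is a minimal sketch with function names of our own choosing; it uses the simulation parameters $N = 1000$, $K = 2^{20}$ quoted earlier and checks that $P(x)$ is normalized and that the mean interval length is $K/N$.

```python
def compute_rho(n_nodes, key_space):
    """rho = (K - N) / K, so that 1 - rho = N / K."""
    return (key_space - n_nodes) / key_space


def interval_pdf(x, rho):
    """Theorem 3.1: P(x) = rho^(x-1) * (1 - rho)."""
    return rho ** (x - 1) * (1 - rho)


def first_node_at(i, rho):
    """Property 3.1: b_i = rho^i * (1 - rho)."""
    return rho ** i * (1 - rho)


def at_least_one_node(x, rho):
    """Property 3.1: a(x) = 1 - rho^x."""
    return 1 - rho ** x


def first_node_at_given_nonempty(i, x, rho):
    """Property 3.1: bc(i, x) = b_i / a(x)."""
    return first_node_at(i, rho) / at_least_one_node(x, rho)


def shared_finger_p1(k, rho):
    """Property 3.2: p_1(k) = rho / (1 + rho) * (1 - rho^(2^k - 2))."""
    return rho / (1 + rho) * (1 - rho ** (2 ** k - 2))


rho = compute_rho(n_nodes=1000, key_space=2 ** 20)

# Truncate the sums at a length where the geometric tail is negligible.
lengths = range(1, 50001)
total = sum(interval_pdf(x, rho) for x in lengths)
mean_gap = sum(x * interval_pdf(x, rho) for x in lengths)

print(f"sum of P(x)  = {total:.6f}")
print(f"mean gap     = {mean_gap:.3f} (K/N = {2 ** 20 / 1000:.3f})")
print(f"p_1(1)       = {shared_finger_p1(1, rho)}")
print(f"p_1(20)      = {shared_finger_p1(20, rho):.4f}")
```

For the largest fingers $p_1(k)$ approaches $\rho/(1+\rho)$, which is close to $1/2$ when $N \ll K$, in line with the remark in Property 3.2.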
[Figure 1, two panels. Left: $w_1(r,\alpha)$ and $d_1(r,\alpha)$ against the ratio of stabilisation rate to failure rate, $r = \lambda_s/\lambda_f$, theory and simulation. Right: $I(r,\alpha)$ against the ratio of successor stabilisation rate to failure rate, $\alpha r = \alpha\lambda_s/\lambda_f$, theory and simulation for $\alpha = 0.25, 0.5, 0.75$.]

Figure 1: Theory and simulation for $w_1(r,\alpha)$, $d_1(r,\alpha)$, $I(r,\alpha)$.

Figure 2: Changes in $W_1$, the number of wrong (failed or outdated) $s_1$ pointers, due to joins, failures and stabilizations.

3.2 Successor Pointers

In order to get a master-equation description which keeps all the details of the system and is still tractable, we make the ansatz that the state of the system is the product of the states of its nodes, which in turn is the product of the states of all its pointers. As we will see, this ansatz works very well. Now we need only consider how many kinds of pointers there are in the system and the states these can be in. Consider first the successor pointers.

Let $w_k(r,\alpha)$ and $d_k(r,\alpha)$ denote the fraction of nodes having a wrong $k$th successor pointer or a failed one respectively, and $W_k(r,\alpha)$, $D_k(r,\alpha)$ be the respective numbers. A failed pointer is one which points to a departed node and a wrong pointer points either to an incorrect node (alive but not correct) or a dead one.
As we will see, both these quantities play a role in predicting lookup consistency and lookup length.

By the protocol for stabilizing successors in Chord, a node periodically contacts its first successor, possibly correcting it and reconciling with its successor list. Therefore, the numbers of wrong $k$th successor pointers are not independent quantities but depend on the number of wrong first successor pointers. We consider only $s_1$ here.

$$
\begin{array}{ll}
\text{Change in } W_1(r,\alpha) & \text{Probability of change in } \Delta t \\
\hline
W_1(t+\Delta t) = W_1(t) + 1 & c_1 = (\lambda_j \Delta t)(1 - w_1) \\
W_1(t+\Delta t) = W_1(t) + 1 & c_2 = \lambda_f (1 - w_1)^2 \Delta t \\
W_1(t+\Delta t) = W_1(t) - 1 & c_3 = \lambda_f w_1^2 \Delta t \\
W_1(t+\Delta t) = W_1(t) - 1 & c_4 = \alpha \lambda_s w_1 \Delta t \\
W_1(t+\Delta t) = W_1(t) & 1 - (c_1 + c_2 + c_3 + c_4)
\end{array}
$$

Table 1: Gain and loss terms for $W_1(r,\alpha)$: the number of wrong first successors as a function of $r$ and $\alpha$.

We write an equation for $W_1(r,\alpha)$ by accounting for all the events that can change it in a micro event of time $\Delta t$. An illustration of the different cases in which changes in $W_1$ take place due to joins, failures and stabilizations is provided in figure 2. In some cases $W_1$ increases/decreases while in others it stays unchanged. For each increase/decrease, table 1 provides the corresponding probability.

By our implementation of the join protocol, a new node $n_y$, joining between two nodes $n_x$ and $n_z$, has its $s_1$ pointer always correct after the join. However, the state of $n_x.s_1$ before the join makes a difference. If $n_x.s_1$ was correct (pointing to $n_z$) before the join, then after the join it will be wrong and therefore $W_1$ increases by 1. If $n_x.s_1$ was wrong before the join, then it will remain wrong after the join and $W_1$ is unaffected. Thus, we need to account for the former case only. The probability that $n_x.s_1$ is correct is $1 - w_1$ and from that follows the term $c_1$.

For failures, we have 4 cases.
To illustrate them we use nodes $n_x$, $n_y$, $n_z$ and assume that $n_y$ is going to fail. First, if both $n_x.s_1$ and $n_y.s_1$ were correct, then the failure of $n_y$ will make $n_x.s_1$ wrong and hence $W_1$ increases by 1. Second, if $n_x.s_1$ and $n_y.s_1$ were both wrong, then the failure of $n_y$ will decrease $W_1$ by one, since one wrong pointer disappears. Third, if $n_x.s_1$ was wrong and $n_y.s_1$ was correct, then $W_1$ is unaffected. Fourth, if $n_x.s_1$ was correct and $n_y.s_1$ was wrong, then the wrong pointer of $n_y$ disappeared and $n_x.s_1$ became wrong, therefore $W_1$ is unaffected. For the first case to happen, we need to pick two nodes with correct pointers; the probability of this is $(1-w_1)^2$. For the second case to happen, we need to pick two nodes with wrong pointers; the probability of this is $w_1^2$. From these probabilities follow the terms $c_2$ and $c_3$.

Finally, a successor stabilization does not affect $W_1$, unless the stabilizing node had a wrong pointer. The probability of picking such a node is $w_1$. From this follows the term $c_4$.

Hence the equation for $W_1(r,\alpha)$ is:

$$\frac{dW_1}{dt} = \lambda_j(1-w_1) + \lambda_f(1-w_1)^2 - \lambda_f w_1^2 - \alpha\lambda_s w_1$$

In the steady state the left-hand side vanishes. Putting $\lambda_j = \lambda_f$ and dividing by $\lambda_f$, the quadratic terms cancel and we are left with $2 - (3 + r\alpha)\,w_1 = 0$. Solving for $w_1$, we get:

$$w_1(r,\alpha) = \frac{2}{3 + r\alpha} \approx \frac{2}{r\alpha} \qquad (1)$$

This expression matches well with the simulation results as shown in figure 1. $d_1(r,\alpha)$ is then $\approx \frac{1}{2} w_1(r,\alpha)$ since, when $\lambda_j = \lambda_f$, about half the number of wrong pointers are incorrect and about half point to dead nodes. Thus $d_1(r,\alpha) \approx \frac{1}{r\alpha}$, which also matches the simulations well, as shown in figure 1. We can also use the above reasoning to iteratively get $w_k(r,\alpha)$ for any $k$.

Lookup Consistency. By the lookup protocol, a lookup is inconsistent if the immediate predecessor of the sought key has a wrong $s_1$ pointer.
However, we need only consider the case when the $s_1$ pointer is pointing to an alive (but incorrect) node, since our implementation of the protocol always requires the lookup to return an alive node as an answer to the query. The probability that a lookup is inconsistent, $I(r,\alpha)$, is hence $w_1(r,\alpha) - d_1(r,\alpha)$. This prediction matches the simulation results very well, as shown in figure 1.

3.3 Failure of Fingers

We now turn to estimating the fraction of finger pointers which point to failed nodes. As we will see, this is an important quantity for predicting lookups. Unlike members of the successor list, alive fingers, even if outdated, always bring a query closer to the destination and do not affect consistency. Therefore we consider fingers in only two states, alive or dead (failed).

Figure 4: Changes in $F_k$, the number of failed $fin_k$ pointers, due to joins, failures and stabilizations.

$$
\begin{array}{ll}
F_k(t+\Delta t) & \text{Probability of change in } \Delta t \\
\hline
= F_k(t) + 1 & c_1 = (\lambda_j \Delta t)\, p_{join}(k)\, f_k \\
= F_k(t) - 1 & c_2 = (1-\alpha)\,\frac{1}{M}\, f_k\, (\lambda_s \Delta t) \\
= F_k(t) + 1 & c_3 = (1-f_k)^2\, [1 - p_1(k)]\, (\lambda_f \Delta t) \\
= F_k(t) + 2 & c_4 = (1-f_k)^2\, (p_1(k) - p_2(k))\, (\lambda_f \Delta t) \\
= F_k(t) + 3 & c_5 = (1-f_k)^2\, (p_2(k) - p_3(k))\, (\lambda_f \Delta t) \\
= F_k(t) & 1 - (c_1 + c_2 + c_3 + c_4 + c_5)
\end{array}
$$

Table 2: Some of the relevant gain and loss terms for $F_k$, the number of nodes whose $k$th fingers are pointing to a failed node, for $k > 1$.

Let $f_k(r,\alpha)$ denote the fraction of nodes having their $k$th finger pointing to a failed node and $F_k(r,\alpha)$ denote the respective number. For notational simplicity, we write these as simply $F_k$ and $f_k$. We can predict this function for any $k$ by again estimating the gain and loss terms for this quantity, caused by a join, failure or stabilization event, and keeping only the most relevant terms.
These are listed in table 2.

A join event can play a role here by increasing $F_k$, the number of failed $k$th pointers, if the successor of the joinee had a failed $k$th pointer (which occurs with probability $f_k$) and the joinee replicated this from the successor (which occurs with probability $p_{join}(k)$ from property 3.3).

A stabilization evicts a failed pointer if there was one to begin with. The stabilization rate is divided by $M$, since a node stabilizes any one finger randomly, every time it decides to stabilize a finger at rate $(1-\alpha)\lambda_s$.

Given a node $n$ with an alive $k$th finger (which occurs with probability $1 - f_k$), when the node pointed to by that finger fails, the number of failed $k$th fingers ($F_k$) increases. The amount of this increase depends on the number of immediate predecessors of $n$ that were pointing to the failed node with their $k$th finger. That number of predecessors could be 0, 1, 2, etc. Using property 3.2, the respective probabilities of those cases are $1 - p_1(k)$, $p_1(k) - p_2(k)$, $p_2(k) - p_3(k)$, etc.
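
The balance between these gain and loss terms can be explored numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it works in units of $\lambda_f$ per node with $\lambda_j = \lambda_f$, it extends the failure terms of table 2 to all replication counts (the weights $1, 2, 3, \dots$ applied to $1 - p_1$, $p_1 - p_2$, $p_2 - p_3$, $\dots$ telescope to $1 + \sum_i p_i(k)$), and it takes $p_{join}(k)$ and $\sum_i p_i(k)$ as plain inputs. The numerical values in the example are arbitrary.

```python
import math


def net_rate(f_k, r, alpha, m, p_join, p_rep_sum):
    """Expected change of F_k per unit time, per node, in units of lambda_f.

    p_rep_sum stands for sum_i p_i(k); p_join for p_join(k).
    """
    gain_join = p_join * f_k                        # joinee copies a failed finger
    gain_fail = (1 + p_rep_sum) * (1 - f_k) ** 2    # finger target fails
    loss_stab = (1 - alpha) * r / m * f_k           # stabilization repairs a finger
    return gain_join + gain_fail - loss_stab


def steady_state_f(r, alpha, m, p_join, p_rep_sum):
    """Root in [0, 1] of net_rate = 0, which is a quadratic in f_k.

    Requires r * (1 - alpha) / m >= p_join so that the root is real.
    """
    a = 1 + p_rep_sum
    b = 2 * a - p_join + r * (1 - alpha) / m
    return (b - math.sqrt(b * b - 4 * a * a)) / (2 * a)


# Arbitrary illustrative inputs (not taken from the paper's figures).
f_k = steady_state_f(r=500, alpha=0.5, m=20, p_join=1.0, p_rep_sum=0.5)
print(f"steady-state f_k = {f_k:.4f}")
print(f"residual         = {net_rate(f_k, 500, 0.5, 20, 1.0, 0.5):.2e}")
```

The smaller root of the quadratic is taken because it is the one that tends to zero as the stabilization rate grows.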
[Figure 3, two panels. Left: $f_k(r, 0.5)$ for $k = 7, 9, 11, 14$ against the ratio of finger stabilisation rate to failure rate, $(1-\alpha)r = (1-\alpha)\lambda_s/\lambda_f$, theory and simulation. Right: lookup latency (hops + timeouts) $L(r, 0.5)$ against the same ratio, theory and simulation.]

Figure 3: Theory and simulation for $f_k(r,\alpha)$ and $L(r,\alpha)$.

Solving for $f_k$ in the steady state, we get:

$$f_k = \frac{\left[2\tilde{P}_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{M}\right]}{2\,(1 + \tilde{P}_{rep}(k))} - \frac{\sqrt{\left[2\tilde{P}_{rep}(k) + 2 - p_{join}(k) + \frac{r(1-\alpha)}{M}\right]^2 - 4\,(1 + \tilde{P}_{rep}(k))^2}}{2\,(1 + \tilde{P}_{rep}(k))} \qquad (2)$$

where $\tilde{P}_{rep}(k) = \sum_i p_i(k)$. In principle it is enough to keep even three terms in the sum. The above expressions match very well with the simulation results (figure 3).

3.4 Cost of Finger Stabilizations and Lookups

In this section, we demonstrate how the information about the failed fingers and successors can be used to predict the cost of stabilizations, lookups or in general the cost for reaching any key in the id space. By cost we mean the number of hops needed to reach the destination including the number of timeouts encountered en route. For this analysis, we consider timeouts and hops to add equally to the cost.
We can easily generalize this analysis to investigate the case when a timeout costs some factor $n$ times the cost of a hop.

Define $C_t(r,\alpha)$ (also denoted $C_t$) to be the expected cost for a given node to reach some target key which is $t$ keys away from it (which means reaching the first successor of this key). For example, $C_1$ would then be the cost of looking up the adjacent key (1 key away). Since the adjacent key is always stored at the first alive successor, if the first successor is alive (which occurs with probability $1 - d_1$), the cost will be 1 hop. If the first successor is dead but the second is alive (which occurs with probability $d_1(1-d_2)$), the cost will be 1 hop + 1 timeout = 2, so this case contributes $2 \times d_1(1-d_2)$ to the expected cost, and so forth. Therefore, we have

$$C_1 = 1 - d_1 + 2 \times d_1(1-d_2) + 3 \times d_1 d_2 (1-d_3) + \cdots \approx 1 + d_1 = 1 + \frac{1}{\alpha r}.$$

For finding the expected cost of reaching a general distance $t$ we need to follow closely the Chord protocol, which would look up $t$ by first finding the closest preceding finger. For notational simplicity, let us define $\xi$ to be the start of the finger (say the $k$th) that most closely precedes $t$. Thus $t = \xi + m$, i.e. there are $m$ keys between the sought target $t$ and the start of the most closely preceding finger. With that, we can write a recursion relation for $C_{\xi+m}$ as follows:

$$C_{\xi+m} = C_\xi\,[1 - a(m)] + (1-f_k)\left[a(m) + \sum_{i=1}^{m} b_{m+1-i}\, C_i\right] + f_k\, a(m)\left[1 + \sum_{i=1}^{k-1} h_k(i) \sum_{l=1}^{\xi/2^i} bc(l, \xi/2^i)\,(1 + C_{\xi_i + 1 - l + m}) + 2\,h_k(k)\right] \qquad (3)$$

where $\xi_i \equiv \sum_{m=1}^{i} \xi/2^m$ and $h_k(i)$ is the probability that a node is forced to use its $(k-i)$th finger owing to the death of its $k$th finger.
The probabilities $a$, $b$, $bc$ have already been introduced in section 3.

The lookup equation, though rather complicated at first sight, merely accounts for all the possibilities that a Chord lookup will encounter, and deals with them exactly as the protocol dictates. The first term accounts for the eventuality that there is no node intervening between $\xi$ and $\xi + m$ (which occurs with probability $1 - a(m)$). In this case, the cost of looking for $\xi + m$ is the same as the cost of looking for $\xi$. The second term accounts for the situation when a node does intervene in between ...
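
The base case $C_1$ of this recursion depends only on the successor-pointer results of section 3.2, so it can be checked with a few lines of code. The sketch below evaluates equation (1), the estimate $d_1 \approx w_1/2$, the lookup inconsistency $I = w_1 - d_1$, and the series for $C_1$. The function names are ours, and setting $d_k = d_1$ for every entry of a successor list of length 6 is purely an illustrative assumption, since the text does not give $d_k$ for $k > 1$ explicitly.

```python
def wrong_first_successor(r, alpha):
    """Equation (1): steady-state fraction of wrong s_1 pointers."""
    return 2.0 / (3.0 + r * alpha)


def expected_cost_adjacent_key(dead_fractions):
    """C_1 = sum_j j * d_1 * ... * d_{j-1} * (1 - d_j).

    The sum is truncated at the length of dead_fractions (the successor list).
    """
    cost = 0.0
    all_previous_dead = 1.0
    for j, d_j in enumerate(dead_fractions, start=1):
        cost += j * all_previous_dead * (1.0 - d_j)
        all_previous_dead *= d_j
    return cost


r, alpha = 200, 0.5
w1 = wrong_first_successor(r, alpha)
d1 = w1 / 2.0                # about half of the wrong pointers point to dead nodes
inconsistency = w1 - d1      # I(r, alpha): wrong but alive s_1 pointers
c1 = expected_cost_adjacent_key([d1] * 6)   # assumption: d_k = d_1 for all k

print(f"w_1 = {w1:.5f}, d_1 = {d1:.5f}, I = {inconsistency:.5f}")
print(f"C_1 = {c1:.5f}  vs  1 + d_1 = {1 + d1:.5f}")
```

For small $d_1$ the series is dominated by its first two terms, which is why $C_1$ stays close to $1 + d_1$.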