
An efficient logging algorithm for incremental replay of message-passing applications

Franco Zambonelli
Dipartimento di Scienze dell'Ingegneria, Università di Modena e Reggio Emilia
Via Campi 213-b, 41100 Modena, Italy
franco.zambonelli@unimo.it

Robert H.B. Netzer
Department of Computer Science
Brown University, Providence (RI)

Copyright 1999 IEEE. Published in the Proceedings of IPPS/SPDP99, April 1999 at San Juan, Puerto Rico. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 732-562-3966.

Abstract

To support incremental replay of message-passing applications, processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the current state of the execution on past events. The paper presents a new adaptive logging algorithm that dynamically decides whether to log a message based on the dependencies the incoming message introduces on past events of the execution. The paper discusses the implementation issues of the algorithm and evaluates its performance on several applications, showing how it improves on previously known schemes.

1. Introduction

Debugging long-running parallel/distributed programs requires the capability of incremental replay, i.e., of replaying selected intervals of an execution. Because programs that last hours or days are common, one should not be forced to replay the whole execution from the beginning to isolate a bug that manifested itself in a well defined section of the program.

To permit incremental replay, each process must periodically checkpoint, i.e., save its computational state to stable storage, to allow its execution to be restarted from any one of its checkpoints and not from the beginning. Though coordinated checkpointing strategies are possible and widely explored [1], the paper addresses independent checkpointing: application processes can checkpoint independently of each other and without any coordination [7, 8].

In the message-passing model, several problems arise in replaying the execution of a process from one of its checkpoints. On the one hand, the order of message delivery must be traced and preserved during the replay, to guarantee deterministic re-execution. On the other hand, all the messages received by a process after the checkpoint of interest and until the next one need to be reproduced. The paper assumes that tools are available for preserving the delivery order, a problem widely studied in past works [4, 5], and focuses on the latter problem. Two main techniques are possible: (i) all the messages received by a process are logged and restored during the replay [4]; (ii) the intervals of those processes from which messages have been received during the replay interval are re-executed too, in order to re-compute the messages and send them again [9]. The former technique is inefficient, because logging a message has high costs in both execution time and storage space. The latter technique, even if it were to introduce no overhead in the execution, does not guarantee any bound on the amount of computation that must be re-executed to replay a given interval: in the worst case, the whole execution may need to be replayed to re-compute the needed messages.

An alternative technique, known as adaptive message logging, can significantly reduce the logging effort while limiting the amount of computation needed to replay. This is done by introducing on-line algorithms to dynamically detect whether a message needs to be logged, on the basis of the dependencies on past events of the execution that it introduces on the receiver process [7].
Dependency information can be made dynamically available to processes by piggybacking it onto the computation messages, thus avoiding the overhead of additional control messages.

This paper presents a new adaptive logging algorithm that guarantees a bound on the amount of computation needed to replay any given interval of the execution. The presented algorithm improves on previously known logging algorithms in several ways: (i) it logs, on average, a lower percentage of messages, as confirmed by tests on a set of message-passing applications; (ii) its behavior can be tuned to meet user needs; (iii) it prevents deadlocks during replay, a problem with some past schemes.

2. The model

A parallel/distributed program based on message passing can be modeled as a set of processes $\{P_1, P_2, \ldots, P_n\}$ that execute either internal or communication events. Internal events of a process can include local checkpoint events. The paper denotes by $C_{i,j}$ the $j$-th checkpoint taken by a process $P_i$ and by $I_{i,j}$ its $j$-th checkpoint interval, i.e., the set of all the events included between $C_{i,j}$ and $C_{i,j+1}$. Communication events include message sending and receiving. Message delivery is not required to be FIFO.

Given a program, different executions are possible, depending on both the order in which messages are delivered and the time at which processes take local checkpoints.

2.1. The replay dependence relation

Given an execution of a checkpointed message-passing program, the replay dependence relation ($\xrightarrow{RD}$) shows how events would depend on one another during a replay [7]. An event $b$ is said to be replay dependent on an event $a$ ($a \xrightarrow{RD} b$) if and only if $a$ must be re-executed before $b$ can be re-executed, either because $a$ precedes $b$ in the same process and no checkpointing occurs between $a$ and $b$, or because a sequence of unlogged messages was sent from $a$ (or a following event) to $b$ (or a preceding event).
The relation is transitive.

Let us consider the execution in figure 1, where horizontal arrows represent the execution of each process, black dots represent local checkpoints, and inter-process arrows represent messages exchanged between processes (solid arrows for unlogged messages, dashed arrows for logged ones). In this execution, the receipt by $P_2$ of an unlogged message sent by $P_3$ makes the subsequent events of $P_2$ (until its next checkpoint $C_{2,2}$) replay dependent on the events of $P_3$ preceding the sending of that message. The receipt by $P_1$ of an unlogged message from $P_2$ makes $P_1$ replay dependent on $P_2$ and, transitively, on $P_3$.

The $\xrightarrow{RD}$ relation is included in ($\subset$), but not equivalent to, the happened before ($\xrightarrow{HB}$) relation [1]. In fact: (i) the first event after a checkpoint is not replay dependent on the last event before that checkpoint; (ii) a logged message does not introduce any replay dependence relation. With reference to figure 1, events of $P_2$ past $C_{2,1}$ are not replay dependent on any event of $P_2$ preceding $C_{2,1}$. Events of $P_3$ are not dependent on events of $P_2$ past $C_{2,2}$ because the message that would introduce the dependence is logged.

[Figure 1. The replay set of $I_{2,1}$: time lines of processes P1, P2 and P3 with checkpoints $C_{i,j}$, messages m1 to m7, and the left and right frontiers.]

2.2. The replay set

The replay dependence relation can be extended to checkpoint intervals. The notation $I_{i,j} \xrightarrow{RD} I_{k,l}$ indicates that there are events in $I_{k,l}$ that are replay dependent on events in $I_{i,j}$. Then, $I_{i,j}$ must be replayed in order to replay $I_{k,l}$.

Note that the relation $I_{i,j} \xrightarrow{RD} I_{k,l}$ does not imply that all the events of $I_{i,j}$ introduce dependencies on $I_{k,l}$. In figure 1, $I_{2,1}$ is replay dependent on $I_{3,0}$ and, therefore, $I_{3,0}$ must be re-executed to replay $I_{2,1}$. However, the events of $I_{3,0}$ that follow the sending of the message received by $P_2$ do not introduce dependencies on $I_{2,1}$.

Given one interval $I_{i,j}$ of an execution, there exists a set of checkpoint intervals that introduce replay dependence on it and need to be re-executed when replaying $I_{i,j}$. We call this the replay set associated with the interval.
Formally:

Definition: For a given checkpoint interval $I_{i,j}$ of an execution, the replay set $RS(I_{i,j})$ is the set of intervals

$$RS(I_{i,j}) = \{\, I_{k,l} \mid I_{k,l} \xrightarrow{RD} I_{i,j} \,\}$$

In addition, the two sets of the earliest and of the latest checkpoints that delimit the extension of the replay set are defined as its left and right frontiers, respectively. For example (figure 1), the replay set of the interval $I_{2,1}$ (emphasized by thick lines) includes the intervals $I_{1,0}, I_{1,2}, I_{2,1}, I_{3,0}, I_{3,1}$; its left frontier is composed of $C_{1,0}, C_{2,1}, C_{3,0}$; its right frontier is composed of $C_{1,3}, C_{2,2}, C_{3,2}$.

3. The algorithm

The proposed adaptive logging algorithm is fully distributed: a process locally decides whether to log a message or not at the moment the message is received. The decision is based on information, locally stored by each process, about the replay set of its current checkpoint interval and on information, piggybacked with each message, about the replay set of the sender.

The basic idea of the algorithm is simple: because a message transitively induces on the receiver the replay dependence relations of the sender, a receiver will log a message if it would make the size of its replay set (i.e., the number of checkpoint intervals it is composed of) increase over a tolerated bound. Note that the replay dependence relation is included in the happened before one and is therefore detectable on-line [1]. Because the proposed algorithm detects the exact shape of the replay set on-line, it can be defined as full informed.

To keep track of its replay set, each process stores, for all the processes it is replay dependent on, the indexes of the associated checkpoint intervals that introduce replay dependence on it.

At any new checkpoint, all replay dependence relations are voided, i.e., for an interval $I_{i,j}$, $RS(I_{i,j}) = \{I_{i,j}\}$, because an interval is always replay dependent on itself. At any send event, the information about the local replay set is piggybacked with the message.
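
The bookkeeping described above (reset at each checkpoint, piggyback at each send, log on receipt when the replay set would exceed the bound) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `Process`, its method names, the string payloads and the representation of a replay set as a set of `(process id, interval index)` pairs are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field

# Assumption: an interval I_{i,j} is represented as the pair (i, j).
Interval = tuple[int, int]


@dataclass
class Process:
    pid: int
    bound: int                                    # tolerated size of the replay set
    index: int = 0                                # index of the current interval
    rs: set[Interval] = field(default_factory=set)
    log: list[str] = field(default_factory=list)  # payloads of logged messages

    def __post_init__(self) -> None:
        # An interval is always replay dependent on itself.
        self.rs = {(self.pid, self.index)}

    def checkpoint(self) -> None:
        # A new checkpoint voids all replay dependence relations.
        self.index += 1
        self.rs = {(self.pid, self.index)}

    def send(self, payload: str) -> tuple[str, frozenset[Interval]]:
        # Piggyback a snapshot of the local replay set on the message.
        return payload, frozenset(self.rs)

    def receive(self, message: tuple[str, frozenset[Interval]]) -> bool:
        """Return True if the message was logged, False otherwise."""
        payload, sender_rs = message
        union = self.rs | sender_rs               # shared intervals count once
        if len(union) > self.bound:
            self.log.append(payload)              # logged: no new dependency
            return True
        self.rs = union                           # accepted: inherit dependencies
        return False


p1, p2, p3 = (Process(pid, bound=2) for pid in (1, 2, 3))
first = p2.receive(p3.send("a"))    # union has 2 intervals: accepted
second = p1.receive(p2.send("b"))   # union would have 3 intervals: logged
```

In this run `p2` inherits the dependence on `p3`, so the message it later sends to `p1` carries two intervals and pushes `p1` over a bound of 2; `p1` logs it and keeps its replay set unchanged.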
At any received and not logged message $m$, the new replay set of the receiver becomes the union of its current replay set and of that of the sender (piggybacked with $m$ and denoted as $m.RS$), i.e., for an interval $I_{i,j}$, $RS(I_{i,j}) \leftarrow RS(I_{i,j}) \cup m.RS$. The intervals that are present in both replay sets are counted once in the union. If the message is logged, it does not introduce any new dependency and does not require the update of the replay set. This scheme can be summarized as follows:

    m = receive();
    if (size_of_union_of_RSs(RS, m.RS) > Bound)
        log(m);
    else
        RS = compute_union_of_RSs(RS, m.RS);
    fi

In general, the replay set of a process is likely to grow in size as the execution proceeds in a checkpoint interval and new messages are received. However, the proposed algorithm bounds the size of the replay set and, consequently, bounds the amount of information needed to keep track of it. The maximum amount of information one process will ever store and piggyback is $N + \mathit{Bound}$ integers, where $N$ is the number of application processes and $\mathit{Bound}$ is the tolerated size of the replay set. For example, in an application composed of 16 processes, a bound of 16 for the size of the replay set requires storing and piggybacking at most 32 integers with each message.

Table 1. Description of the test programs

Program                   Execution time (sec)   Exchanged data (Mbytes)   Avg. message size (bytes)
matrix determinant        48.3                   43.1                      3183
fast fourier transform    417.0                  243.2                     23233
finite differences        199.4                  19.0                      1241
circuit test generator    144.0                  35.2                      1641
VLSI channel router       1358.0                 67.9                      182

4. Evaluation

To evaluate the effectiveness of the presented algorithm, we adopted five message-passing programs as testbeds and simulated the execution of the algorithm from message traces of (non-checkpointed) executions of each program. The test programs, developed for a 16-node hypercube, include programs to compute the determinant of a matrix, the fast fourier transform and finite differences over a grid; a circuit test generator and a VLSI channel router. Table 1 reports the basic characteristics of the test programs, which are heterogeneous in both execution times and amount of data exchanged, as well as in the communication patterns they express. Therefore, though the programs are not very long running, they can be considered representative for the evaluation of the replay algorithms, whose behaviour is determined by the communication patterns of an application rather than by the global execution time.

Checkpoint events have been artificially inserted in the message traces with different time periods: the interval between two checkpoints in a process has been varied from 1% to 50% of the application execution time. This range covers all practical cases and more: from a checkpoint every few seconds (see table 1) up to just one or two checkpoints in the whole execution.
Checkpoints have been inserted in the traces with a random skew from their basic checkpoint period, to simulate the likely behavior of uncoordinated checkpointing.

Three indicators are significant for the evaluation of the algorithm: (i) the percentage of messages logged during an execution measures the on-line replay cost, i.e., the logging overhead; (ii) the average and (iii) the maximum number of replay intervals per process required to replay the intervals of an execution measure the average and the worst-case off-line replay costs, respectively. The latter two are obtained by considering all the intervals and taking the average size of the associated replay sets and the size of the largest one, each divided by the total number of processes.

One could object that different metrics should be used to assess the effectiveness of the replay, such as the length of the longest sequential path needed to replay [6]. However, most of today's parallel and distributed architectures are not widely available at low cost, and the replay activity cannot assume the availability of parallel computing resources. That makes it preferable to limit the total amount of computation rather than the parallel execution time, the former measure being independent of the amount of available computing resources.

4.1. Evaluation of the full informed algorithm

The five test programs exhibit similar behaviors w.r.t. the application of the full informed algorithm. For this reason, the data relative to the different programs have been aggregated and averaged, to simplify the presentation without significant loss of information.

Figure 2 shows the average and the maximum number of replay intervals per process, applying the full informed algorithm with different bounds on the size of the replay set. These figures also report the number of intervals required to replay in the absence of any logging algorithm (no logging case).
Figure 3 plots the corresponding percentage of logged messages.

As a first consideration, one can see from figure 2 that the algorithm generally achieves a significant reduction, w.r.t. the no logging case, in the number of checkpoint intervals needed to replay, both in the average and in the worst case. However, the larger the application checkpoint period (and the length of checkpoint intervals), the smaller the relative reduction achieved by the algorithm. In the case of very large checkpoint periods, the on-line logging efforts (mostly independent of the checkpoint period, as shown in figure 3) are not counterbalanced by a comparable reduction of the off-line replay costs. This identifies a general requirement for effective incremental replay rather than a peculiar limit of the proposed algorithm: an application must checkpoint frequently enough to make checkpoint intervals significantly shorter than the global execution time.

Apart from the above extreme situation, the behavior of the algorithm depends on the imposed bound on the size of the replay set. Too strict a bound forces the algorithm to log a high number of messages: for example, a bound of 16 intervals for the size of the replay set causes about 25-35% of the messages to be logged (figure 3). In this case, however, both the average and the maximum number of replay intervals per process are kept low, guaranteeing fast and low-cost incremental replay (figure 2). Larger bounds reduce the number of logged messages and still limit the computation required for replay. A bound of 32 intervals on the size of the replay set limits the average number of replay intervals (figure 2-up) to about 1 per process and reduces the percentage of logged messages to 10-15% (figure 3).
The worst case (figure 2-down) is bounded by 2.

The possibility of tuning the internal parameters of the algorithm permits users to select the preferred trade-off between on-line (logging) and off-line (number of replay intervals) replay costs by selecting the most appropriate bound on the size of the replay set. If a user wants to minimize the on-line logging overhead, (s)he can choose a large bound for the replay set, tolerating a slower and more expensive replay activity. Conversely, if a user is in need of fast and cheap replay, (s)he can impose a very strict bound on the size of the replay set, paying the price of a higher on-line overhead.

[Figure 2. Full informed algorithm: average (up) and maximum (down) number of replay intervals per process depending on the tolerated size of the replay set. X axis: checkpoint period (%). Y axes: average # of intervals, maximum # of intervals. Series: bounds of 8, 16, 24, 32 and 40 intervals, and no logging.]

4.2. Comparison with the domino algorithm

The domino algorithm for adaptive logging, proposed in [7], does not aim to exactly compute the replay set but only its left frontier. A vector of checkpoint indexes is locally stored by each process and piggybacked with messages, to track the earliest checkpoint interval of each process, if any, on which the current interval is replay dependent. A process logs a message if it introduces replay dependencies on past intervals of the process itself, i.e., domino dependencies.

[Figure 3. Full informed algorithm: percentage of logged messages depending on the tolerated size of the replay set. X axis: checkpoint period (%). Y axis: logged messages (%). Series: bounds of 8, 16, 24, 32 and 40 intervals.]

As a first consideration, the domino algorithm is less flexible than the full informed one, because the logging function cannot be parameterized, thus precluding tuning the algorithm behavior to user needs. In addition, evaluated with the same test programs, it exhibits worse performance and can also deadlock the replay, as discussed later.

By setting the bound of the full informed algorithm so that it logs about the same percentage of messages as the domino algorithm, the two algorithms behave comparably w.r.t. the average number of replay intervals per process (figure 4-up). In contrast, the maximum number of replay intervals per process is lower for the full informed algorithm (figure 4-down). In other words, with the same on-line costs, the full informed algorithm guarantees a lower worst case for the off-line replay costs.

By setting the bound in the full informed algorithm so that the maximum number of replay intervals per process is about the same in the two algorithms (40 intervals, i.e., 2.5 per process), the full informed algorithm logs a lower percentage of messages (figure 5). In other words, the full informed algorithm induces lower on-line replay costs to guarantee the same worst case for the off-line replay costs.

One could object that the performance improvement of the full informed algorithm over the domino one is not significant enough to justify its adoption, especially considering that the domino algorithm needs a reduced and fixed amount of information to be piggybacked with messages (a vector of checkpoint indexes). In our opinion, however, reducing the amount of logging is more important than reducing by a few bytes the length of application messages. With reference to table 1, one can see that a reduction of 10% in logged messages saves several Mbytes of disk storage (and the cost of accessing it).
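
The storage claim can be checked with a little arithmetic. The sketch below assumes, as a simplification that the paper does not state, that the volume of logged data is proportional to the volume of exchanged data, so that logging 10% fewer messages saves 10% of the exchanged Mbytes. The exchanged-data figures are those read from Table 1; the names `exchanged_mbytes` and `storage_saved` are illustrative.

```python
# Exchanged data per test program in Mbytes, as read from Table 1.
exchanged_mbytes = {
    "matrix determinant": 43.1,
    "fast fourier transform": 243.2,
    "finite differences": 19.0,
    "circuit test generator": 35.2,
    "VLSI channel router": 67.9,
}


def storage_saved(exchanged: dict[str, float], reduction: float) -> dict[str, float]:
    """Mbytes no longer written to the log if `reduction` (a fraction of the
    exchanged data) fewer messages are logged. Assumes logged volume is
    proportional to exchanged volume."""
    return {name: round(mb * reduction, 2) for name, mb in exchanged.items()}


saved = storage_saved(exchanged_mbytes, 0.10)
for name, mb in saved.items():
    print(f"{name}: {mb} Mbytes")
```

Under this assumption the saving ranges from about 1.9 Mbytes (finite differences) to about 24.3 Mbytes (fast fourier transform), which is consistent with "several Mbytes".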
On the other hand, increasing by a few bytes the amount of piggybacked information induces a limited additional overhead on applications, especially in the presence of efficient interconnection networks.

[Figure 4. Comparison of the algorithms: average (up) and maximum (down) number of replay intervals per process when logging about 15% of messages. X axis: checkpoint period (%). Y axes: average # of intervals, maximum # of intervals. Series: no logging, domino algorithm, full informed with a bound of 32 intervals.]

5. Replay schemes

After an execution, to replay a given interval, its execution must be resumed from the corresponding checkpoint, together with the execution of all the intervals that belong to its replay set. Depending on the logging algorithm adopted, the post-mortem detection of these intervals and their re-execution introduce different problems.

Replay Scheme 1: In the full informed algorithm, each process has exact information about its current replay set available on-line. This information, if stored at each checkpoint, trivially guarantees the post-mortem detection of the replay set of each interval. Then, executing a replay only requires resuming the execution of each interval of the replay set from the corresponding checkpoint. With reference to