A semi-partitioned approach for parallel real-time scheduling

Benjamin Bado, Université Libre de Bruxelles, benjamin.bado@ulb.ac.be
Laurent George, ECE Paris - LACSC, laurent.george@ece.fr
Pierre Courbin, ECE Paris - LACSC, pierre.courbin@ece.fr
Joël Goossens, Université Libre de Bruxelles, joel.goossens@ulb.ac.be

ABSTRACT
In this paper, we consider the problem of scheduling periodic Multi-Phase Multi-Thread tasks on a set of m identical processors with Earliest Deadline First (EDF) scheduling. Each periodic task is defined by a sequence of phases with offsets that can possibly be parallelized. We use a portioned semi-partitioned approach with migrations at local deadlines assigned to each phase. We extend this approach to take phase parallelism into account. The phase parallelism we consider is an extension of the popular job parallelism. A phase, if parallelizable, can be decomposed into parallel threads run on a configurable number of processors. We only require simultaneous execution of threads inside a window equal to the local deadline of their associated phase. To decide on the schedulability of a Multi-Phase Multi-Thread task, we extend the popular uniprocessor EDF feasibility condition for periodic asynchronous tasks. We propose two new schedulability tests for EDF that significantly improve the well-known Leung and Merrill feasibility test based on the feasibility interval [O_min, O_max + 2P], where O_min and O_max are respectively the minimum and maximum phase offsets and P is the least common multiple of the task periods. The first schedulability test is used when an EDF simulation is needed and gives, by simulation, a 44% gain in simulation speed. The second method provides a sufficient schedulability test over a time interval of length P based on the demand bound function.
Finally, we study three local deadline assignment heuristics for parallelizable phases. We compare and analyze the performances obtained by simulation for these three local deadline assignment heuristics.

Keywords
Real-time scheduling theory

Categories and Subject Descriptors
C.3 [Real-time and embedded systems]

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
RTNS'12, November 08-09 2012, Pont-à-Mousson, France
Copyright 2012 ACM 978-1-4503-1409-1/12/11 ...$15.00.

General Terms
Theory

1. INTRODUCTION
The study of parallel multiprocessor real-time systems is quite recent, and few results have yet been found. However, different models exist that represent this problem. Parallelism adds the possibility for some specific parts of the tasks to be executed at the same time on different processors. In this article we focus on the problem of multiprocessor parallel real-time systems for tasks following a periodic Multi-Phase Multi-Thread task model (see Section 2.1). Each task is represented as a sequence of parallel and sequential executions. After a sequential execution, a fork is done, which means splitting the phase into threads that are executed on a set of processors. After a parallel execution, a join is made to synchronize and end the parallel threads. There are three different approaches for scheduling tasks on a multiprocessor system: global scheduling, partitioned scheduling and semi-partitioned scheduling.

Global scheduling.
This approach aims to find a global strategy for scheduling jobs (task instances), which are allowed to migrate between processors.

Partitioned scheduling.
This approach aims to find a strategy for partitioning tasks on processors; once a task is assigned to a processor, it cannot migrate. Once the distribution of the tasks has been done, a uniprocessor scheduling feasibility condition is used to decide on the schedulability of the system.

Semi-partitioned scheduling.
This approach aims to assign tasks in such a way that the migrations of a task follow a migration pattern. There are two approaches:
• One where a job can be executed on only one processor, but another job of the same task can be executed on different processors. This approach is also referred to as the restricted migration case [4].
• One where the Worst Case Execution Time (WCET) of a job is split into different parts that can be assigned to different processors. This approach is known as job portion migration [9]. In this paper, we focus on this approach.

Besides, there are three categories of parallel tasks: (1) rigid: the number of processors simultaneously assigned to a task is fixed a priori and cannot change over time; (2) moldable: the number of processors assigned simultaneously to a task can vary, but it does not vary with time for a given job; (3) malleable: the number of processors assigned simultaneously to a job can vary over time. In this paper we focus on semi-partitioned scheduling with moldable parallel tasks.

1.1 Our contributions
Parallel execution means, in the state of the art, simultaneous execution of threads. In this paper, we extend the parallelism of a job by requiring that all parallel threads associated to a phase must be executed in the same time interval, but not necessarily at the same time for all their durations. In our model, we split tasks into phases in such a way that a phase of a task cannot begin before its previous phase.
This is done by assigning offsets and local deadlines to each phase. This construction cancels the phenomenon of jitter between threads. For parallel threads, this leads to releasing all threads at the same time.
We present a two-step method to schedule parallelizable phases with a semi-partitioned approach and provide three algorithms to assign local deadlines to phases. We test the efficiency of each of the three algorithms and compare them by simulation. One of these algorithms uses EDF uniprocessor simulations to assign local deadlines and local offsets to phases. For those kinds of tasks, Leung et al. [11] proved that a schedulability test consists in verifying that all tasks meet their deadlines when scheduled by EDF in the time interval [O_min, O_max + 2P], where O_min and O_max are respectively the minimum and maximum phase offsets and P is the least common multiple of the task periods.
We provide new schedulability results for EDF uniprocessor scheduling of periodic asynchronous tasks. As in [2], we improve the popular study interval [O_min, O_max + 2P] [11] with two new schedulability tests. For the first one, we extend the result of [2] by proving that the feasibility interval of EDF can be restricted to the first idle time after O_max + P (if any), or to O_max + 2P otherwise. The second method provides a sufficient feasibility condition for periodic asynchronous tasks scheduled by EDF, based on a modification of the demand bound function. We show that we must only consider jobs having release times and absolute deadlines in the time interval [O_max + P, O_max + 2P] w.r.t. the corresponding scenario with offsets.

1.2 Organization of the paper
The remainder of this paper is organized as follows. Section 2 describes our model and introduces some definitions. Section 3 presents a state of the art for parallel scheduling. Section 4 presents our semi-partitioned approach.
We then focus on the local deadline assignment problem in Section 5, where we present three different methods to allocate local deadlines to phases. Section 6 presents new results on uniprocessor asynchronous systems. Section 7 presents the simulation results and performance evaluations of the algorithm defined in Section 6. Finally, we conclude in Section 8.

2. CONCEPTS AND NOTATIONS
In this work we consider a platform of m unit-capacity identical processors. In the next subsections we present our Multi-Phase Multi-Thread Task model and some definitions and notations used in this paper.

[Figure 1: Representation of a Multi-Phase Multi-Thread Task. The task τ_i has four phases φ_i^1, ..., φ_i^4, with s_i^{j+1} = s_i^j + f_i^j and s_i^4 + f_i^4 = D_i; phase φ_i^2 runs as three threads (cost C_i^{2,3} each) and phase φ_i^4 as two threads (cost C_i^{4,2} each).]

2.1 Multi-Phase Multi-Thread Task model
We define a task set τ = {τ_1, ..., τ_n} composed of n periodic multi-phase multi-thread tasks. A periodic multi-phase multi-thread task τ_i (Figure 1) is characterized by the 4-tuple τ_i = (O_i, D_i, T_i, Φ_i) where:
• O_i is the first arrival time of τ_i, i.e., the time of the first activation of the task since the system initialization.
• D_i is the relative deadline of τ_i, i.e., the time by which the current instance of the task has to finish its execution relative to its arrival time.
• T_i is the period, i.e., the exact inter-arrival time between two successive activations of τ_i.
• Φ_i is a vector of the ℓ_i phases of τ_i, such that Φ_i = (φ_i^1, ..., φ_i^{ℓ_i}).
A phase φ_i^j is characterized by the 5-tuple φ_i^j = (C_i^j, s_i^j, f_i^j, v_i^j, γ_i^j) where:
• C_i^j is the total WCET of the phase.
• s_i^j is the relative arrival offset of the phase, i.e., for each arrival time t of the task τ_i, the phase will be released at time t + s_i^j.
• f_i^j is the relative deadline of the phase.
• v_i^j ≤ m is the number of threads into which the phase is decomposed.
• γ_i^j = (γ_i^{j,1}, γ_i^{j,2}, ..., γ_i^{j,m}) is a vector that represents the job parallelism costs. A phase φ_i^j parallelized on k processors executes for C_i^{j,k} = (C_i^j / k)(1 + γ_i^{j,k}) on each of the k processors.
Note that for each phase φ_i^j, the values s_i^j, f_i^j and v_i^j will be assigned by one of our algorithms in Section 5.
In this paper, we constrain the model using the following relations:
• The deadline of each task is less than or equal to its period, ∀i, D_i ≤ T_i, also referred to as the constrained deadline model.
• s_i^1 = 0, i.e., the arrival time of the first phase of the task corresponds to the arrival time of the task itself.
• ∀j > 1, s_i^j = s_i^{j-1} + f_i^{j-1}, i.e., the relative arrival offset of a phase equals the relative arrival offset of the previous phase plus its local deadline. In other words, we solve the precedence constraint between successive phases using relative arrival offsets and local deadlines.
• s_i^{ℓ_i} + f_i^{ℓ_i} = D_i, i.e., the deadline of the last phase corresponds to the deadline of τ_i.
• All tasks are independent of each other. This means that no mutual exclusion can exist between two tasks.
• We follow the ideas of the Fork-Join task model, since:
  – we impose a sequence of sequential and parallel phases. Each phase φ_i^j with j an odd number is a sequential phase.
  – we impose that each task has to start and finish with a sequential phase. Note that ℓ_i is then an odd number.

2.2 Definitions for our model
• We will say that a phase cannot be parallelized if the parallelization cost γ_i^{j,k} is equal to ∞ for all values of the vector.
• C_i := Σ_{j=1}^{ℓ_i} C_i^j is the total WCET of the task τ_i.
• C_i,Par := the sum of the C_i^j over the phases φ_i^j with γ_i^{j,v_i^j} ≠ ∞, i.e., the WCET part of the task τ_i that can be parallelized (set to 0 if the task cannot be parallelized).
• u_i := C_i / T_i is the utilization of the task τ_i.
• U := Σ_{i=1}^{n} u_i is the total system utilization.
• λ_i := C_i / min(T_i, D_i) is the density of the task τ_i.
• λ_i,Par := C_i,Par / min(T_i, D_i) is the density of the parallel phases of the task τ_i.

Definition 1 (Phase parallelism). A phase φ_i^j is executed in parallel if it is executed on more than one processor and if all its threads are executed in the same time interval [t + s_i^j, t + f_i^j], where t represents the time when a job of the task τ_i is released.
This definition of parallelism is weaker than the popular definition of parallelism, which requires that all threads are executed at the same time for all their durations (i.e., the Gang model). In our model, the threads of a phase have to be executed in the same time interval (not necessarily at the same time).

2.3 Definitions for uniprocessor systems
• A feasibility interval for EDF [11] with U ≤ 1 is [0, O_max + 2P], where O_max := max{O_1, ..., O_n} and P denotes the least common multiple (LCM) of all the periods in the task set τ: P := lcm{T_1, ..., T_n}.
• An idle time t is a time such that all requests released strictly before t have completed their execution before or at time t.
• DBF*(t) (Demand Bound Function) is the amount of processing time required by all tasks whose release times and absolute deadlines are in the time interval [0, t] in a given offset scenario.

3. STATE OF THE ART
• Han et al. [8] proved the NP-completeness of the problem of assigning fixed priorities to parallel tasks. A heuristic algorithm is also proposed, where the number of processors assigned to a job is chosen by the scheduler.
• Manimaran et al. [14] defined a way to improve the benefits of task parallelism through an offline scheduling algorithm followed by a run-time scheduling algorithm, with tasks that can be added dynamically to the system. A task can be split into parallel subtasks (phases) that have to start at the same time.
The offline scheduler is a non-preemptive EDF algorithm where the minimum parallelism is used when a deadline cannot be met. The run-time scheduler plays on the degree of parallelism of a task to avoid run-time anomalies (which may cause some schedulable task sets to become non-schedulable). By simulation, the algorithm is shown to be more efficient than the non-parallel version of the non-preemptive EDF algorithm.
• Collette et al. [3] use a malleable task assignment model and introduce the notion of work-limited parallelism. A cost is assigned to parallelism, so that the gain in performance of a task being executed on j + 1 processors instead of j is higher than or equal to the gain in performance of a task being executed on j + 2 processors instead of j + 1. They propose a scheduling algorithm using the notion of a minimal required number of processors needed to execute a job and provide a utilization bound for this kind of system.
• Lakshmanan et al. [10] use a moldable task assignment model on a basic Fork-Join task model, where a task must begin and end with a single sequential thread (no parallel part at the beginning and at the end of a task). They found a worst-case task set scenario for a fork-join structure such that the system is not schedulable with the smallest utilization, and they also found a best-case task set scenario such that the system is feasible with the highest utilization. The article also gives a transformation model for the tasks that minimizes the Fork-Join structure as much as possible by using a stretch transformation. Finally, a deadline-monotonic partitioning algorithm is used to provide a resource augmentation bound of 3.42, which means that any task set that is feasible on m unit-speed processors can be scheduled by their algorithm on m processors that are 3.42 times faster.
• Berten et al. [1] introduce a specific multi-thread parallel task model, provide a necessary schedulability test and integrate precedence constraints in their calculation.
• Saifullah et al. [15] proposed a model where a task can contain many segments that can run on an arbitrary number of processors. Playing on the slack of a task (the slack is defined as the difference between the deadline and the WCET of the task) to find local deadlines and offsets, a decomposition of a task into a set of phases (corresponding to the different segments of the task) is proposed. The authors prove a resource augmentation bound of 2.62 for global scheduling and of 3.42 for partitioned scheduling when using this decomposition method. In this paper, a DAG model is also analyzed.
To the best of our knowledge, no study has been carried out on semi-partitioned approaches applied to parallel tasks.

4. SCHEDULING
The semi-partitioned scheduler works in two steps. First it allocates as many tasks as possible to the processors using a partitioned approach (step S1). After the first step, the scheduler uses another algorithm to split the remaining tasks into phases assigned to a subset of processors (step S2). In doing so, our approach dominates the partitioned approach: all the results of the partitioned approach can be reused for step S1.

4.1 The first step 'S1'
The goal of this step is to use a partitioned approach to allocate as many tasks as possible to the processors. The aim is to avoid job migration as much as possible and to use parallelism only when the partitioned approach fails to assign a task. We use the Worst-Fit heuristic [12, 7] for this step: tasks are allocated sequentially in such a way that the remaining processor utilization is maximized. At the end of this heuristic, the remaining tasks that could not be assigned to a single processor will reduce their WCET by parallelizing their phases. To improve the opportunities for parallelization in the second step, we assign the least interesting tasks in terms of parallelism in the first step. For this, we classify the tasks into three levels.
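The per-task quantities that drive this classification (defined in Sections 2.1 and 2.2) can be computed directly. The following is a minimal Python sketch; the dictionary-based task representation and field names are our own assumptions for illustration, not from the paper:

```python
import math

# A task is modeled as a dict with period T, deadline D, and a list
# of phases; each phase has a WCET C and a parallelism-cost vector
# gamma, where gamma[k-1] is the cost on k processors and math.inf
# marks a non-parallelizable configuration.

def total_wcet(task):
    # C_i: sum of the phase WCETs.
    return sum(ph["C"] for ph in task["phases"])

def parallel_wcet(task):
    # C_i,Par: WCET of the phases that can be parallelized.
    return sum(ph["C"] for ph in task["phases"]
               if any(g != math.inf for g in ph["gamma"]))

def utilization(task):
    # u_i = C_i / T_i
    return total_wcet(task) / task["T"]

def density(task):
    # lambda_i = C_i / min(T_i, D_i)  (sort key for level L1)
    return total_wcet(task) / min(task["T"], task["D"])

def parallelism_need(task):
    # (C_i - D_i) / min(T_i, D_i): <= 0 means the task fits
    # sequentially on one processor (levels L1/L2); > 0 means it
    # must be parallelized (level L3).
    return (total_wcet(task) - task["D"]) / min(task["T"], task["D"])

task = {"T": 10, "D": 8, "phases": [
    {"C": 4, "gamma": [math.inf, math.inf]},   # sequential phase
    {"C": 6, "gamma": [0.0, 0.1]},             # parallelizable phase
]}
print(utilization(task))        # 1.0
print(density(task))            # 1.25
print(parallelism_need(task))   # 0.25 -> level L3
print(parallel_wcet(task))      # 6
```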
L1: This level is composed of the tasks that cannot be parallelized, which means that they have only one phase, composed of one non-parallelizable thread. These tasks are sorted by decreasing values of their density, as this offers the best performance according to [13].
L2: This level is composed of the tasks that respect C_i = Σ_{j=1}^{ℓ_i} C_i^j ≤ D_i. These are the tasks that can be scheduled on one processor. At this level we sort the tasks by increasing value of C_i,Par. In this way, the tasks that cannot be scheduled in the first step will be those that are the most interesting to parallelize.
L3: This level is composed of the tasks that respect C_i = Σ_{j=1}^{ℓ_i} C_i^j > D_i. These are the tasks that must be parallelized in order to respect their deadline, i.e., the partitioning algorithm cannot assign a task from L3 to only one processor. Knowing that the goal of step S1 is to assign as many tasks as possible before the second step, the tasks of L3 must be assigned last in step S1. These tasks are considered according to their need for parallelism. The need for parallelism is the density of a sequential task τ_i that needs to be parallelized so that the sequential part of τ_i can be scheduled on one processor. Mathematically this need is represented by (C_i − D_i) / min(T_i, D_i). If this value is negative or equal to zero, the task can be scheduled on one processor without being parallelized.
The goal of this step is to assign as many tasks as possible with a partitioned approach, keeping the tasks that are the most interesting to parallelize for step S2. That is why level L1 is the first to be assigned with the Worst-Fit heuristic, level L2 the second, and level L3 the third. Once there are no more tasks that can be entirely assigned to one processor, the algorithm enters step S2.

4.2 The second step 'S2'
In this step we allocate the remaining tasks, i.e.,
tasks which were not allocated in the first step. If there are still some tasks remaining after level L1, it means that there are still some tasks that cannot be parallelized; we allocate these tasks using the portioned semi-partitioned scheduling method proposed by George et al. [5]. In practice, if a job cannot be executed on a single processor, it is portioned and the different parts of the task are executed on different processors in a way that limits the number of migrations. We use their method for the tasks defined in their article (tasks without parallelism) and we introduce new methods that exploit the benefits of parallelism for the tasks that can be parallelized. For the other tasks, we first try to assign those that may need the most processors, i.e., those of level L3 before those of level L2. We also begin by sorting the processors by decreasing value of density.

Remark. Knowing the cost of a Fork-Join operation, of a migration between processors, and the cost of parallelism, we try in this method to parallelize tasks as little as possible. In our method, when a parallelizable task τ_i cannot be scheduled on one processor P, we decrease the WCET of τ_i that will be assigned to P by parallelizing its phases. To ensure that the threads of the different phases execute in the same time interval, we must define local offsets and local deadlines for each phase. For this, we have defined three methods of assigning local deadlines that are described in the next section. In this method, we also use the notion of task τ_i^temp, defined as follows:

Definition 2. The task τ_i^temp of a task τ_i represents the biggest part of the task τ_i that will be assigned to a single processor, and is constructed by taking the first thread of each phase of τ_i.
Figure 2 illustrates the concept of task τ_i^temp. The part of the algorithm representing step 2 is given in Algorithm 1.

Remark.
The method used to assign the threads of a task τ_i that are not in τ_i^temp (in Algorithm 1, lines 16 and 18) just has to respect the following implicit rule: two threads of a same phase cannot be assigned to the same processor.

Algorithm 1: Step 2
Require: Remaining tasks of Step 1
 1: assign the tasks of L1 with the method defined in [5]
 2: for each task τ_i taken sequentially, while τ_i can still be parallelized do
 3:   for each parallelizable phase φ_i^j of task τ_i do
 4:     Compute the gain in WCET of the phase. A phase parallelized on k processors has a gain in WCET equal to (C_i^j / k)(1 + γ_i^{j,k}) − (C_i^j / (k+1))(1 + γ_i^{j,k+1})
 5:     (a negative gain means that the phase can no longer be parallelized)
 6:   end for
 7:   Let j be the identifier of the phase with the biggest gain in WCET.
 8:   if the gain in WCET of φ_i^j is negative then
 9:     The task can no longer be parallelized; the system is thus not schedulable
10:   else
11:     assign a new processor to phase φ_i^j.
12:     Update the threads of phase φ_i^j with the newly created thread and with the new cost of parallelism.
13:     Create task τ_i^temp of τ_i.
14:     Use a method of local deadline and local offset allocation on τ_i^temp
15:     if the task τ_i^temp can be allocated to one processor x then
16:       if the threads of τ_i that are not in τ_i^temp can be assigned to the other processors then
17:         assign the threads of τ_i that are in τ_i^temp to processor x
18:         assign the remaining threads of τ_i to other processors.
19:         return true // the system is schedulable
20:       end if
21:     end if
22:   end if
23: end for
24: return false // the system is not schedulable

[Figure 2: Example of temporary task τ_i^temp — the task of Figure 1, where τ_i^temp keeps the first thread of each phase φ_i^1, ..., φ_i^4.]

Task Transformation
An important choice has to be made in the way of splitting threads and assigning them local deadlines and offsets. Since
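The gain computation and phase selection of lines 3–7 in Algorithm 1 can be sketched as follows. This is a minimal Python illustration under our own data representation (a phase as a dict with WCET C, current processor count k, and cost vector gamma, where gamma[k-1] is the cost on k processors), not code from the paper:

```python
import math

def wcet_on(phase, k):
    # C_i^{j,k} = (C_i^j / k) * (1 + gamma_i^{j,k})
    return phase["C"] / k * (1 + phase["gamma"][k - 1])

def gain(phase):
    # Gain in per-processor WCET when going from k to k+1 processors
    # (line 4 of Algorithm 1).
    k = phase["k"]
    if k >= len(phase["gamma"]) or phase["gamma"][k] == math.inf:
        return -math.inf  # the phase cannot be parallelized further
    return wcet_on(phase, k) - wcet_on(phase, k + 1)

def best_phase(phases):
    # Line 7: pick the phase with the biggest gain; None mirrors the
    # negative-gain exit of line 8 (no useful parallelization left).
    j, g = max(enumerate(gain(p) for p in phases), key=lambda e: e[1])
    return j if g >= 0 else None

phases = [
    {"C": 8.0, "k": 1, "gamma": [0.0, 0.2, 0.5]},
    {"C": 3.0, "k": 1, "gamma": [0.0, 0.1, math.inf]},
]
# Phase 0 gains 8 - 4*1.2 = 3.2; phase 1 gains 3 - 1.5*1.1 = 1.35.
print(best_phase(phases))  # 0
```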
offsets of a phase depend on the deadline of the previous phase, we can reduce this problem to just finding local deadlines. In the following, we propose three local deadline assignment methods that take the remaining tasks and try to assign them sequentially. We then compare the efficiency of those methods in Section 7.

5. "LOCAL DEADLINES" ASSIGNMENT METHODS
In this section we propose three local deadline assignments to phases, and analyze their benefits and limitations.

Goal. Knowing that the costs of parallelism and migrations are non-negligible, we try to limit the parallelism to the minimum required. The idea of minimizing the parallelism was used in a stretch transformation algorithm [10]. Our strategy takes the tasks that were not scheduled sequentially during step 1. We try to assign these tasks in such a way that we assign to one processor the longest part of the WCET possible.
As a reminder, the following three methods work on tasks τ_i^temp, i.e., tasks composed of a set of phases with only one thread each.

5.1 Local Fair Deadline assignment
This allocation gives each phase a deadline equal to the ratio (C_i^j / C_i) × D_i. The part of D_i that was not assigned is given to the phase that is the most parallelized (the biggest v_i^j), to minimize the utilization on the biggest number of processors.

5.2 Minimum Local Deadline assignment
This strategy aims to assign each phase φ_i^j of a sequential task τ_i in a sequential way on a processor p. In practice:
• The offset of φ_i^1 is equal to the offset of τ_i. The initial value of f_i^1 is C_i^1.
• Try to assign φ_i^1 to processor p.
• While the phase cannot be scheduled on processor p, increment the value of f_i^1.
• If the value of f_i^1 becomes bigger than D_i − C_i + C_i^1, it means that the other phases could not be scheduled: the part of D_i that can still be given to the phases not yet assigned to processor p is lower than the sum of the C_i^j of those phases. That means that τ_i is not schedulable on processor p.
• If φ_i^1 was assigned, apply the same scheme to the other phases.

5.3 Local Deadline set to the worst-case response time
Theorem 1. When a task set is schedulable with EDF on the study interval, we can assign to the local deadlines of the phases of a task the worst-case response time that was experienced by each phase during the simulation.
Proof. There are two results to demonstrate: (1) with the new deadlines, the task will not miss any deadline, and (2) the other tasks will not miss any deadline.
(1) is trivial: knowing that each phase will have a deadline lower than or equal to the deadline of the task, it will have at least the same priority, so it will also be scheduled.
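The Local Fair Deadline assignment of Section 5.1, combined with the offset rule s_i^j = s_i^{j-1} + f_i^{j-1} from Section 2.1, can be sketched as follows. This minimal Python illustration uses our own data representation (phases as dicts with WCET C and thread count v), not code from the paper:

```python
def local_fair_deadlines(phases, D):
    # f_i^j = (C_i^j / C_i) * D_i; the unassigned remainder of D_i
    # goes to the most parallelized phase (the biggest v_i^j).
    C = sum(p["C"] for p in phases)
    deadlines = [p["C"] / C * D for p in phases]
    most_parallel = max(range(len(phases)), key=lambda j: phases[j]["v"])
    deadlines[most_parallel] += D - sum(deadlines)
    # Offsets: s_i^1 = 0 and s_i^j = s_i^{j-1} + f_i^{j-1},
    # so the last phase ends exactly at D_i.
    offsets, s = [], 0.0
    for f in deadlines:
        offsets.append(s)
        s += f
    return offsets, deadlines

# A three-phase task (sequential, parallel, sequential), D_i = 10.
phases = [{"C": 2.5, "v": 1}, {"C": 5.0, "v": 3}, {"C": 2.5, "v": 1}]
offsets, deadlines = local_fair_deadlines(phases, D=10.0)
print(deadlines)  # [2.5, 5.0, 2.5]
print(offsets)    # [0.0, 2.5, 7.5]
```

Note that by construction s_i^{ℓ_i} + f_i^{ℓ_i} = D_i, matching the model constraint of Section 2.1.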