Energy-Aware Fast Scheduling Heuristics in Heterogeneous Computing Systems

Cesar O. Diaz, Mateusz Guzek, Johnatan E. Pecero, Gregoire Danoy and Pascal Bouvry
CSC Research Unit, University of Luxembourg, Luxembourg
{firstname.lastname}

Samee U. Khan
Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND 58108
{ }

ABSTRACT

In heterogeneous computing systems it is crucial to schedule tasks in a manner that exploits the heterogeneity of the resources and applications to optimize system performance. Moreover, energy efficiency in these systems is of great interest due to concerns such as operational costs and environmental issues associated with carbon emissions. In this paper, we present a series of original low-complexity, energy-efficient algorithms for scheduling. The main idea is to map a task to the machine that executes it fastest while consuming the minimum energy. On the practical side, the experimental results show that the proposed heuristics perform as efficiently as related approaches, demonstrating their applicability to the considered problem and their good scalability.

KEYWORDS: Heterogeneous computing systems, energy efficiency, scheduling, optimization

1. INTRODUCTION

Modern-day computing platforms such as grid or cloud computing are composed of many new features that enable sharing, selection, and aggregation of highly heterogeneous resources for solving large-scale and complex real problems. These heterogeneous computing systems (HCS) are widely used as a cheap way of obtaining powerful parallel and distributed systems. However, the electrical power required to run these systems and to cool them is of great interest due to several concerns: it results in extremely large electricity bills, reduced system reliability and environmental issues due to carbon emissions [1].
Therefore, energy efficiency in HCS is of great interest.

An HCS comprises different hardware architectures, operating systems and computing power. In this paper, heterogeneity refers to the processing power of computing resources and to the different requirements of the applications. To take advantage of the different capabilities of a suite of heterogeneous resources, a scheduler commonly allocates the tasks to the resources and determines a start date for the execution of each task. In this paper, we assume that energy is the amount of power used over a specific time interval [2]. For each task, the information on its processing time and the voltage rate of the processor to execute one unit of time is sufficient to measure the energy consumption for that task. In this context, we additionally promote the heterogeneity capability of the computing system to use the energy of the system efficiently [3–5]. The main idea is to match each task with the best resource to execute it, that is, the resource that optimizes the completion time of the task and executes it fastest with minimum energy.

The main objective of our work is to contribute to efficient energy consumption in HCS. This target is achieved by providing a new set of scheduling algorithms. These algorithms take advantage of the resource capabilities and feature a very low overhead. The algorithms are based on list scheduling approaches and are considered batch mode dynamic scheduling heuristics [6]. In batch mode, the applications are scheduled after predefined time intervals.

As a part of this work, we compare the proposed algorithms by analyzing the results of numerous simulations featuring high heterogeneity of resources and/or high heterogeneity of applications. Simulation studies are performed to compare these algorithms with the min-min algorithm [7, 8].
We used min-min as a basis of comparison because it is one of the most used algorithms in the literature in the context of HCS, and it has good performance behavior [3, 8]. We have considered the minimization of the makespan (i.e., the maximum completion time) and energy as a basis of comparison. Most related work considers only the makespan as a performance criterion. The goal has been to find a feasible schedule such that the total energy consumption over the entire time horizon is as small as possible. This gives insight into effective energy conservation; however, it ignores the important aspect that users typically expect good response times for their jobs [9]. In this context, we also compare these heuristics based on the flowtime. The flowtime of a task is the length of the time interval between the release time and the completion time of the task. Flowtime is commonly used as a quality-of-service measure that allows guaranteeing good response times. The large set of experimental results shows that, despite their low running time, the investigated heuristics perform as efficiently as the related approach for most of the studied instances, showing their applicability to the considered scheduling problem.

The remainder of this paper is organized as follows. The system, energy and scheduling models are introduced in Section 2. Section 3 briefly reviews some related approaches. We provide the resource allocation and scheduling heuristics in Section 4. Experimental results are given in Section 5. Section 6 concludes the paper.

978-1-61284-383-4/11/$26.00 ©2011 IEEE

2. MODELS

2.1. System and Application Models

We consider an HCS composed of a set of machines \(M = \{m_1, \ldots, m_m\}\). We assume that the machines incorporate an effective energy-saving mechanism for idle time slots [10]. The energy consumption of an idle resource at any given time is set using a minimum voltage based on the processor's architecture.
In this paper, we consider two voltage levels: maximum, when the processor is performing work or is in an active state, and idle, when the processor is in an idle state. We consider a set of independent tasks \(T = \{t_1, \ldots, t_n\}\) to be executed on the system. Each task is considered an indivisible unit of workload and has to be processed completely on a single machine.

The computational model we consider in this work is the expected time to compute (ETC) model. In this model, it is assumed that we have an estimate or prediction of the computational load of each task, the computing capacity of each resource, and the prior load of the resources. Moreover, we assume that the ETC matrix of size \(t \times m\) is known. Each position \(ETC[t_i][m_j]\) in the matrix indicates the expected time to compute task \(t_i\) on machine \(m_j\). This model allows the heterogeneity among tasks and machines to be represented.

Machine heterogeneity evaluates the variation of execution times for a given task across the computing resources. Low machine heterogeneity represents computing systems composed of similar (almost homogeneous) computing resources. On the contrary, high machine heterogeneity represents computing systems integrating resources of different types and capacities. Task heterogeneity represents the degree of variation among the execution times of tasks on a given machine. Low task heterogeneity models the case in which tasks are quasi-homogeneous (i.e., the complexity and computational requirements of the tasks are quite similar), so they have similar execution times on a given machine. High task heterogeneity describes scenarios in which different types of applications are submitted for execution in the heterogeneous computing system, ranging from simple applications to complex programs that require a large computational time.
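The two notions of heterogeneity can be made concrete with a small sketch. It assumes, purely for illustration, that "variation" is measured by the coefficient of variation (standard deviation divided by mean) and averaged over rows or columns; the function names and the example matrix are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def machine_heterogeneity(etc):
    """Average variation of a single task's time across the machines (rows)."""
    return mean(coefficient_of_variation(row) for row in etc)


def task_heterogeneity(etc):
    """Average variation of the task times on a single machine (columns)."""
    return mean(coefficient_of_variation(col) for col in zip(*etc))


# Hypothetical ETC matrix: etc[i][j] is the expected time of task i on machine j.
# The machines are similar (low machine heterogeneity) while the tasks
# differ widely in size (high task heterogeneity).
etc = [
    [10.0, 11.0],
    [50.0, 52.0],
    [200.0, 210.0],
]

print("machine heterogeneity:", machine_heterogeneity(etc))
print("task heterogeneity:", task_heterogeneity(etc))
```

In this example the row-wise variation is a few percent, whereas the column-wise variation is close to one, which matches the informal description of a low machine, high task heterogeneity instance.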
Additionally, the ETC model tries to reflect the characteristics of different scenarios using different ETC matrix consistencies, defined by the relation between a task and how it is executed on the machines according to the heterogeneity of each one [8]. The scenarios are consistent, semi-consistent and inconsistent. The consistent scenario models SPMD applications executing with local input data: if a given machine \(m_j\) executes any task \(t_i\) faster than machine \(m_k\), then machine \(m_j\) executes all tasks faster than machine \(m_k\). The inconsistent scenario represents the most generic scenario for an HCS that receives different tasks, from easy applications to complex parallel programs. Finally, the semi-consistent scenario models inconsistent systems that include a consistent subsystem.

2.2. Energy Model

The energy model used in this work is derived from the power consumption model in digital complementary metal-oxide semiconductor (CMOS) logic circuitry. The power consumption of a CMOS-based microprocessor is defined as the sum of the capacitive power, which is dissipated whenever active computations are carried out, the short-circuit power and the leakage power (static power dissipation). The capacitive power \(P_c\) (dynamic power dissipation) is the most significant factor of the power consumption. It is directly related to frequency and supply voltage, and it is defined as [11]:

\[ P_c = A C_{eff} V^2 f, \tag{1} \]

where \(A\) is the number of switches per clock cycle, \(C_{eff}\) denotes the effective charged capacitance, \(V\) is the supply voltage, and \(f\) denotes the operational frequency. The energy consumption of any machine in this paper is defined as:

\[ E_c = \sum_{i=1}^{n} A C_{eff} V_i^2 f \, ETC[i][M[i]], \tag{2} \]

where \(M[i]\) represents a vector containing the machine \(m_j\) to which task \(t_i\) is allocated, and \(V_i\) is the supply voltage of the machine \(m_j\).
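As a rough illustration of Eqs. (1) and (2), the sketch below evaluates the capacitive power and the resulting computation energy for a given assignment of tasks to machines. The function names, the toy ETC matrix, and the default values of 1.0 for \(A\), \(C_{eff}\) and \(f\) are assumptions made only for this example; the two voltages are taken from the active levels used later in the experiments.

```python
def capacitive_power(a, c_eff, voltage, frequency):
    """Eq. (1): P_c = A * C_eff * V^2 * f."""
    return a * c_eff * voltage ** 2 * frequency


def computation_energy(etc, assignment, voltages, a=1.0, c_eff=1.0, frequency=1.0):
    """Eq. (2): sum over tasks of A * C_eff * V_i^2 * f * ETC[i][M[i]].

    assignment[i] is the index of the machine that runs task i (the vector M),
    and voltages[j] is the active supply voltage of machine j.
    """
    total = 0.0
    for i, machine in enumerate(assignment):
        power = capacitive_power(a, c_eff, voltages[machine], frequency)
        total += power * etc[i][machine]
    return total


# Hypothetical instance: 2 tasks, 2 machines.
etc = [[4.0, 6.0], [3.0, 2.0]]
assignment = [0, 1]      # task 0 on machine 0, task 1 on machine 1
voltages = [1.95, 1.6]   # active voltages in volts

energy = computation_energy(etc, assignment, voltages)
print(energy)  # 1.95^2 * 4 + 1.6^2 * 2
```

With unit constants the energy is simply the voltage-squared-weighted sum of the execution times, so moving a task to a lower-voltage machine only pays off if its execution time there does not grow too much.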
The energy consumption during idle time is defined as:

\[ E_i = \sum_{j=1}^{m} \sum_{idle_{jk} \in IDLES_j} A C_{eff} V_{min_j}^2 I_{jk}, \tag{3} \]

where \(IDLES_j\) is the set of idling slots on machine \(m_j\), \(V_{min_j}\) is the lowest supply voltage on \(m_j\), and \(I_{jk}\) is the amount of idling time for \(idle_{jk}\). The total energy consumption is then defined as:

\[ E_t = E_c + E_i. \tag{4} \]

2.3. Scheduling Model

The scheduling problem is formulated as follows. We are given a heterogeneous computing system composed of a set of \(m\) machines and a set of \(n\) tasks. Any task \(t_i\) is scheduled without preemption from time \(\sigma(t_i)\) on machine \(m_j\), with an execution time \(ETC[t_i][m_j]\). The task \(t_i\) completes at time \(C_i = \sigma(t_i) + ETC[t_i][m_j]\). The objective is to minimize the maximum completion time, or makespan, \(C_{max} = \max(C_i)\), with minimum energy \(E_t\) used to execute the tasks. Additionally, in this paper we aim to guarantee good response times. In this context, response time is modeled as flowtime. As already mentioned, the flowtime of a task is the length of the time interval between its release time and its completion time. We consider that the release time is 0 for all the tasks. Hence, the flowtime is the sum of the completion times of the tasks, \(\sum_{i=1}^{n} C_i\), and the aim is to minimize it. Note that the tasks considered in our study are not associated with deadlines, which is the case for many computational systems.

3. RELATED WORK

The job scheduling problem in heterogeneous computing systems without energy consideration has been shown to be NP-complete [7]. Therefore, a large number of heuristics have been developed. One of the most widely used batch mode dynamic heuristics for scheduling independent tasks in heterogeneous computing systems is the min-min algorithm. It begins by considering all tasks as unmapped and works in two phases.
In the first phase, the algorithm establishes the minimum completion time for every unscheduled task. In the second phase, the task with the overall minimum expected completion time is selected and assigned to the corresponding machine. The task is then removed from the set and the process is repeated until all tasks are mapped. The run time of min-min is \(O(t^2 m)\).

Some strategies for energy optimization in HCS by exploiting heterogeneity have been proposed and investigated. In [12], the authors investigated the tradeoff between energy and performance. They proposed a method for finding the best match between the number of cluster nodes and their uniform frequency. However, they did not consider in depth the effect of scheduling algorithms. The authors in [5] introduced an online dynamic power management strategy with multiple power-saving states. Then, they proposed an energy-aware scheduling algorithm, based on min-min, to reduce energy consumption in heterogeneous computing systems. In [3], the authors considered the problem of scheduling tasks with different priorities and deadline constraints in an ad hoc grid environment with limited battery capacity that used DVS for power management. In this context, the resource manager needed to exploit the heterogeneity of the tasks and resources while managing the energy. The authors introduced several online and batch mode dynamic heuristics and showed by simulation that batch mode heuristics performed the best. These heuristics were based on min-min. However, they required significantly more time.

4. PROPOSED ALGORITHMS

For most practical applications of the scheduling problem on heterogeneous computing systems, near-optimal solutions suffice and a search for optimality is unnecessary. Therefore, we investigate low-cost heuristics that produce good-quality schedules with low energy consumption. These heuristics are based on the min-min algorithm.
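For reference, the two-phase min-min procedure reviewed in Section 3 can be sketched as follows. This is a minimal illustration written from the textual description only; the function name, the data layout (a list of rows, one per task) and the toy instance are assumptions, and ties are broken by lowest index, which the paper does not specify.

```python
def min_min(etc):
    """Schedule independent tasks with the min-min heuristic.

    etc[i][j] is the expected time to compute task i on machine j.
    Returns (assignment, ready), where assignment[i] is the machine of task i
    and ready[j] is the time at which machine j finishes its last task.
    """
    n_tasks, n_machines = len(etc), len(etc[0])
    ready = [0.0] * n_machines
    assignment = [None] * n_tasks
    unmapped = list(range(n_tasks))
    while unmapped:
        # Phase 1: for every unmapped task, find the machine that gives
        # its minimum completion time (loop over all task/machine pairs).
        best = {}
        for i in unmapped:
            best[i] = min(range(n_machines), key=lambda j: ready[j] + etc[i][j])
        # Phase 2: select the task with the overall minimum completion time.
        task = min(unmapped, key=lambda i: ready[best[i]] + etc[i][best[i]])
        machine = best[task]
        assignment[task] = machine
        ready[machine] += etc[task][machine]
        unmapped.remove(task)
    return assignment, ready


# Hypothetical instance: 3 tasks, 2 machines.
etc = [[3.0, 5.0], [2.0, 4.0], [6.0, 1.0]]
assignment, ready = min_min(etc)
makespan = max(ready)
print(assignment, makespan)
```

Phase 1 is repeated for every remaining task in every iteration, which is the source of the \(O(t^2 m)\) run time quoted above.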
However, we took special care to decrease the computational complexity. The main idea is to avoid the loop over all pairs of machines and tasks that corresponds to the first phase of the min-min algorithm. One alternative is to consider one task at a time, namely the task that should be scheduled next. For that, we propose to order the tasks by a predefined priority, so that they can be selected in constant time. Once the order of tasks is determined, in the second phase, which we call the mapping event, we assign the task to the machine that minimizes its expected completion time as well as its execution time. This modification lies essentially in the calculation of a mapping event of an application to a machine. We propose a weighted function, which we name the score function \(SF(t_i, m_j)\) (see Eq. 5), that tries to balance both objectives. The rationale is to minimize the workload of the machines and, intrinsically, the energy used to carry out the work. This is the main principle of the scheduling heuristics we are interested in in this work. The main difference among them is the priority used to construct the list. To optimize the flowtime, we apply the classical shortest processing time rule on each machine after the schedule is constructed.

4.1. Low-cost heuristics

Algorithm 1 depicts the general structure of the proposed heuristics. It is based on classical list scheduling algorithms, for which well-founded theoretical performance guarantees have been proven [7]. The heuristics start by computing the Priority of each task according to some objective (line 1). Then, we compute the sorted list of tasks (line 2). The order of the list is not modified during the execution of the heuristics. Next, the heuristics proceed to allocate the tasks to the machines and determine their starting dates (main loop, line 3). One task at a time is scheduled.
The heuristics always consider the task \(t_i\) at the top of the list (highest priority) and remove it from the list (line 4). A score function \(SF(t_i, m_j)\) for the selected task is evaluated on all the machines (lines 5 and 6). Then each heuristic selects the machine for which the value of the score function is optimized for task \(t_i\) and schedules the task on that machine (line 8). This corresponds to the second phase of the min-min heuristic, with a different evaluation function. In the case of min-min, the evaluation function is based only on the completion time of the selected task on all the machines: the algorithm selects the machine that gives the minimum completion time for that task and assigns the task to that machine. The list is updated (line 9) and we restart the main loop. Once all tasks have been scheduled, we apply the shortest processing time rule on all machines to optimize the flowtime (lines 11 and 12).

The score of each mapping event is calculated as in Eq. 5. For each machine \(m_j\),

\[ SF(t_i) = \lambda \cdot \frac{C_i}{\sum_{k=1}^{m} C_{ik}} + (1 - \lambda) \cdot \frac{ETC[t_i][m_j]}{\sum_{k=1}^{m} ETC[t_i][m_k]}, \tag{5} \]

where \(\sum_{k=1}^{m} C_{ik}\) is the sum of the completion times of task \(t_i\) over all machines and \(\sum_{k=1}^{m} ETC[t_i][m_k]\) is the sum of the expected times to compute task \(t_i\) over all machines. The first term of Eq. 5 aims to minimize the completion time of the task \(t_i\), while the second term aims to assign the task to the fastest machine, that is, the machine on which the task takes the minimum expected time to complete.

The heuristics differ in the objective used to compute the priorities. For that, the maximum (Algorithm 2), minimum (Algorithm 3) and average (Algorithm 4) completion time of the task are used, computed as if it were the only task to be scheduled on the computing system. Note that this corresponds to the execution time \(ETC[t_i][m_j]\) of task \(t_i\) on machine \(m_j\).
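The procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: "optimized" is read as "minimized", \(C_i\) in Eq. 5 is taken to be the completion time of \(t_i\) on the candidate machine \(m_j\), and all names (`low_cost_schedule`, `makespan_and_flowtime`, `PRIORITIES`) as well as the toy instance are hypothetical.

```python
from statistics import mean

# Priority rules: the task's maximum, minimum or average execution time.
PRIORITIES = {"MaxMax min": max, "MinMax min": min, "MinMean min": mean}


def low_cost_schedule(etc, lam=0.8, priority=max):
    """List-scheduling heuristic with the score function of Eq. 5.

    etc[i][j] is the expected time of task i on machine j.
    Returns one task queue per machine, in execution order.
    """
    n_tasks, n_machines = len(etc), len(etc[0])
    # Lines 1-2: compute priorities and sort tasks in decreasing order.
    order = sorted(range(n_tasks), key=lambda i: priority(etc[i]), reverse=True)
    ready = [0.0] * n_machines
    queues = [[] for _ in range(n_machines)]
    for i in order:  # lines 3-10: one task at a time
        completion = [ready[j] + etc[i][j] for j in range(n_machines)]
        sum_c, sum_etc = sum(completion), sum(etc[i])

        def score(j):
            # Weighted sum of normalized completion time and execution time.
            return lam * completion[j] / sum_c + (1 - lam) * etc[i][j] / sum_etc

        j_best = min(range(n_machines), key=score)  # assumed: lower is better
        queues[j_best].append(i)
        ready[j_best] += etc[i][j_best]
    # Lines 11-13: shortest processing time rule on every machine.
    for j, queue in enumerate(queues):
        queue.sort(key=lambda i: etc[i][j])
    return queues


def makespan_and_flowtime(etc, queues):
    """Makespan (max completion time) and flowtime (sum of completion times)."""
    makespan, flowtime = 0.0, 0.0
    for j, queue in enumerate(queues):
        clock = 0.0
        for i in queue:
            clock += etc[i][j]
            flowtime += clock
        makespan = max(makespan, clock)
    return makespan, flowtime


# Hypothetical instance: 3 tasks, 2 machines.
etc = [[3.0, 5.0], [2.0, 4.0], [6.0, 1.0]]
queues = low_cost_schedule(etc, lam=0.8, priority=PRIORITIES["MaxMax min"])
print(queues, makespan_and_flowtime(etc, queues))
```

Because the task order is fixed up front, each task is scored against the machines only once, which is what removes the first-phase loop of min-min.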
The names of the heuristics are MaxMax min (Algorithm 2: tasks are sorted in decreasing order of their maximum completion time and scheduled based on the minimum completion time), MinMax min (Algorithm 3: tasks are sorted in decreasing order of their minimum completion time and scheduled based on the minimum completion time), and MinMean min (Algorithm 4: tasks are sorted in decreasing order of their average completion time and scheduled based on the minimum completion time).

Algorithm 1  Pseudo-code for the low-cost heuristics
 1: Compute Priority of each task t_i ∈ T according to some predefined objective;
 2: Build the list L of the tasks sorted in decreasing order of Priority;
 3: while L ≠ Ø do
 4:     Remove the first task t_i from L;
 5:     for each machine m_j do
 6:         Evaluate Score Function SF(t_i);
 7:     end for
 8:     Assign t_i to the machine m_j that optimizes the Score Function;
 9:     Update the list L;
10: end while
11: for all machines m_j do
12:     Sort the tasks t_k on m_j in increasing ETC[t_k][m_j];
13: end for

Algorithm 2  Pseudo-code for heuristic (MaxMax min)
 1: for all tasks t_i do
 2:     for each machine m_j do
 3:         Evaluate CompletionTime(t_i, m_j);
 4:     end for
 5:     Select the maximum completion time for each task t_i;
 6: end for

Algorithm 3  Pseudo-code for heuristic (MinMax min)
 1: for all tasks t_i do
 2:     for each machine m_j do
 3:         Evaluate CompletionTime(t_i, m_j);
 4:     end for
 5:     Select the minimum completion time for each task t_i;
 6: end for

Algorithm 4  Pseudo-code for heuristic (MinMean min)
 1: for all tasks t_i do
 2:     for each machine m_j do
 3:         Evaluate CompletionTime(t_i, m_j);
 4:     end for
 5:     Compute the average completion time for each task t_i;
 6: end for

4.2. Computational complexity of the heuristics

The computational complexity of the algorithms is as follows. Computing the priorities of the tasks and constructing the sorted list have an overall cost of \(O(tm \log t)\). The execution of the main loop (line 3) in Algorithm 1 has an overall cost of \(O(tm)\). Sorting the tasks on all the machines (line 12) takes \(O(km \log k)\), where \(k \le t\). Therefore, the asymptotic overall cost of the heuristics is \(O(tm \log t)\), which is lower than the \(O(t^2 m)\) run time of min-min.

5. EXPERIMENTAL EVALUATION

In this section, we compare by simulation the proposed algorithms and min-min on a set of randomly built ETC matrices. Table 1 shows the twelve combinations of heterogeneity types (tasks and machines) and consistency classifications in the ETC model that we use in this paper. The consistency categories are named by their initial letter (c stands for consistent, i for inconsistent and s for semi-consistent), while lo stands for low heterogeneity and hi for high heterogeneity. Hence, a matrix named c_hihi corresponds to a consistent scenario with high task heterogeneity and high machine heterogeneity.

Table 1. Consistency and heterogeneity combinations in the ETC model

                  Consistency
Consistent    Semi-consistent    Inconsistent
c_lolo        s_lolo             i_lolo
c_lohi        s_lohi             i_lohi
c_hilo        s_hilo             i_hilo
c_hihi        s_hihi             i_hihi

5.1. Experiments

For the generation of these ETC matrices we have used the coefficient-of-variation-based (COV) method introduced in [13]. To simulate different heterogeneous computing environments we have changed the parameters \(\mu_{task}\), \(V_{task}\) and \(V_{machine}\), which represent the mean task execution time, the task heterogeneity, and the machine heterogeneity, respectively. We have used the following parameters: \(V_{task}\) and \(V_{machine}\) equal to 0.1 for the low case and 0.6 for the high case, and \(\mu_{task} = 100\).
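The paper does not spell out the generation procedure, so the following sketch is only an assumption about how a COV-based generator is commonly set up: a mean time is drawn for every task from a gamma distribution with coefficient of variation \(V_{task}\), and each row is then drawn from a gamma distribution around that mean with coefficient of variation \(V_{machine}\). The function name `generate_etc` and the seed are hypothetical; only the parameter values come from the text.

```python
import random


def generate_etc(n_tasks, n_machines, mu_task=100.0, v_task=0.6,
                 v_machine=0.6, seed=None):
    """Draw a random ETC matrix with the given coefficients of variation.

    A gamma distribution with shape alpha = 1 / V^2 has coefficient of
    variation V, and its mean is alpha * beta (beta is the scale).
    """
    rng = random.Random(seed)
    alpha_task = 1.0 / v_task ** 2
    beta_task = mu_task / alpha_task
    alpha_machine = 1.0 / v_machine ** 2
    etc = []
    for _ in range(n_tasks):
        # Mean execution time of this task over the machines.
        task_mean = rng.gammavariate(alpha_task, beta_task)
        beta_machine = task_mean / alpha_machine
        etc.append([rng.gammavariate(alpha_machine, beta_machine)
                    for _ in range(n_machines)])
    return etc


# High task and high machine heterogeneity, 512 tasks on 16 machines.
etc_hihi = generate_etc(512, 16, mu_task=100.0, v_task=0.6, v_machine=0.6, seed=1)
print(len(etc_hihi), len(etc_hihi[0]))
```

No ordering is imposed on the rows here, so the matrices produced this way correspond to the inconsistent scenario; the consistent and semi-consistent variants would need an additional step that this sketch does not attempt.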
The heterogeneity ranges were chosen to reflect the fact that in real situations there is more variability across the execution times of different tasks on a given machine than across the execution times of a single task on different machines [14].

As we are considering batch mode algorithms, we assume in both cases that all tasks have arrived in the system before the scheduling event. Furthermore, we consider that all the machines are idle or available at time zero, which can be made possible by considering advance reservation. We have generated 1200 instances, 100 for each of the twelve cases, to evaluate the performance of the heuristics. Each instance consists of 512 tasks to be scheduled on 16 machines. Additionally, we have considered different voltages for the machines. We randomly assigned these voltages to machines by choosing among three different sets. The first set considers 1.95 V and 0.8 V for the active state and the idle state, respectively. The second set is 1.75 V in the active state and 0.9 V in the idle state. Finally, the last set considers 1.6 V for the active level and 0.7 V for the idle level.

5.2. Results

The results for the algorithms are depicted in Figures 1 to 3. We show normalized values of makespan, flowtime and energy for each heuristic against min-min for \(\lambda\) values in the interval [0, 1]. The normalized data were generated by dividing the results for each heuristic by the maximum result computed by these heuristics. We only show the curves for high task and high machine heterogeneity for the three different scenarios, which are the most significant results. The legends m-m n mksp, m-m n flow and m-m n energy in the figures stand for the makespan, flowtime and energy of min-min.

We can observe from these figures that the proposed heuristics follow the same performance behavior according to the scenarios. The range of relative values is larger for the consistent instances than for the semi-consistent and inconsistent ones.
The results clearly demonstrate that energy efficiency is best for the consistent instances. This may be related to the fact that the makespan has worse results. However, for \(\lambda = 0.8\) the proposed heuristics can perform as well as min-min for all three considered metrics. We can also observe that the proposed algorithms can improve makespan and flowtime results for some values of \(\lambda\) for semi-consistent and inconsistent instances. Interestingly, the more inconsistent the instance is, the better the new algorithms perform. The benefit of exploiting the heterogeneity of the applications and resources to maximize the performance of the system and energy is more apparent. This is mainly because these instances are the ones presenting the highest inconsistency and heterogeneity. In terms of flowtime, all the heuristics are as efficient as min-min; however, the proposed heuristics have lower complexity.