Journal of Parallel and Distributed Computing 62, 1338–1361 (2002)
doi:10.1006/jpdc.2002.1850
An Integrated Technique for Task Matching and Scheduling onto Distributed Heterogeneous Computing Systems¹

Muhammad K. Dhodhi²
Lucent Technologies, InterNetworking Systems, 1 Robbins Road, Westford, Massachusetts 01886
Email: dhohdi@lucent.com

Imtiaz Ahmad and Anwar Yatama
Department of Computer Engineering, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait

and Ishfaq Ahmad
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
Received March 29, 2000; accepted January 29, 2002
This paper presents a problem-space genetic algorithm (PSGA)-based technique for efficient matching and scheduling of an application program, represented by a directed acyclic graph, onto a mixed-machine distributed heterogeneous computing (DHC) system. PSGA is an evolutionary technique that combines the search capability of genetic algorithms (GAs) with a known fast problem-specific heuristic to provide the best possible solution to a problem more efficiently than other probabilistic techniques. The goal of the algorithm is to reduce the overall completion time through proper task matching, task scheduling, and inter-machine data transfer scheduling in an integrated fashion. Because it embeds a known fast problem-specific heuristic into the GA framework, the algorithm is robust in the sense that it explores a large and complex solution space in less CPU time and uses less memory than traditional GAs. Consequently, the proposed technique schedules an application program with a comparable schedule length in a very short CPU time compared to GA-based heuristics. The paper demonstrates the viability and effectiveness of the proposed technique through a performance comparison with existing GA-based techniques.
© 2002 Elsevier Science (USA)
¹ This research is funded by Kuwait University Grant EE073.
² To whom correspondence should be addressed.
Key Words: distributed heterogeneous computing; heterogeneous processing; matching and scheduling; evolutionary computation; genetic algorithms; heuristics.
1. INTRODUCTION
A mixed-machine distributed heterogeneous computing (DHC) system generally consists of a heterogeneous suite of machines, high-speed networks, communication protocols, operating systems, and programming environments [8, 16, 24, 28, 30, 32]. DHC is emerging as a cost-effective alternative to expensive parallel machines for high-performance computing. For effective utilization of the diverse machines in a DHC system, an application program can be partitioned into a set of tasks (program segments) represented by an edge-weighted directed acyclic graph (DAG), such that each task is computationally homogeneous and can be assigned to the best-suited machine in the DHC system [24, 28, 30]. The matching and scheduling problem is the assignment of the tasks of a DAG to a suite of heterogeneous machines, sequencing the order of task execution for each machine such that precedence relationships between the tasks are not violated, and orchestrating inter-machine data transfers, with the objective of minimizing the total completion time [6, 7, 9, 15, 19, 22–25, 28–32, 35]. In general, the problem of task
assignment and scheduling is known to be NP-complete [10]. Because of the intractable nature of the matching and scheduling problem, new efficient techniques are always desirable to obtain the best possible solution within an acceptable CPU time.

Genetic algorithms (GAs) [11, 13, 17, 21], well known for their robustness, are
probabilistic techniques that start from an initial population of randomly generated potential solutions to a problem and gradually evolve toward better solutions through repetitive application of genetic operators such as selection, crossover, and mutation. The evolution process proceeds through generations by allowing selected members of the current population, chosen on the basis of some fitness criteria, to combine through a crossover operator to produce offspring, thus forming a new population. The process is repeated until certain stopping criteria are met. GAs have been applied successfully to scheduling problems in a variety of fields, such as job-shop scheduling [4], sequence scheduling [33, 34], timetable problems [21], schedule optimization [27], task scheduling and allocation onto homogeneous multiprocessor systems [13, 17, 20], and task scheduling and matching in heterogeneous computing environments [23, 25, 30].

This paper proposes a technique based on a problem-space genetic algorithm (PSGA) [5, 26], which performs task matching, task scheduling, and inter-machine message scheduling in an integrated fashion. PSGA is an evolutionary technique that combines the search capability of genetic algorithms with a known fast problem-specific heuristic to provide the best possible solution to a problem more efficiently than other probabilistic techniques. In a PSGA [5, 26], the chromosome is based on the problem data and all the genetic operators are applied in the problem space, so there is no need to modify
genetic operators for each application. The solution is obtained by applying a simple, fast, known heuristic to map from the problem space to the solution space, where each chromosome guides the heuristic to generate a different solution. The search can therefore be conducted much more efficiently. Because PSGA uses a fast heuristic for this mapping, it avoids disadvantages of other probabilistic approaches (such as the local fine-tuning needed in the last stage of standard GAs) and, moreover, converges faster than standard GAs [5]. PSGA-based techniques have been applied to many resource-constrained combinatorial optimization problems, such as datapath synthesis [5], static task assignment in homogeneous distributed computing systems [2], and static task scheduling onto homogeneous multiprocessors without considering the inter-processor communication cost [3].

The rest of the paper is organized as follows: Section 2 formulates the problem, and Section 3 presents the related work. Section 4 explains the basic problem-space genetic algorithm technique and the proposed matching and scheduling algorithm. Section 5 provides the experimental results, and finally Section 6 concludes the paper.
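The problem-space idea described above can be illustrated with a minimal, generic skeleton. This is a sketch for intuition only, not the algorithm proposed in this paper: it assumes a caller-supplied decoding heuristic `decode` that maps a perturbed vector of problem data (here, task priorities) to a schedule length, and all parameter values are illustrative.

```python
import random

def psga(base_priorities, decode, pop_size=20, generations=50,
         mutation_rate=0.1, seed=0):
    """Minimal problem-space GA skeleton (illustrative).

    A chromosome is a perturbed copy of the problem data; the fast
    heuristic `decode(priorities)` maps it to a solution and returns
    the resulting schedule length (lower is fitter). Assumes at least
    two genes so that one-point crossover is well defined.
    """
    rng = random.Random(seed)
    n = len(base_priorities)

    def perturb(p):
        return [v + rng.uniform(-1, 1) for v in p]

    # Initial population: the unperturbed problem data plus random perturbations.
    pop = [list(base_priorities)] + [perturb(base_priorities)
                                     for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(pop, key=decode)       # rank by decoded schedule length
        parents = scored[:pop_size // 2]       # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover in problem space
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:   # mutation: re-perturb one gene
                child[rng.randrange(n)] += rng.uniform(-1, 1)
            children.append(child)
        pop = parents + children
    best = min(pop, key=decode)
    return best, decode(best)
```

Note that every genetic operator acts on problem data (priority vectors), never on schedules directly; the heuristic alone turns a chromosome into a feasible solution, which is why the operators need no problem-specific repair step.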
2. PROBLEM FORMULATION
In a DHC system, an application program is partitioned into a set of tasks modeled by a DAG, which can be represented as G = (T, <, E), where T = {t_i : i = 1, ..., n} is a set of n tasks and < represents a partial order on T. For any two tasks t_i, t_k ∈ T, the existence of the partial order t_i < t_k means that task t_k cannot be scheduled until task t_i has been completed; hence t_i is a predecessor of t_k and t_k is a successor of t_i. E is the set of directed edges or arcs. A weight D_{i,k} associated with each arc represents the amount of data, in bytes, to be transferred from task t_i to task t_k.

A mixed-machine distributed heterogeneous computing system consists of a set
H = {H_j : j = 0, ..., m−1} of m independent machines of different types (including sequential and parallel computers) interconnected by a high-speed arbitrary network. The bandwidth (data transfer rate) of the links between different machines in a DHC system may differ depending on the kind of network; the data transfer rates are represented by an m × m matrix, R_{m×m}. The estimated computation time (ECT) of a task t_i on a machine H_j is denoted ECT_{ij}, where 0 ≤ i < n and 0 ≤ j < m. The ECT value of a task may differ from machine to machine depending on each machine's computational capability. For static task scheduling, the ECT value for each task–machine pair is assumed to be available a priori. An example DAG consisting of seven tasks, adopted from [30], is shown in Fig. 1(a), and a DHC system consisting of three fully connected heterogeneous machines is shown in Fig. 1(b). We assume that the data transfer rate for each link is 1.0; hence the arc weight and the communication cost will be the same. The ECT value of each task on the different machines (H_0–H_2) for this example is given in Table 1. Furthermore, we make the following assumptions:
* Each machine in the heterogeneous system can perform communication and computation simultaneously.
* Task execution can start only after all the data have been received from its predecessor tasks.
* All machines and inter-machine networks are available for the exclusive use of the application program.
* Communication cost is zero when two tasks t_i, t_k are assigned to the same machine; otherwise, data have to be transferred from the machine to which task t_i is assigned to the machine to which task t_k is assigned. This data transfer incurs a communication cost (CommCost) given by
CommCost(t_i, t_k) = D_{i,k} × R[H(i), H(k)],    (1)
where D_{i,k} is the amount of data to be transferred from task t_i to t_k, and R[H(i), H(k)] represents the bandwidth (data transfer rate) of the link between the machines onto which tasks t_i and t_k have been assigned.

The problem of static task matching and scheduling in a distributed heterogeneous computing environment is a mapping π : T → H that assigns the set of tasks T onto a set of heterogeneous machines H, determines the start and the finish times of each
FIG. 1. An example DAG and a mixed-machine DHC system.
TABLE 1
The ECT Values of the Tasks for the System of Fig. 1

Task    H_0    H_1    H_2
t_1     872    898    708
t_2     251    624    778
t_3     542    786     23
t_4      40    737    258
t_5     742    247    535
t_6     970    749    776
t_7     457    451     15
task, and determines the start and the finish times of each inter-machine data transfer, so that precedence constraints are maintained and the schedule length (SL), that is, the overall program completion time, given by Eq. (2), is minimized:

SL = max{F_0, F_1, ..., F_{m−1}},    (2)

where F_j is the overall finish time of machine H_j, j = 0, ..., m−1. The finish time includes the computation time, the communication time, and the waiting time due to the precedence constraints. This is an intractable combinatorial optimization problem with conflicting constraints.
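Given the ECT matrix, the arc data volumes, and a candidate task-to-machine assignment, Eqs. (1) and (2) determine the schedule length directly. The sketch below is a simplified serial evaluator written for illustration, not the algorithm of this paper: it omits explicit scheduling of transfers on individual links, it uses the ECT values of Table 1 and the uniform rate of 1.0 from the example of Fig. 1, and the arc set and assignment in the usage example are hypothetical (Fig. 1's actual DAG is not reproduced in the text).

```python
from collections import defaultdict

def schedule_length(ect, edges, assign, order, rate=1.0):
    """Schedule length SL (Eq. 2) of a DAG under a given assignment.

    ect[t]    : tuple of ECT values of task t on machines H_0..H_{m-1}
    edges     : {(t_i, t_k): D_ik} arcs with data volumes in bytes
    assign[t] : index of the machine chosen for task t
    order     : tasks listed in a precedence-respecting (topological) order
    rate      : uniform link rate; inter-machine CommCost = D_ik * rate,
                and zero when both tasks share a machine (Eq. 1)
    """
    preds = defaultdict(list)
    for (i, k), d in edges.items():
        preds[k].append((i, d))
    finish = {}
    machine_free = defaultdict(float)        # earliest idle time per machine
    for t in order:
        j = assign[t]
        ready = 0.0                          # all predecessor data must arrive
        for p, d in preds[t]:
            comm = 0.0 if assign[p] == j else d * rate
            ready = max(ready, finish[p] + comm)
        start = max(ready, machine_free[j])  # one task at a time per machine
        finish[t] = start + ect[t][j]
        machine_free[j] = finish[t]
    return max(finish.values())              # Eq. (2): latest finish time

# ECT values from Table 1 (columns H_0, H_1, H_2)
ECT = {1: (872, 898, 708), 2: (251, 624, 778), 3: (542, 786, 23),
       4: (40, 737, 258), 5: (742, 247, 535), 6: (970, 749, 776),
       7: (457, 451, 15)}

# Hypothetical 4-task sub-DAG, for illustration only
edges = {(1, 2): 100, (1, 3): 50, (2, 4): 30, (3, 4): 70}
assign = {1: 2, 2: 0, 3: 2, 4: 0}            # t1, t3 on H_2; t2, t4 on H_0
print(schedule_length(ECT, edges, assign, order=[1, 2, 3, 4]))  # 1099.0
```

In the example, placing t_1 and t_3 on the same machine makes their communication cost zero, so only the t_1 → t_2 and t_3 → t_4 transfers contribute delay; this is exactly the interaction between matching and communication that the integrated technique must optimize.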
3. RELATED WORK
Task matching and scheduling techniques [6, 7, 9, 15, 19, 22–32, 35] can be categorized into optimal selection theory-based approaches [9, 31], graph-based approaches [6, 22], genetic algorithm-based techniques [23, 25, 30], and other heuristics [6, 7, 19, 29, 35]. Most of this work aims to find a near-optimal solution. Below we describe the levelized min-time (LMT) scheduling heuristic presented in [15] and three evolutionary techniques based on genetic algorithms, developed by Wang et al. [30], Singh et al. [25], and Shroff et al. [23].

Iverson et al. [15] proposed a two-phase approach, called the LMT heuristic, that accounts for both the precedence constraints and the variable execution time of each task on different machines. In the first phase, so-called level sorting partitions the tasks into levels of non-precedence-constrained subtasks. In the second phase, an algorithm called Min-Time is applied to each level: for each task in a level, a search is carried out for a processor that has no task assigned to it and that minimizes the sum of the task's execution time and the transfer time of all the data required by the task; the task is then assigned to that processor.

In [30], a GA-based approach is used to solve the optimization problem under discussion. A predefined number of chromosomes is generated for the initial population, which also includes a chromosome obtained by using the LMT heuristic [15]. Each chromosome is represented by the two-tuple
⟨mat, ss⟩, where mat is the matching string and ss is the scheduling string. When generating a chromosome, a new matching string is obtained by assigning each task to a machine randomly; for the scheduling string, the SPDAG is first topologically sorted to form a basic scheduling string, which is then mutated a random number of times to generate the ss vector. The standard GA approach described in [11] is then used to obtain the next generations. For small-scale problems, multiple optimal solutions are obtained in reasonable CPU time, but the CPU times taken by large problems are extremely large.

In [25], a search for a better set of parameters for the GA-based algorithm is carried out. Each parameter is varied over some range, keeping the other parameters constant. The objective is to arrive at a parameter set for which the fitness of the chromosome improves; fitness is defined here as the completion time of the last task. It is observed that window and linear scaling,