
IEEE Concurrency, July–September 1998. 1092-3063/98/$10.00 © 1998 IEEE

Optimal Task Assignment in Heterogeneous Distributed Computing Systems
Complex Distributed Systems

Muhammad Kafil and Ishfaq Ahmad
The Hong Kong University of Science and Technology

A distributed system comprising networked heterogeneous processors requires efficient task-to-processor assignment to achieve fast turnaround time. The authors propose two algorithms based on the A* technique that are considerably faster, are more memory-efficient, and give optimal solutions. The first is a sequential algorithm that reduces the search space. The second lowers time complexity by running the assignment algorithm in parallel, and achieves significant speedup.

As computer networks and sequential computers advance, distributed computing systems—such as a network of heterogeneous workstations or PCs—become an attractive alternative to expensive, massively parallel machines. To exploit effective parallelism on a distributed system, tasks must be properly allocated to the processors. This problem, task assignment, is well known to be NP-hard in most cases.1 A task-assignment algorithm seeks an assignment that optimizes a certain cost function—for example, maximum throughput or minimum turnaround time. However, most reported algorithms yield suboptimal solutions. In general, optimal solutions can be found through an exhaustive search, but because there are n^m ways in which m tasks can be assigned to n processors, an exhaustive search is often not possible. Thus, optimal-solution algorithms exist only for restricted cases or very small problems. The other possibility is to use an informed search to reduce the state space. The A* algorithm, an informed-search algorithm, guarantees an optimal solution, but doesn't work for large problems because of its high time and space complexity. Thus, we require a further-reduced state space, a faster search process, or both.

Problem definition

Like other NP-hard problems, there are three common ways to tackle task assignment:

• Relaxation. You can relax some of the requirements or restrict the problem.
• Enumerative optimization. If you can't compromise the solution's optimality, you can use enumerative methods, such as dynamic programming and branch-and-bound.
• Approximate optimization. You can use heuristics to solve the problem while aiming for a near-optimal or good solution.

Our goal is to assign a given task graph (see the "Related work" sidebar) to a network of processors to minimize the time
required for program completion. We consider this problem, also known as the allocation or mapping problem,2 using relaxed assumptions—such as arbitrary computation and task-graph communication requirements, and a network of heterogeneous processors connected by an arbitrary topology. We use the task interaction graph model, in which the parallel program is represented by an undirected graph GT = (VT, ET), where VT is the set of vertices {t1, t2, ..., tm}, and ET is a set of edges labeled by the communication requirements among tasks. We can also represent the network of processors as an undirected graph, where vertices represent the processors and the edges represent the processors' communication links. We represent the interconnection network of n processors, {p1, p2, ..., pn}, by an n × n link matrix L, where an entry Lij is 1 if processors i and j are directly connected, and 0 otherwise. We do not consider the case where i and j are not directly connected.

We can execute a task ti from the set VT on any one of the system's n processors. Each task has an associated execution cost on a given processor. A matrix X gives task-execution costs, where Xip is the execution cost of task i on processor p. Two tasks, ti and tj, executing on two different processors incur a communication cost when they need to exchange data. Task mapping will assign two communicating tasks to the same processor or to two different, directly connected processors. A matrix C represents communication among tasks, where Cij is the communication cost between tasks i and j if they reside on two different processors.

A processor's load comprises all the execution and communication costs associated with its assigned tasks. The time needed by the heaviest-loaded processor determines the entire program's completion time. The task-assignment problem must find a mapping of the set of m tasks to n processors that minimizes program completion time. Task mapping, or assignment to processors, is given by a matrix A, where Aip is 1 if task i is assigned to processor p, and 0 otherwise. The following equation then gives the load on p:

Load(p) = Σ(i=1..m) Xip Aip + Σ(q=1..n, q≠p) Σ(i=1..m) Σ(j=1..m) Cij Aip Ajq Lpq    (1)

The first part of the equation is the total execution cost of the tasks assigned to p. The second part is the communication overhead on p. Aip and Ajq indicate that tasks i and j are assigned to two different processors (p and q), and Lpq indicates that p and q are directly connected. To find the processor with the heaviest load, you need to compute the load on each of the n processors. The optimal assignment out of all possible assignments will allot the minimum load to the heaviest-loaded processor.
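As a concrete illustration, the load equation can be evaluated directly from the matrices X, C, A, and L. This is a minimal sketch (the function names and the tiny test data are assumptions, not from the article):

```python
def processor_load(p, X, C, A, L):
    """Load on processor p: execution cost of tasks assigned to p,
    plus communication with tasks placed on directly connected processors."""
    m = len(X)          # number of tasks
    n = len(X[0])       # number of processors
    exec_cost = sum(X[i][p] * A[i][p] for i in range(m))
    comm_cost = sum(
        C[i][j] * A[i][p] * A[j][q] * L[p][q]
        for q in range(n) if q != p
        for i in range(m)
        for j in range(m)
    )
    return exec_cost + comm_cost

def completion_time(X, C, A, L):
    """Program completion time = load of the heaviest-loaded processor."""
    n = len(X[0])
    return max(processor_load(p, X, C, A, L) for p in range(n))
```

Note that a task pair contributes Cij only when Aip and Ajq place the tasks on different processors and Lpq = 1, exactly as in the equation.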
Task assignment using the A* algorithm

A* is a best-first search algorithm,3 which has been used extensively in artificial-intelligence problem solving. Programmers can use the algorithm to search a tree or a graph. For a tree search, it starts from the root, called the start node (usually a null solution of the problem). Intermediate tree nodes represent partial solutions, and leaf nodes represent complete solutions or goals. A cost function f computes each node's associated cost. The value of f for a node n, which is the estimated cost of the cheapest solution through n, is computed as f(n) = g(n) + h(n), where g(n) is the search-path cost from the start node to the current node n, and h(n) is a lower-bound estimate of the path cost from n to the goal node (solution). To expand a node means to generate all of its successors or children and to compute the f value for each of them. The nodes are ordered for search according to cost; that is, the algorithm first selects the node with the minimum expansion cost. The algorithm maintains a list of nodes, called OPEN, sorted according to their f values, and always selects a node with the best expansion cost. Because the algorithm always selects the best-cost node, it guarantees an optimal solution.

For the task-assignment problem under consideration,

• the search space is a tree;
• the initial node (the root) is a null-assignment node—that is, no task is assigned as yet;
• intermediate nodes are partial-assignment nodes—that is, only some tasks are assigned; and
• a solution (goal) node is a complete-assignment node—that is, all the tasks are assigned.

To compute the cost function, g(n) is the cost of the partial assignment A at node n—the load on the heaviest-loaded processor p; this can be computed using the equation from the previous section. For the computation of h(n), two sets are defined: Tp (the set of tasks that are assigned to the heaviest-loaded processor p) and U (the set of tasks that are unassigned at this stage of the search and have one or more communication links with any task in set Tp). Each task ti in U will be assigned either to p or to another processor q that has a direct communication link with p. So, you can associate two kinds of costs with each ti's assignment: either Xip (the execution cost of ti on p) or the sum of the communication costs of all the tasks in set Tp that have a link with ti. This implies that to consider ti's assignment, we must decide whether ti should go to p or not (by taking the minimum of these two cases' costs). Let cost(ti) be the minimum of these two costs; then we compute h(n) as

h(n) = Σ(ti ∈ U) cost(ti)
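As a sketch of the heuristic just defined (the data structures are assumptions; the test values follow the Figure 1 example, where h(n) = 15 for node 0XXXX):

```python
def h_estimate(U, T_p, p, X, C):
    """Lower-bound h(n): for each unassigned task ti in U, take the
    cheaper of (a) executing ti on the heaviest-loaded processor p, or
    (b) communicating with ti's neighbors already assigned to p."""
    total = 0
    for ti in U:
        exec_on_p = X[ti][p]
        comm_with_Tp = sum(C[ti][tj] for tj in T_p if C[ti][tj] > 0)
        total += min(exec_on_p, comm_with_Tp)
    return total
```

For node 0XXXX in the example, U = {t1, t4} and Tp = {t0} on p0, giving min(14, 8) + min(10, 7) = 15, which matches the h(n) value quoted in the text.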
Applying the algorithm

Shen and Tsai4 first used the A* algorithm for the task-assignment problem. They ordered the tasks considered for assignment simply by starting with task 1 at the tree's first level, task 2 at the second, and so on. Ramakrishnan and colleagues showed that the order in which an algorithm considers tasks for allocation considerably impacts its performance.5 Their study indicated significant performance improvement by considering, at shallow tree levels, tasks that have more weight in the optimal-cost computation. They proposed a number of heuristics to reorder the tasks, out of which the minimax sequencing heuristic performed the best.

Minimax sequencing works as follows. Consider a matrix H of m rows and n columns, where m is the number of tasks and n is the number of processors. The entry H(i, k) of the matrix is given by H(i, k) = Xik + h(v), where h(v) is given by

h(v) = Σ(j ∈ U) min(Xjk, Cij)

where U is the set of unassigned tasks that communicate with ti. The minimax value mm(ti) of task ti is defined as mm(ti) = min {H(i, k), 1 ≤ k ≤ n}. The minimax sequence is then defined as Π = {τ1, τ2, ..., τm}, mm(τi) ≥ mm(τi+1), ∀i.
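A minimal sketch of minimax sequencing follows; treating every communicating task as unassigned when U is formed is an assumption of this standalone version (the sequence is computed once, before the search starts):

```python
def minimax_sequence(X, C):
    """Order tasks by non-increasing minimax value mm(ti), where
    H(i, k) = X[i][k] + sum over communicating tasks j of
    min(X[j][k], C[i][j]) and mm(ti) is the minimum over processors k."""
    m, n = len(X), len(X[0])

    def mm(i):
        neighbors = [j for j in range(m) if j != i and C[i][j] > 0]
        return min(
            X[i][k] + sum(min(X[j][k], C[i][j]) for j in neighbors)
            for k in range(n)
        )

    # sorted() is stable, so tied tasks keep their original index order
    return sorted(range(m), key=mm, reverse=True)
```

The returned list is the sequence Π, i.e., the order in which tree levels consider tasks.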
An example

Let's illustrate the A* algorithm's operation for the assignment problem. Given a set of five tasks {t0, t1, t2, t3, t4} and a set of three processors {p0, p1, p2} (see Figure 1), we give the resulting search trees using the techniques proposed by Shen and Tsai (see Figure 2) and Ramakrishnan and his colleagues (see Figure 3). We will refer to these algorithms as A*O (A* Original) and A*R (A* with Reordering).

A search-tree node includes a partial assignment of tasks to processors and the value of f (the cost of the partial assignment). The assignment of m tasks to n processors is indicated by an m-digit string, a0, a1, ..., am–1, where ai (0 ≤ i ≤ m – 1) represents the processor (0 to n – 1) to which the algorithm has assigned the ith task. A partial assignment means that some tasks are unassigned; a value of ai equal to X indicates that the ith task has not been assigned yet. Each level of the tree corresponds to a task; thus, assignment of this task to a processor replaces an X value in the assignment string with some processor number. Node expansion means adding a new task assignment to the partial assignment. Thus, the search tree's depth d equals the number of tasks m, and any node of the tree can have a maximum of n successors.

The root node includes the set of all unassigned tasks, XXXXX. Next, for example, in Figure 2, we consider the allocations of t0 to p0 (0XXXX), t0 to p1 (1XXXX), and t0 to p2 (2XXXX), by determining the assignment costs at the tree's first level. Assigning t0 to p0 (0XXXX) results in a total cost f(n) equal to 30. The g(n), in this case, equals 15, which is the cost of executing t0 on p0. The h(n) is also equal to 15, which is the sum of the minimum execution or communication costs of t1 and t4 (tasks communicating with t0). We similarly calculate the costs of assigning t0 to p1 (26) and t0 to p2 (24). The algorithm inserts these three nodes into the list OPEN. Because 24 is the minimum cost, the algorithm selects the node 2XXXX for expansion.

The algorithm expands node 2XXXX in the following manner. At the tree's second level, the algorithm will consider t1 for assignment, and 20XXX, 21XXX, and 22XXX are the three possible assignments. The value of f(n) for 20XXX is 28 and is computed as follows: first, the processor with the heaviest load is selected, which is p0 in this case. g(n) is equal to 22, which is the cost of executing t1 on p0 (14) plus the cost of communication between t1 and t0 (8), because they are assigned to two different processors. h(n) is equal to 6, which is the minimum execution or communication cost of t2 (the only unassigned task communicating with t1). We similarly compute the values of f(n) for 21XXX and 22XXX. At this point, nodes 0XXXX, 1XXXX, 20XXX, 21XXX, and 22XXX are in the OPEN list. Because node 1XXXX has the minimum node cost, the algorithm expands it next, resulting in nodes 10XXX, 11XXX, and 12XXX.

Attached to some of the nodes, the numbers in circles show the sequence in which nodes are selected for expansion. Bold lines show the edges connecting the nodes that lead to an optimal assignment. The search continues until the process selects the node with the complete assignment (20112) for expansion. At this point, because this node has a complete assignment and the minimum cost, it is the goal node. All assignment strings are unique.

In Figure 2, the order in which the algorithm considers tasks for assignment is {t0, t1, t2, t3, t4} and, during the search for an optimal solution, 42 nodes are generated and 14 are expanded. As Figure 3 shows, the A*R algorithm generates the minimax sequence {t0, t1, t2, t4, t3}; therefore, t4 is considered before t3. You can similarly trace this example as demonstrated above, while considering the new task order. In this case, 39 nodes are generated, and 13 nodes are expanded. The optimal assignment is 20112, with the same optimal solution cost (28). In comparison, an exhaustive search will generate n^m = 243 nodes.
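For contrast, the exhaustive search that the example rules out can be sketched as brute-force enumeration of all n^m assignment strings; the completion-time helper below follows the load equation from the problem definition (the function names and test data are illustrative assumptions):

```python
from itertools import product

def exhaustive_search(X, C, L):
    """Enumerate all n**m assignment strings and keep the one whose
    heaviest-loaded processor finishes earliest."""
    m, n = len(X), len(X[0])

    def completion(assign):
        loads = [0] * n
        for i, p in enumerate(assign):
            loads[p] += X[i][p]                  # execution cost on p
            for j, q in enumerate(assign):       # communication cost
                if q != p:
                    loads[p] += C[i][j] * L[p][q]
        return max(loads)

    best = min(product(range(n), repeat=m), key=completion)
    return best, completion(best)
```

For the five-task, three-processor example this enumerates 3^5 = 243 candidate assignments, which is why the article turns to informed search instead.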
Figure 1. Examples of (a) a task graph, (b) a processor network, and (c) the execution-cost matrix of tasks on various processors (for tasks t0–t4, the matrix rows are p0: 15, 14, 16, 5, 10; p1: 11, 12, 13, 4, 9; p2: 9, 8, 6, 3, 7).

The proposed algorithms

We'll now describe our proposed algorithms. The first is a sequential algorithm that has considerably smaller memory requirements than the A*O and A*R algorithms. The second is a parallel algorithm that, compared with its serial counterpart, generates optimal solutions with good speedup.

Sequential search

The Optimal Assignment with Sequential Search (OASS) algorithm (see Figure 4) uses the A* search technique, but with two distinct features. First, it generates a random solution and, during the optimal-solution search, prunes all the nodes with costs higher than this solution; the optimal solution cost will never be higher than this random-solution cost. Pruning unnecessary nodes not only saves memory, but also saves the time required to insert the nodes into OPEN. Second, the algorithm sets the value of f(n) equal to g(n) for all leaf nodes, because for a leaf node n, h(n) is equal to 0. This avoids the unnecessary computation of h(n) at the leaf nodes.

Figure 5 is the search tree that results from using the OASS algorithm on our example problem. First, we used a faster and suboptimal version of A*6 to generate a random solution. The cost of the random solution was 29. Therefore, we discard all nodes with a cost greater than 29. As a result, OASS generates only 14 nodes, while A*O produces 42 nodes and A*R produces 39 nodes for the same optimal solution—20112. This algorithm's efficiency clearly depends on the initial solution's quality.

Figure 2. Search tree for the example problem using A*O (42 nodes generated, 14 nodes expanded).

Figure 3. Search tree for the example problem using A*R (39 nodes generated, 13 nodes expanded).
Parallel search
The parallel algorithm aims to speed up the search as much as possible using parallel processing. This is done by dividing the search tree among the processing elements (PEs) as evenly as possible and by avoiding the expansion of nonessential nodes—that is, nodes that the sequential algorithm does not expand. A. Grama and Vipin Kumar7 and Vipin Kumar, K. Ramesh, and V.N. Rao8 provide useful discussions of different issues in parallelizing the depth-first and best-first search algorithms. To distinguish the processors on which the parallel task-assignment algorithm is running from the processors in the problem domain, we will denote the former with the abbreviation PE (in our case, an Intel Paragon processor). We call our parallel algorithm Optimal Assignment with Parallel Search (OAPS).

Initially, we statically divide the search tree based on the number of PEs P in the system and the maximum number of successors S of a node in the search tree. There are three ways to achieve an initial partitioning:

• P < S. Each PE expands only the initial node, which results in S new nodes. Each PE gets one node, and the initial division distributes additional nodes by round-robin (RR).
• P = S. Each PE expands only the initial node, and each PE gets one node.
• P > S. Each PE keeps expanding nodes, starting from the initial node (the null assignment), until the list's number of nodes is greater than or equal to P. We sort the list in increasing order of node-cost values. The first node in the list goes to PE1, the second node to PEP, the third node to PE2, the fourth node to PEP–1, and so on. Extra nodes are distributed by RR. Although this distribution does not guarantee that a best-cost node at the initial levels of the tree will lead to a good-cost node, the algorithm still tries to initially distribute the good nodes as evenly as possible among all the PEs.

If the search finds a solution, the algorithm terminates. There is no master PE responsible for first generating and then distributing the nodes to other PEs. Thus, compared to the host-node model, this static-node assignment's overhead is negligible. To illustrate this, try assigning 10 tasks to four processors using two PEs (PE1 and PE2). Here, S is 4 because a search-tree node can have a maximum of four successors; so each PE generates four nodes numbered 1 to 4 (as in Figure 6, where the boxed number is the node's f
(1)  Generate a random solution
(2)  Let S_Opt be the cost of this solution
(3)  Reorder the tasks
(4)  Build the initial node s and insert it into the list OPEN
(5)  Set f(s) = 0
(6)  Repeat
(7)    Select the node n with the smallest f value
(8)    if (n is not a Solution)
(9)      Generate successors of n
(10)     foreach successor node n' do
(11)       if (n' is not at the last level in the search tree)
(12)         f(n') = g(n') + h(n')
(13)       else f(n') = g(n')
(14)       if (f(n') <= S_Opt)
(15)         Insert n' into OPEN
(16)     end for
(17)   end if
(18)   if (n is a Solution)
(19)     Report the Solution and stop
(20) Until (n is a Solution) or (OPEN is empty)

Figure 4. The Optimal Assignment with Sequential Search algorithm (OASS).
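The listing above can be sketched as executable Python; the tuple-based node representation and the caller-supplied g and h are assumptions of this sketch, not the authors' implementation (a correct h must be the lower bound described earlier for optimality to hold):

```python
import heapq

def oass(m, n_procs, g, h, bound):
    """OASS sketch: best-first search over partial assignments
    (tuples of processor indices), pruning nodes costlier than the
    initial random-solution bound; leaf nodes use f = g only."""
    open_list = [(0, ())]                    # (f, partial assignment)
    while open_list:
        f, node = heapq.heappop(open_list)
        if len(node) == m:                   # complete assignment: goal
            return node, f
        for p in range(n_procs):             # expand: assign the next task
            child = node + (p,)
            fc = g(child) if len(child) == m else g(child) + h(child)
            if fc <= bound:                  # prune against random solution
                heapq.heappush(open_list, (fc, child))
    return None
```

The heap plays the role of OPEN; nodes whose f exceeds the random-solution bound are never inserted, which is what saves both memory and insertion time.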
Figure 5. Search tree for the example problem using OASS (14 nodes generated, 12 nodes expanded).
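The OAPS initial partitioning described in the Parallel search section (zigzag distribution of the sorted frontier for P > S, with extras handed out round-robin) can be sketched as follows; restarting the round-robin for leftover nodes at PE1 is an assumption of this sketch:

```python
def initial_partition(nodes_by_cost, P):
    """Statically partition a cost-sorted frontier among P PEs.
    The first P nodes alternate from both ends of the PE list
    (PE1, PEP, PE2, PEP-1, ...) so good-cost nodes spread evenly;
    any extra nodes are distributed round-robin."""
    # Interleaved PE order: 0, P-1, 1, P-2, ...
    order = []
    lo, hi = 0, P - 1
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo += 1
        hi -= 1
    buckets = [[] for _ in range(P)]
    for k, node in enumerate(nodes_by_cost):
        if k < P:
            buckets[order[k]].append(node)   # one node per PE, zigzag order
        else:
            buckets[k % P].append(node)      # extra nodes: round-robin
    return buckets
```

Because there is no master PE, each PE can run this deterministic partition locally and keep only its own bucket, which is why the static division's overhead is negligible.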