Int. J. Bio-Inspired Computation, Vol. 2, No. 2, 2010
71 Copyright © 2010 Inderscience Enterprises Ltd.
A viral system massive infection algorithm to solve the Steiner tree problem in graphs with medium terminal density
Pablo Cortés*, José M. García, Jesús Muñuzuri and José Guadix
Ingeniería de Organización, School of Engineering, University of Seville, Camino de los Descubrimientos s/n, E-41092 Sevilla, Spain
Email: pca@esi.us.es
Email: jmgs@esi.us.es
Email: munuzuri@esi.us.es
Email: guadix@esi.us.es
*Corresponding author
Abstract:
The Steiner tree problem in graphs is hardest for graphs with a medium density of terminals. In fact, most of the efficient algorithms developed for the Steiner tree problem show their worst behaviour for terminal densities between 15% and 30%. Here we present a new bio-inspired approach called viral system (VS), based on a biological virus infection, which emulates a massive infection in an organism (representing a massive exploration of the feasible region). The approach is tested on a large-sized library of problems for which the optimal solution is known, and it is compared with other very efficient soft computing methodologies developed for this specific problem, such as Tabu search and genetic algorithms. The VS massive infection algorithm improves on the results provided by these approaches.
Keywords:
viral system; VS; bio-inspired methods; Steiner tree problem; networks.
Reference to this paper should be made as follows: Cortés, P., García, J.M., Muñuzuri, J. and Guadix, J. (2010) ‘A viral system massive infection algorithm to solve the Steiner tree problem in graphs with medium terminal density’, Int. J. Bio-Inspired Computation, Vol. 2, No. 2, pp.71–77.
Biographical notes:
P. Cortés studied Industrial Engineering and obtained his PhD at the School of Engineering of the University of Seville, Spain, in 2000. Since then, he has worked on artificial intelligence methods, with special attention to bio-inspired technologies. He currently lectures at the School of Engineering of the University of Seville. J.M. García studied Computer Engineering at the University of Seville and obtained his PhD in Industrial Engineering in 2003. His research focuses on computer science methodologies applied to management problems, with special attention to network synthesis and manufacturing problems. J. Muñuzuri has a PhD in Industrial Engineering and currently lectures at the School of Engineering of the University of Seville. He also obtained an MSc in Industrial Design and Production during his stay at the University of Wales Swansea in 1996. His main fields of research are operations research and simulation applied to logistics, transport and telecommunications. J. Guadix is a Teaching Assistant at the School of Engineering of the University of Seville. He earned his Doctorate from the University of Seville in 2004. His research interests include operations research and computational intelligence methods applied to hotel management, the field of his PhD. He has been a consultant for several leading service companies and regional government administrations since 2000.
1 Introduction
The Steiner tree problem in graphs considers the connection of a set of nodes, identified as terminal nodes, using another set of nodes, called Steiner nodes. The Steiner tree problem seeks the minimum-cost network that connects the subset of terminal nodes in a specific graph $G = (N, A)$. It can be stated as follows: given an undirected graph $G = (N, A)$ with $|N|$ nodes and $|A|$ arcs with costs $c_{ij}\ \forall (i, j) \in A$, and a subset $T \subseteq N$ of $|T|$ nodes called terminals or targets, the rest of the nodes in $N$ being called Steiner nodes, find a network $G_T \subseteq G$ that connects all the terminal nodes in $T$ at minimum cost. This network may include some of the Steiner nodes but need not include all of them. The Steiner tree problem presents a major difficulty for problems with a medium density of terminals. It is known
that the Steiner tree problem with low terminal density can be solved efficiently with approaches based on shortest path algorithms (in the limit, the case with only two terminals is the shortest path problem), and that Steiner tree problems with high terminal density can be solved efficiently with approaches based on minimum spanning trees (MST) (in the limit, the total absence of Steiner nodes is the MST problem). Problems with medium terminal density are the most complex cases, and they are where the most suitable approaches experience the greatest difficulties. To solve this type of problem we present a novel approach based on a soft computing technique called viral systems (VS) (first described in Cortés et al., 2008), and we compare its results with other well-established approaches on a large-sized library of problems. The new approach differs from the original VS in how the infection is conceptualised. VS were first introduced using a selective (parsimonious) infection process; here we show how a quick massive infection can provide better solutions for special cases. The rest of this paper is organised as follows: Section 2 briefly describes the VS massive infection algorithm, Section 3 presents the computational results, and the last section gives the conclusions.
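To make the definition concrete, the sketch below solves tiny instances exactly by exhaustive search. It is an illustration, not the method of this paper: with positive arc costs, an optimal Steiner tree is a minimum spanning tree of the subgraph induced by the terminals plus some subset of Steiner nodes, so trying every subset and keeping the cheapest connected MST finds the optimum. The function names and the arc dictionary keyed by `frozenset({i, j})` are assumptions made for illustration; the search visits $2^{|N|-|T|}$ subsets, so it is only usable on toy graphs.

```python
from itertools import combinations


def mst_cost(nodes, arcs):
    """Kruskal's MST cost on the subgraph induced by `nodes`, or None if disconnected.

    `arcs` maps frozenset({i, j}) to the cost c_ij of an undirected arc.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, used = 0, 0
    for arc, cost in sorted(arcs.items(), key=lambda item: item[1]):
        i, j = tuple(arc)
        if i in parent and j in parent:      # both endpoints belong to the subgraph
            ri, rj = find(i), find(j)
            if ri != rj:                     # arc joins two components
                parent[ri] = rj
                total += cost
                used += 1
    return total if used == len(nodes) - 1 else None


def exact_steiner_tree(nodes, terminals, arcs):
    """Try every subset of Steiner nodes and keep the cheapest connected MST."""
    steiner = [v for v in nodes if v not in terminals]
    best_cost, best_subset = None, None
    for k in range(len(steiner) + 1):
        for subset in combinations(steiner, k):
            cost = mst_cost(set(terminals) | set(subset), arcs)
            if cost is not None and (best_cost is None or cost < best_cost):
                best_cost, best_subset = cost, set(subset)
    return best_cost, best_subset
```

On a star-shaped example where three terminals are pairwise joined by arcs of cost 2 and each is joined to one Steiner node by an arc of cost 1, including the Steiner node lowers the tree cost from 4 to 3.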
2 A brief description of VS in massive infection cases
VS were first introduced by Cortés et al. (2008), who described a new soft computing approach and its biological analogy with a viral infection; readers interested in a detailed description of the methodology are referred to that paper. One of the main characteristics of VS is the replication mechanism of viruses, together with the capability of the host organism to develop an antigenic response. Basically, viruses can follow two different replication processes: lytic or lysogenic replication. In lytic replication, the virus infects the cell and starts replicating nucleocapsids (copies) of the virus. Once a maximum number of replications has been reached, the new viruses are released and try to infect new cells of the organism. In lysogenic replication, the virus infects the cell and remains hidden inside until a number of generations has passed. After that, the genome of the cell is altered in a way similar to a mutation process. VS is an iterative method that runs for a maximum number of iterations or, when the optimum is known, until it is reached.

VS defines the clinical picture of an infected population as the description of all the cells infected by viruses. Computationally, it includes the encoding of the solution being explored (in biological terms, the genome of the infected cell) and either the number of nucleocapsids replicated, NR (for lytic replications), or the number of hidden generations, IT (for lysogenic replications). Thus the state of each virus is given by the three-tuple ‘cell genome–NR–IT’, and the set of these three-tuples for all the infected cells defines the clinical picture. Every infected cell develops a lytic replication with probability $p_{lt}$ or a lysogenic replication with probability $p_{lg}$, where $p_{lt} + p_{lg} = 1$.

In the case of lysogenic replications, the mutation process is activated once a limit of iterations (LIT) has passed. The value of LIT depends on the cell’s health conditions: a healthy cell (high value of the objective function being minimised, $f(x)$) has a low infection probability, i.e., a higher value of LIT, whereas an unhealthy cell has a lower value of LIT.

In the case of lytic replications, the number of virus replications NR is updated at each iteration by adding the value of a binomial random variable $Z$ to the current NR in the clinical picture. $Z$ follows a binomial distribution whose parameters are the maximum number of nucleocapsids replicated, LNR, and the probability of a single replication, $p_r$: $Z = \mathrm{Bin}(\mathrm{LNR}, p_r)$. LNR is the limit at which the cell border breaks and the lodged viruses are released. As in the lysogenic cycle, the value of LNR depends on the value of the objective function being minimised, $f(x)$: cells with higher $f(x)$ have a lower probability of getting infected, and therefore a higher value of LNR. Each released virus then has a probability $p_i$ of infecting new cells in the neighbourhood. If the neighbourhood cardinality of $x$ is $|V(x)|$, the number of cells infected in the neighbourhood follows the binomial distribution $Y = \mathrm{Bin}(|V(x)|, p_i)$.

On the other hand, to defend itself from the growth of the viral infection, the organism (the set of cells) responds by releasing antibodies. In the clinical picture, each infected cell $x$ generates antibodies according to a Bernoulli distribution $A(x) = \mathrm{Ber}(p_{an})$, where $p_{an}$ is the unitary probability of generating antibodies. Hence, the number of infected cells generating antibodies follows a binomial distribution whose parameters are the size of the clinical picture, $n$, and the probability of generating antibodies, $p_{an}$: $A(\text{population}) = \mathrm{Bin}(n, p_{an})$. Likewise, the antigenic response of every cell in the neighbourhood of an active virus is a Bernoulli variable $A(x') = \mathrm{Ber}(p_{an})$, $x' \in V(x)$. Therefore, the total number of cells with antibodies in the neighbourhood follows a binomial distribution given by the total size of the neighbourhood of all the active viruses, $|V(x)|$, and $p_{an}$: $A = \mathrm{Bin}(|V(x)|, p_{an})$.

In this situation, the evolution of the clinical picture is a Markov process. Let $\pi = (\pi_0, \pi_1, \ldots, \pi_{\mathrm{LNR}})$, where $\pi_j$ is the probability that a cell has $j$ nucleocapsids replicated. Equations (1)–(3) hold in steady state.
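$$\pi = \pi \cdot P \qquad (1)$$

$$\pi_j = \left(\frac{1}{1 - p_{jj}}\right) \sum_{k=0}^{j-1} \pi_k \cdot p_{kj}, \quad \forall j = 1, 2, \ldots, \mathrm{LNR} - 1 \qquad (2)$$

$$\pi_0 + \pi_1 + \cdots + \pi_{\mathrm{LNR}} = 1 \qquad (3)$$

where $P = (p_{kj})$ is the transition matrix of the process. To keep the evolution of the infection under computational control, $p_{an}$ is given a value that satisfies (4):

$$\left(n \cdot p_i \cdot \overline{V}(x) \cdot \pi_{\mathrm{LNR}} - 1\right) > p_{an} \cdot \left(n \cdot p_i \cdot \overline{V}(x) \cdot \pi_{\mathrm{LNR}} - 1 + n\right) \qquad (4)$$

where $\overline{V}(x)$ is the average neighbourhood size for a specific problem.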
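Equations (1)–(4) can be explored numerically. The sketch below rests on an assumption that the text does not state: from a state $k < \mathrm{LNR}$ the count NR grows by $Z \sim \mathrm{Bin}(\mathrm{LNR}, p_r)$, capped at LNR, and a burst cell (state LNR) restarts at 0. Under that hypothetical chain it builds $P$, solves (1) with the normalisation (3) by power iteration, and evaluates condition (4). Rearranging (4) for $p_{an}$, assuming $g - 1 + n > 0$ with $g = n \cdot p_i \cdot \overline{V}(x) \cdot \pi_{\mathrm{LNR}}$, gives the bound $p_{an} < (g - 1)/(g - 1 + n)$. All function names are hypothetical.

```python
from math import comb


def lytic_transition_matrix(lnr, p_r):
    """Hypothetical lytic chain on NR = 0..LNR (an assumption, not from the paper).

    From k < LNR, NR grows by Z ~ Bin(LNR, p_r), capped at LNR.
    From LNR the cell has burst and restarts at 0.
    """
    pmf = [comb(lnr, z) * p_r**z * (1 - p_r) ** (lnr - z) for z in range(lnr + 1)]
    P = [[0.0] * (lnr + 1) for _ in range(lnr + 1)]
    for k in range(lnr):
        for z, prob in enumerate(pmf):
            P[k][min(k + z, lnr)] += prob
    P[lnr][0] = 1.0
    return P


def stationary(P, max_iter=100_000, tol=1e-13):
    """Solve pi = pi . P (equation (1)) by power iteration.

    Starting from a probability vector keeps sum(pi) = 1, as required by (3).
    """
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        new = [sum(pi[k] * P[k][j] for k in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def condition_4(n, p_i, v_bar, pi_lnr, p_an):
    """True if (g - 1) > p_an (g - 1 + n) with g = n p_i V(x) pi_LNR, i.e. (4)."""
    g = n * p_i * v_bar * pi_lnr
    return g - 1 > p_an * (g - 1 + n)


def p_an_upper_bound(n, p_i, v_bar, pi_lnr):
    """(4) rearranged for p_an, assuming g - 1 + n > 0: p_an < (g - 1) / (g - 1 + n)."""
    g = n * p_i * v_bar * pi_lnr
    return (g - 1) / (g - 1 + n)
```

In this assumed chain a state $1 \le j < \mathrm{LNR}$ can only be entered from states $k \le j$, which is exactly the structure that (2) expresses.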
2.1 Considerations of VS in relation to the Steiner tree problem
Considering a Steiner tree problem with $|N|$ nodes and $|T|$ terminals, the cell genome is represented by a bit string of size $|N| - |T|$ corresponding to the Steiner nodes in the graph, where position $i$ corresponds to Steiner node $i$. A value of 1 means that Steiner node $i$ is included in the tree, and a value of 0 means that it is not. A bit string of size $|N| - |T|$ is sufficient because all the terminal nodes must belong to the Steiner tree, so only the Steiner nodes in the tree need to be identified. Once the Steiner nodes have been specified, a Steiner tree can be constructed as an MST that contains all the terminal nodes (set $T$), the Steiner nodes whose bits are set to 1 and, if the resulting subgraph is disconnected, some artificial arcs. There are several ways of introducing such artificial arcs; we follow the graph construction mechanisms of Gendreau et al. (1999).

A key decision in every VS is to define an adequate cell neighbourhood for the virus in the lytic replication process, as well as the genome alteration process for the lysogenic replication. For the Steiner tree problem, given a feasible solution, the lysogenic genome alteration flips one bit of the string. For the lytic replication, the neighbourhood of the feasible solution (infected cell) consists of the bit strings obtained by removing or adding a single Steiner node from or to the current cell encoding. For efficiency, the new MSTs must be found by carefully manipulating a rooted tree data structure (Gendreau et al., 1999). Because of this encoding, the neighbourhood size is constant and equal to the number of Steiner nodes, since each neighbour is obtained by changing the value of one bit from 0 to 1 or vice versa.
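The decoding of a cell genome into a Steiner tree cost can be sketched as follows. This is a simplified stand-in, not the construction of Gendreau et al. (1999) used in the paper: each extra connected component is joined by a hypothetical artificial arc of constant cost `big_m`, and the MST is recomputed from scratch with Kruskal's algorithm instead of being updated on a rooted tree. The function name, `big_m` and the arc dictionary keyed by `frozenset({i, j})` are illustrative assumptions.

```python
def decode_cost(bits, steiner_nodes, terminals, arcs, big_m=1e6):
    """Cost of the network encoded by a cell genome (bit string over Steiner nodes).

    bits[k] == 1 includes steiner_nodes[k]. Kruskal's algorithm builds a minimum
    spanning forest over the terminals plus the selected Steiner nodes; each extra
    component is then joined by a hypothetical artificial arc of cost big_m.
    """
    nodes = set(terminals) | {s for s, b in zip(steiner_nodes, bits) if b}
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total, components = 0.0, len(nodes)
    for arc, cost in sorted(arcs.items(), key=lambda item: item[1]):
        i, j = tuple(arc)
        if i in parent and j in parent:    # arc lies inside the induced subgraph
            ri, rj = find(i), find(j)
            if ri != rj:                   # arc merges two components
                parent[ri] = rj
                total += cost
                components -= 1
    return total + (components - 1) * big_m
```

A large `big_m` makes any disconnected encoding more expensive than every connected one, so the search is pushed back towards feasible trees without discarding the cell.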
2.2 VS pseudocode
The following pseudocode (Table 1) describes a VS massive infection applied to the Steiner tree problem.
Table 1
General pseudocode for the VS massive infection procedure
Virus_System(N_max, Clinical_Size, p_lt, p_i, p_an, p_r, LNR, LIT)
    CP = ∅                                        /* Clinical Picture */
    /* Get initial Clinical Picture */
    for i = 1 to Clinical_Size
        /* Get randomly a feasible solution and assign randomly a replication type */
        CP(i) = Get_Random_Feasible_Solution()
        CP(i).Replicat_Type = Get_Random_Replication_Type(p_lt)
    next
    iterations = 0
    do
        iterations = iterations + 1
        i = Select_Random_Solution(Clinical_Size)
        if CP(i).Replicat_Type = ‘Lytic’ then
            Lytic_Replication(CP(i), p_lt, p_i, p_an, p_r, LNR)
        else
            Lysogenic_Replication(CP(i), p_lt)
        end if
    loop until iterations = N_max or Check_Gap(CP) = True
end Virus_System

procedure Lytic_Replication(C_S, p_lt, p_i, p_an, p_r, LNR)
    C_S = current solution
    /* Get the number of replicated nucleocapsids */
    z = Get_Random_Binomial_Probability(LNR, p_r)
    i = 0
    do
        i = i + 1
        if z < Binomial(i) then C_S.NR = C_S.NR + 1
    loop until i = LNR or z ≥ Binomial(i)
    /* Check infection */
    if C_S.NR > C_S.LNR then
        /* Get the list V_S of neighbouring solutions of C_S in descending order regarding solution health */
        VA_S = Get_Arranged_Neighbourhood(V_S)
        /* Get the clinical picture CP in ascending order regarding solution health */
        CPA = Get_Arranged_Clinical_Picture(CP)
        i = 1
        for each S’ ∈ VA_S
            if i <= |CPA| then
                replace = false
                do
                    a = Get_Random_Binomial_Probability(|V_S|, p_an)
                    b = Get_Random_Binomial_Probability(|V_S|, p_i)
                    if a > p_an and b > p_i then
                        /* Replace CPA(i) with a new solution C_S’ */
                        CPA(i) = C_S’
                        CPA(i).Replicat_Type = Get_Random_Replication_Type(p_lt)
                        replace = true
                    end if
                    i = i + 1
                loop until replace = true or i > |CPA|
            end if
        endfor
    end if
end Lytic_Replication

procedure Lysogenic_Replication(C_S, p_lt)
    C_S.IT = C_S.IT + 1
    if C_S.IT > C_S.LIT then
        s = Get_Random_Gen()
        /* Apply a mutation move on C_S */
        C_S_NEW = Mutation(C_S, s)
        C_S_NEW.Replicat_Type = Get_Random_Replication_Type(p_lt)
        C_S = C_S_NEW
    end if
    return C_S
end Lysogenic_Replication
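The general loop of Table 1 can be sketched in Python for a generic minimisation problem. This is a simplified reading, not the authors' implementation: ‘health’ is read as solution quality (best neighbours replace the worst cells first); LNR and LIT are fixed, as in the experiments of Section 3, rather than derived from $f(x)$; the `a`/`b` binomial tests of Table 1 are replaced by one Bernoulli infection draw ($p_i$) and one antibody draw ($p_{an}$) per released virus; and NR and IT are reset after a burst or a mutation. The default $p_r = 0.5$ is a hypothetical value, since the paper does not report it, and all function and parameter names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    genome: tuple   # encoded solution (the infected cell's genome)
    lytic: bool     # replication type: True = lytic, False = lysogenic
    nr: int = 0     # nucleocapsids replicated so far (lytic)
    it: int = 0     # hidden generations so far (lysogenic)


def binomial(n, p, rng):
    """Draw from Bin(n, p) as the sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def viral_system(f, random_solution, neighbours, mutate, n_max=100_000,
                 clinical_size=100, p_lt=0.7, p_i=0.6, p_an=0.1, p_r=0.5,
                 lnr=15, lit=15, optimum=None, seed=0):
    """Minimise f with a simplified VS massive-infection loop (illustrative only)."""
    rng = random.Random(seed)
    cp = [Cell(random_solution(rng), rng.random() < p_lt) for _ in range(clinical_size)]
    best = min((c.genome for c in cp), key=f)
    for _ in range(n_max):
        cell = rng.choice(cp)
        if cell.lytic:
            cell.nr += binomial(lnr, p_r, rng)
            if cell.nr > lnr:                       # the cell border breaks
                cell.nr = 0                         # assumption: restart the count
                cp.sort(key=lambda c: f(c.genome), reverse=True)  # worst cells first
                slot = 0
                for g in sorted(neighbours(cell.genome), key=f):  # best neighbours first
                    if slot >= len(cp):
                        break
                    # the virus infects (p_i) and is not stopped by antibodies (p_an)
                    if rng.random() < p_i and rng.random() >= p_an:
                        cp[slot] = Cell(g, rng.random() < p_lt)
                        slot += 1
                        if f(g) < f(best):
                            best = g
        else:
            cell.it += 1
            if cell.it > lit:                       # hidden period over: mutate
                cell.genome = mutate(cell.genome, rng)
                cell.it = 0
                cell.lytic = rng.random() < p_lt
                if f(cell.genome) < f(best):
                    best = cell.genome
        if optimum is not None and f(best) <= optimum:   # plays the role of Check_Gap
            break
    return best
```

For example, passing `sum` as `f`, random 8-bit tuples as solutions and single-bit flips as `neighbours` and `mutate` turns the loop into a search for the all-zero string.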
The general functions and procedures in Table 1 are complemented with problem-specific procedures, mainly the neighbourhood characterisation and the problem-oriented lysogenic replication. Table 2 describes these procedures for the Steiner tree problem.
Table 2
Specific pseudocode for the Steiner tree problem

procedure Lytic_Replication(C_S, p_lt, p_i, p_an, p_r, LNR)
    C_S = current solution
    S = Steiner nodes in the current solution
    /* Get the number of replicated nucleocapsids */
    z = Get_Random_Binomial_Probability(LNR, p_r)
    i = 0
    do
        i = i + 1
        if z < Binomial(i) then C_S.NR = C_S.NR + 1
    loop until i = LNR or z ≥ Binomial(i)
    /* Check infection */
    if C_S.NR > C_S.LNR then
        /* Get the list V_S of neighbouring solutions of C_S by toggling every Steiner node of the graph */
        V_S = ∅
        for each s ∈ N \ T
            if s ∈ S then S_NEW = S \ {s} else S_NEW = S ∪ {s}
            V_S = V_S ∪ {S_NEW}
        endfor
        /* Arrange V_S in descending order regarding solution health */
        VA_S = Get_Arranged_Neighbourhood(V_S)
        /* Get the clinical picture CP in ascending order regarding solution health */
        CPA = Get_Arranged_Clinical_Picture(CP)
        i = 1
        for each S’ ∈ VA_S
            if i <= |CPA| then
                replace = false
                do
                    a = Get_Random_Binomial_Probability(|V_S|, p_an)
                    b = Get_Random_Binomial_Probability(|V_S|, p_i)
                    if a > p_an and b > p_i then
                        /* Replace CPA(i) with the solution C_S’ generated by S’ */
                        CPA(i) = C_S’
                        CPA(i).Replicat_Type = Get_Random_Replication_Type(p_lt)
                        replace = true
                    end if
                    i = i + 1
                loop until replace = true or i > |CPA|
            end if
        endfor
    end if
end Lytic_Replication

procedure Lysogenic_Replication(C_S, p_lt)
    C_S.IT = C_S.IT + 1
    if C_S.IT > C_S.LIT then
        s = Get_Random_Steiner_Node()
        if s ∈ S then S = S \ {s} else S = S ∪ {s}
        C_S.Replicat_Type = Get_Random_Replication_Type(p_lt)
    end if
    return C_S
end Lysogenic_Replication
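The two Steiner-specific moves of Table 2 can be sketched on a set representation, where a solution is the frozenset $S$ of Steiner nodes it contains. This is an illustrative sketch: the function names are hypothetical, and `cost` stands for any solution evaluator (for instance, an MST-based decoding), which is assumed rather than taken from the paper. Sorting best first reads ‘health’ as solution quality for a minimisation problem.

```python
import random


def steiner_neighbourhood(S, steiner_nodes):
    """Lytic neighbourhood of Table 2: toggle each Steiner node of the graph once.

    S is the frozenset of Steiner nodes in the current solution; the result has
    exactly one neighbour per Steiner node, so its size is constant.
    """
    return [S - {s} if s in S else S | {s} for s in steiner_nodes]


def arranged_neighbourhood(S, steiner_nodes, cost):
    """Neighbours sorted best first (lowest cost), reading 'health' as quality."""
    return sorted(steiner_neighbourhood(S, steiner_nodes), key=cost)


def lysogenic_flip(S, steiner_nodes, rng):
    """Lysogenic step of Table 2: add or remove one randomly chosen Steiner node."""
    s = rng.choice(list(steiner_nodes))
    return S - {s} if s in S else S | {s}
```

Iterating over every Steiner node of the graph, rather than only over the nodes already in $S$, is what makes each neighbour either a removal or an addition.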
3 Computational results
The OR-Library, available at http://people.brunel.ac.uk/~mastjjb/jeb/info.html (Beasley, 2008), has been used for the computational results, considering series SteinC, SteinD and SteinE. We divided the Steiner tree problem instances into three groups: Group 1, with low terminal density, contains problems with less than 15% of terminal nodes; Group 2, with medium terminal density, consists of problems with more than 15% and less than 30% of terminal nodes; and Group 3 contains problems with more than 30% of terminals.
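The grouping rule can be written as a small function. This is a sketch under two assumptions: instances with exactly 15% or 30% of terminals, which the text leaves unassigned, are placed in Group 2, and the counts passed in are those of the preprocessed instance, since the text notes for Table 3 that groups were assigned after applying graph reduction rules. The function name is hypothetical.

```python
def density_group(num_terminals, num_nodes):
    """Analysis group from the terminal density |T| / |N| of a (reduced) instance.

    Group 1: below 15%; Group 2: 15% to 30%; Group 3: above 30%.
    Assumption: exact 15% or 30% values, left open in the text, go to Group 2.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    density = num_terminals / num_nodes
    if density < 0.15:
        return 1
    if density <= 0.30:
        return 2
    return 3
```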
Table 3 shows the results obtained by the PTabu and FTabu methods (Gendreau et al., 1999), two genetic algorithms, GAE (Esbensen, 1995) and GAV (Voss and Gutenschwager, 1999), and a VS based on a selective infection (Cortés et al., 2008). The table also includes the minimum-path heuristic (MPH) (Takahashi and Matsuyama, 1980), which is the best greedy heuristic implemented for the Steiner tree problem. Problems were assigned to groups after preprocessing the Steiner tree problem with the five graph reduction rules described in Winter (1987). Table 3 shows the problem codes associated with each analysis group. As the figures clearly show, Group 2 is by far the most complex one: the largest errors are always obtained for this group, in terms of both the average error and the maximum error. We have therefore focused our research on Group 2, since in our opinion a major effort has to be put into this specific and characteristic group of Steiner tree problems. To compare methodologies, we executed the VS massive infection five times for each problem, on a Xeon(TM) 2.80 GHz machine with 1 GB RAM. Table 4 compares these results with those of Table 3. The most significant parameters of the algorithm are the maximum number of iterations (taking into account that an iteration is counted only after a virus replication takes place, which does not necessarily happen every time a cell’s neighbourhood is searched) and the probabilities of infecting neighbouring cells ($p_i$) and of generating antibodies ($p_{an}$). Note that $p_{an}$ and $p_i$ must satisfy equation (4). We finally selected 100,000 iterations, a clinical picture size of 100, a probability of antigenic response of 0.1 and a probability of infection of 0.6. Likewise, we set LNR and LIT to 15 and the probability of lytic replication to 0.7. Table 4 provides the results for Group 2 of the Steiner tree problem. The VS massive infection algorithm provided the best solution for 11 instances, nine of which were optimal. The selective infection VS reached the optimum, and was the best approach, seven times; FTabu reached five optima, although it was the best approach on seven occasions. The results of the less efficient PTabu were slightly worse. The other methods performed clearly worse: the genetic algorithm of Esbensen (1995) was the best approach twice and the MPH heuristic only once. Furthermore, the standard deviation of the massive infection approach remained bounded, which indicates a tighter fit around the optimum. It also provided the best solution for all the problems except C8 (0.39% error versus 0.00% for FTabu), D8 (0.47% versus 0.37% for FTabu) and E8 (1.78% versus 0.42% for GAE).
Table 3
Results for the Steiner tree problem according to the percentage of terminals

OR-Library trials
  Group 1: C{1,2,6,7,11,12,16,17}, D{1,2,6,7,11,12,16,17}, E{1,2,6,7,11,12,16,17}
  Group 2: C{8,9,13,14,18,19}, D{8,13,14,18,19}, E{8,13,14,18,19}
  Group 3: C{3,4,5,10,15,20}, D{3,4,5,9,10,15,20}, E{3,4,5,9,10,15,20}

           Average error         Standard deviation    Maximum error
Method     G1     G2     G3      G1     G2     G3      G1     G2     G3
MPH        0.15%  2.88%  0.35%   0.50%  2.32%  0.35%   2.07%  7.62%  1.17%
GAV        0.88%  1.13%  0.37%   1.08%  0.49%  0.39%   3.62%  2.28%  1.25%
GAE        0.60%  0.95%  0.17%   0.78%  0.73%  0.23%   2.07%  3.37%  0.92%
PTabu      0.15%  0.63%  0.08%   0.50%  0.66%  0.15%   2.07%  2.66%  0.42%
FTabu      0.06%  0.39%  0.04%   0.30%  0.46%  0.09%   1.49%  1.60%  0.32%
VSs        0.00%  0.57%  0.10%   0.00%  0.72%  0.19%   0.00%  2.66%  0.69%

G1–G3: Groups 1–3. VSs: VS with selective infection (Cortés et al., 2008).
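The statistics in Table 3 are errors with respect to the known optima. A minimal sketch of how one such summary row could be computed is given below. It assumes the percentage error is $(\text{cost} - \text{optimum})/\text{optimum} \times 100$ and a population standard deviation; the text does not spell out either choice, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def percentage_error(cost, optimum):
    """Relative gap of a solution cost to the known optimum, in percent."""
    return 100.0 * (cost - optimum) / optimum


def error_summary(results):
    """(average, standard deviation, maximum) percentage error over a group.

    `results` holds one (cost_found, optimal_cost) pair per instance.
    Assumption: the population standard deviation is used.
    """
    errors = [percentage_error(cost, opt) for cost, opt in results]
    return mean(errors), pstdev(errors), max(errors)
```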