A Bayesian Approach to the Evolution of Metabolic Networks on a Phylogeny
Aziz Mithani¹*, Gail M. Preston², Jotun Hein¹
¹ Department of Statistics, University of Oxford, Oxford, United Kingdom
² Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
Abstract
The availability of genomes of many closely related bacteria with diverse metabolic capabilities makes it possible to trace metabolic evolution on a phylogeny relating the genomes, and so to understand the evolutionary processes and constraints that affect the evolution of metabolic networks. Using simple (independent loss/gain of reactions) or complex (incorporating dependencies among reactions) stochastic models of metabolic evolution, it is possible to study how metabolic networks evolve over time. Here, we describe a model that takes the reaction neighborhood into account when modeling metabolic evolution. The model also allows estimation of the strength of the neighborhood effect during the course of evolution. We present Gibbs samplers for sampling networks at the internal nodes of a phylogeny and for estimating the parameters of evolution over a phylogeny without exploring the whole search space, by iteratively sampling from the conditional distributions of the internal networks and parameters. The samplers are used to estimate the parameters of evolution of metabolic networks of bacteria in the genus Pseudomonas and to infer the metabolic networks of the ancestral pseudomonads. The results suggest that pathway maps that are conserved across the Pseudomonas phylogeny have a stronger neighborhood structure than those that have a variable distribution of reactions across the phylogeny, and that some Pseudomonas lineages are going through genome reduction resulting in the loss of a number of reactions from their metabolic networks.
Citation: Mithani A, Preston GM, Hein J (2010) A Bayesian Approach to the Evolution of Metabolic Networks on a Phylogeny. PLoS Comput Biol 6(8): e1000868. doi:10.1371/journal.pcbi.1000868
Editor: Aviv Regev, Broad Institute of MIT and Harvard, United States of America
Received September 28, 2009; Accepted June 25, 2010; Published August 5, 2010
Copyright: © 2010 Mithani et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from the Higher Education Commission, Government of Pakistan (AM), The Royal Society (GMP) and the Biotechnology and Biological Sciences Research Council grant BB/E007872/1 (GMP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* Email: mithani@stats.ox.ac.uk
Introduction
Biological networks evolve continuously, and their evolution is one of the major areas of research today [1–6]. The evolution of biological networks can be studied using various approaches such as maximum likelihood and parsimony [7,8]. The maximum likelihood approach calculates the likelihood of evolution of one network into another by summing over all possible networks that can occur during the course of evolution under the given model. Parsimony, on the other hand, assumes minimum evolution and only considers those networks that correspond to the minimum number of changes between the two networks. The problem with both approaches is that enumerating the networks that can potentially occur during evolution becomes impractical for biological networks, because the number of networks grows exponentially with the network size. Recently, the evolution of biological networks has been studied using stochastic approaches in which efficient sampling techniques make the problem computationally tractable. For example, Wiuf et al. [5] used importance sampling to approximate the likelihood and estimate parameters for the growth of protein networks under a duplicate attachment model. Similarly, Ratmann et al. [6] used approximate Bayesian computation to summarize key features of protein networks. The authors also approximated the posterior distribution of the model parameters for network growth using a Markov chain Monte Carlo algorithm.

In this work, we focus on metabolic networks. The evolution of metabolic networks is characterized by the gain and loss of reactions (or enzymes) connecting two or more metabolites and can be described as a discrete-space, continuous-time Markov process in which, at each step of network evolution, a reaction is either added or deleted until the desired network is obtained [9]. To give a biologically relevant picture of evolution, some reactions in the given networks may be defined as core (reactions that cannot be deleted during the course of evolution) or prohibited (reactions that cannot be added). The evolution of metabolic networks can then be studied using simple (independent loss/gain of reactions) or complex (incorporating dependencies among reactions) stochastic models of metabolic evolution. We previously presented a neighbor-dependent model for the insertion and deletion of edges from a network, in which the rates at which reactions are added to or removed from a network depend on the fraction of neighboring reactions present in the network [9]. In this model, two reactions were considered to be neighbors if they shared at least one metabolite. The model is summarized in Section 'Neighbor-dependent model' below. The neighbor-dependent model gives a biologically relevant picture of metabolic evolution by taking the network structure into account when calculating the rates of insertion and deletion of reactions. The model is, however, limited in that it does not allow one to measure the strength of the neighborhood structure affecting network evolution.
PLoS Computational Biology  www.ploscompbiol.org 1 August 2010  Volume 6  Issue 8  e1000868
Here, we present an extended model, called the hybrid model, that combines an independent edge model, in which edges are gained or lost independently, with a neighbor-dependent model of network evolution [9]. The rate of going from one network to another is a weighted sum of the rates under the two models, with the weights set by a parameter that measures the probability of being in the neighbor-dependent model. This allows the neighborhood effect during metabolic evolution to be estimated. When modeling network evolution, we represent metabolic networks as directed hypergraphs [9–11], where an edge, called a hyperedge, represents a reaction and may connect any number of vertices (metabolites). Representing metabolic networks as hypergraphs not only captures the relationship between the multiple metabolites involved in a reaction but also provides an intuitive approach to studying evolution, since the loss or gain of reactions can be regarded as the loss or gain of hyperedges.

We use the hybrid model to study the evolution of a set of metabolic networks connected over a phylogeny. Previous attempts to study the evolution of metabolic networks in a phylogenetic context include Dandekar et al. [12] and Peregrin et al. [13]. However, to our knowledge, the stochastic treatment of metabolic evolution over a phylogeny is an unexplored area. Here, the phylogenetic relationship between the networks is established using sequence data, because the metabolic annotations available for the majority of genome-sequenced organisms are generated by automated annotation tools based on the similarity of predicted genes to genes of known function and therefore contain a large amount of noise. In addition, we treat the branch lengths obtained from the sequence data as known without error. The advantage of fixing branch lengths is that the calculations do not require summing over all branch lengths for the given tree. Calculating the likelihood over a phylogeny then requires a sum, over all possible networks that may have existed at the interior nodes of the tree, of the probabilities of each scenario of events. This is similar to the idea introduced by Felsenstein [14] for DNA sequences observed over a phylogeny. To sample the networks at the internal nodes of the tree, we present a Gibbs sampler [15,16] that samples a network conditioned on its three neighbors (a parent and two child networks) for given parameter values. We also present a Gibbs sampler for estimating the parameters of evolution that encases the Gibbs sampler for sampling internal networks. This sampler estimates the evolution parameters without exploring the whole search space by iteratively sampling from the conditional distributions of the trees and parameters. We demonstrate the Gibbs sampler by estimating and comparing the evolution parameters for the metabolic networks of bacteria belonging to the genus Pseudomonas. The Gibbs sampler can also be used to infer the ancestral networks of a given phylogeny, which we show by inferring the metabolic networks of the ancestors of Pseudomonas spp.
Methods
Neighbor-dependent model
In the neighbor-dependent model for the evolution of metabolic networks [9], hyperedges are inserted into or deleted from a network depending on the fraction of neighboring hyperedges present in the network. Two hyperedges are considered neighbors if they share a node. The model assumes that the number of nodes in a network remains fixed and that there is a set $E$, with $|E| = M$, of hyperedges connecting these nodes. The model also assumes the existence of a network called the Reference Network, which contains all these hyperedges. If the hyperedges in the reference network are labeled 1 to $M$, then any given network $x$ can be represented as a sequence of 0s and 1s such that the $i$th entry ($0 < i \le M$) in the sequence is 1 if and only if the hyperedge labeled $i$ is present in the network $x$, and 0 otherwise. Let the rate matrix describing evolution under the neighbor-dependent model be denoted by $\Gamma$. An entry $\gamma(x'_i; x_i)$ in this rate matrix corresponds to the rate of going from a network $x$ to a network $x'$ that differs from $x$ at position $i$. In the neighbor-dependent model, the rate $\gamma(x'_i; x_i)$ of going from $x$ to $x'$ depends on $x_i$, $x'_i$ and the neighboring hyperedges $\Psi(x_i)$ present in the network $x$, and is given as follows:
$$\gamma(x'_i; x_i) = q(x_i, x'_i)\, \Phi(x_i, \Psi(x_i)) \qquad (1)$$
where the function $\Phi$ corresponds to the neighborhood component and $q(x_i, x'_i)$ is the appropriate entry from the $2 \times 2$ rate matrix $Q_i$ for the hyperedge $i$. The rate matrix $Q_i$ is given as

$$Q_i = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix} \qquad (2)$$
where $\lambda$ is the insertion rate and $\mu$ is the deletion rate. The neighborhood component $\Phi(x_i, \Psi(x_i))$ weights the insertion and deletion rates by the proportion of neighbors present in the network and is given as follows:

$$\Phi(x_i, \Psi(x_i)) = \begin{cases} \dfrac{|\Psi(x_i)|}{\sum_{j \neq i} x_j}, & |\Psi(x_i)| > 0, \\[1ex] \dfrac{1}{M+1}, & \text{otherwise}. \end{cases} \qquad (3)$$
The denominator $\sum_{j \neq i} x_j$ in Equation 3 gives the number of hyperedges present in the current network, excluding hyperedge $i$ itself.
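Equations 1 to 3 can be evaluated directly for a network stored as a 0/1 list. The Python sketch below is illustrative, not the authors' implementation: hyperedges are indexed from 0, the hypothetical argument `neighbors[i]` is assumed to hold the indices of the hyperedges that share a metabolite with hyperedge $i$ (excluding $i$ itself), and core or prohibited hyperedges are ignored.

```python
from typing import Sequence, Set


def neighborhood_component(x: Sequence[int], i: int,
                           neighbors: Sequence[Set[int]]) -> float:
    """Phi(x_i, Psi(x_i)) from Equation 3 (hyperedges indexed 0..M-1 here)."""
    M = len(x)
    # |Psi(x_i)|: neighbors of hyperedge i that are present in network x
    present_neighbors = sum(x[j] for j in neighbors[i])
    if present_neighbors > 0:
        # sum_{j != i} x_j: hyperedges other than i that are present in x
        others_present = sum(x) - x[i]
        return present_neighbors / others_present
    return 1.0 / (M + 1)


def independent_rate(x_i: int, lam: float, mu: float) -> float:
    """Off-diagonal entry q(x_i, x'_i) of Q_i (Equation 2): 0 -> 1 at lam, 1 -> 0 at mu."""
    return lam if x_i == 0 else mu


def neighbor_dependent_rate(x: Sequence[int], i: int, neighbors: Sequence[Set[int]],
                            lam: float, mu: float) -> float:
    """gamma(x'_i; x_i) = q(x_i, x'_i) * Phi(x_i, Psi(x_i)) (Equation 1)."""
    return independent_rate(x[i], lam, mu) * neighborhood_component(x, i, neighbors)
```

For a hypothetical chain of four hyperedges, `neighbors = [{1}, {0, 2}, {1, 3}, {2}]`, and the network `[1, 1, 0, 0]`, hyperedge 3 has no present neighbors, so $\Phi$ falls back to $1/(M+1) = 0.2$ and `neighbor_dependent_rate([1, 1, 0, 0], 3, neighbors, 2.0, 1.0)` gives 0.4.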
Author Summary
Metabolic networks correspond to one of the most complex cellular processes. Most organisms have a common set of reactions as part of their metabolic networks that relate to essential processes, such as the generation of energy and the synthesis of important biological molecules, which are required for their survival. However, a large proportion of the reactions present in different organisms are specific to the needs of individual organisms. The regions of metabolic networks corresponding to these non-essential reactions are under continuous evolution. Using different models of evolution, we can ask important biological questions about the ways in which the metabolic networks of different organisms enable them to be well adapted to the environments in which they live, and how these metabolic adaptations have evolved. We use a stochastic approach to study the evolution of metabolic networks and show that evolutionary inferences can be made using the structure of these networks. Our results indicate that plant pathogenic Pseudomonas are going through genome reduction resulting in the loss of metabolic functionalities. We also show the potential of stochastic approaches, compared with deterministic methods such as parsimony, to infer the networks present at ancestral levels of a given phylogeny.

Hybrid model of network evolution
Although the neighbor-dependent model summarized above produces biologically relevant behavior, whereby highly connected reactions are toggled more frequently than their poorly connected counterparts, it does not allow one to determine the strength of the neighborhood structure affecting the evolution of metabolic networks. To overcome this limitation, a parameter can be introduced into the model that corresponds to the neighborhood effect during the course of metabolic network evolution. Consider two networks $x$ and $x'$ that differ at position $i$. The hybrid model combines the independent edge model, in which edges are added or deleted independently, and the neighbor-dependent model summarized above, such that the rate of going from $x_i$ to $x'_i$ is a weighted sum of the rates under the two models. The weights are set by a parameter $\delta$, $0 \le \delta \le 1$, which specifies the probability of being in the neighbor-dependent model. The rate from $x_i$ to $x'_i$ is given as

$$\omega(x'_i; x_i) = \delta \cdot \gamma(x'_i; x_i) + (1 - \delta) \cdot q(x_i, x'_i)$$
where the term $\gamma(x'_i; x_i)$ is the rate under the neighbor-dependent model given by Equation 1 and the term $q(x_i, x'_i)$ is the rate under the independent edge model, corresponding to the appropriate entry of the rate matrix $Q$ given by Equation 2. Substituting the value of $\gamma(x'_i; x_i)$ from Equation 1, the above equation can be simplified as follows:

$$\begin{aligned} \omega(x'_i; x_i) &= \delta\, q(x_i, x'_i)\, \Phi(x_i, \Psi(x_i)) + (1 - \delta)\, q(x_i, x'_i) \\ &= q(x_i, x'_i)\left[\delta\, \Phi(x_i, \Psi(x_i)) + (1 - \delta)\right] \end{aligned} \qquad (4)$$
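Equation 4 is a single multiplication once $\Phi$ is known, which makes its two limits easy to check numerically. The sketch below uses the same illustrative conventions as a standalone example (0-based hyperedges, a hypothetical `neighbors` list of sets that excludes $i$ itself); `hybrid_rate` is a hypothetical name.

```python
from typing import Sequence, Set


def hybrid_rate(x: Sequence[int], i: int, neighbors: Sequence[Set[int]],
                lam: float, mu: float, delta: float) -> float:
    """omega(x'_i; x_i) = q(x_i, x'_i) * [delta * Phi + (1 - delta)] (Equation 4)."""
    M = len(x)
    present_neighbors = sum(x[j] for j in neighbors[i])      # |Psi(x_i)|
    if present_neighbors > 0:
        phi = present_neighbors / (sum(x) - x[i])            # Equation 3, first case
    else:
        phi = 1.0 / (M + 1)                                  # Equation 3, otherwise
    q = lam if x[i] == 0 else mu                             # entry of Q_i (Equation 2)
    return q * (delta * phi + (1.0 - delta))
```

With $\delta = 0$ the function returns $q(x_i, x'_i)$ unchanged (independent edge model), and with $\delta = 1$ it returns $q(x_i, x'_i)\,\Phi$ (neighbor-dependent model); intermediate values interpolate linearly between the two.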
where the term $\Phi(x_i, \Psi(x_i))$ corresponds to the neighborhood component given by Equation 3. It can be seen from (4) that the model reduces to the independent edge model when $\delta = 0$ and to the neighbor-dependent model described in the previous section when $\delta = 1$. For example, consider the toy network $H_1$ shown in Figure 1A. The reference network $H$ containing all allowed hyperedges for this example system is also shown in the figure. The system behavior for different values of $\delta$ is illustrated in Figure S1 for the toy network $H_1$ simulated under the hybrid model, along with the number of neighbors of each hyperedge. The rates were calculated at each step using (4). An edge was then selected based on these rates and was inserted if absent from the current network and deleted otherwise. As expected, hyperedges evolve independently when $\delta = 0$, resulting in similar insertion frequencies for all hyperedges, and increasingly reflect their neighborhood as $\delta$ approaches unity. The fitness of the model is discussed in Section 'Fitness of the hybrid model' below.
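The simulation behind Figure S1 is described only in outline. A minimal sketch of that procedure might look like the following, assuming the selection step picks hyperedge $i$ with probability proportional to its rate from Equation 4 (the embedded jump chain of the process, which ignores holding times). The function names, the seed argument and the 0-based `neighbors` list are hypothetical, and core or prohibited hyperedges are not modeled.

```python
import random
from typing import List, Sequence, Set, Tuple


def hybrid_rates(x: Sequence[int], neighbors: Sequence[Set[int]],
                 lam: float, mu: float, delta: float) -> List[float]:
    """Equation 4 evaluated for every hyperedge of the current network x."""
    M = len(x)
    total_present = sum(x)
    rates = []
    for i in range(M):
        present_neighbors = sum(x[j] for j in neighbors[i])      # |Psi(x_i)|
        if present_neighbors > 0:
            phi = present_neighbors / (total_present - x[i])     # Equation 3
        else:
            phi = 1.0 / (M + 1)
        q = lam if x[i] == 0 else mu                             # Equation 2
        rates.append(q * (delta * phi + (1.0 - delta)))
    return rates


def simulate(x0: Sequence[int], neighbors: Sequence[Set[int]], lam: float, mu: float,
             delta: float, steps: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Pick a hyperedge with probability proportional to its rate, then toggle it.

    Returns the final network and the number of insertions of each hyperedge."""
    rng = random.Random(seed)
    x = list(x0)
    insertions = [0] * len(x)
    for _ in range(steps):
        rates = hybrid_rates(x, neighbors, lam, mu, delta)
        i = rng.choices(range(len(x)), weights=rates)[0]
        if x[i] == 0:
            insertions[i] += 1      # hyperedge absent: this step inserts it
        x[i] = 1 - x[i]             # insert if absent, delete otherwise
    return x, insertions
```

Comparing the `insertions` counts for `delta=0.0` and `delta=1.0` on the same neighbor structure reproduces the qualitative contrast described above: uniform frequencies in the first case, frequencies that track connectivity in the second.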
Evolution on a phylogeny
The biological networks under study are connected by a phylogenetic tree that is known from sequence analysis. Calculating the likelihood over a phylogeny requires a sum, over all possible networks that may have existed at the interior nodes of the tree, of the probabilities of each scenario of events. For example, Figure 1A shows an example system containing three networks $H_1$, $H_2$ and $H_3$, and Figure 1B shows a phylogeny connecting them. Let the phylogenetic tree be denoted by $T$. The likelihood of the tree $T$ is given as follows:

$$L_\Theta(T) = \sum_{H_{1,2,3}} \Big\{ P_\Theta(H_{1,2,3})\, P_{\Theta,t_3}(H_3 \mid H_{1,2,3}) \cdot \sum_{H_{1,2}} \Big\{ P_{\Theta,t_{1,2}}(H_{1,2} \mid H_{1,2,3})\, P_{\Theta,t_1}(H_1 \mid H_{1,2})\, P_{\Theta,t_2}(H_2 \mid H_{1,2}) \Big\} \Big\} \qquad (5)$$
Here $\Theta$ denotes the parameters of the model, which are $(\lambda, \mu)$ in the case of the neighbor-dependent model and $(\lambda, \mu, \delta)$ in the case of the hybrid model. $P_\Theta(H_{1,2,3})$ is the marginal probability of observing the root, and $P_{\Theta,t}(H_j \mid H_i)$ denotes the pairwise likelihood of evolving from the network $H_i$ to the network $H_j$ in time $t$, conditioned on $H_i$, for the given parameters. In general, the likelihood of a tree with more than three networks can be calculated using the recursion described by Felsenstein [17]. The likelihood at an internal node $N$ of the tree is given by the following recurrence relation:
$$L_\Theta(N) = \Big[\sum_{N_l} P_{\Theta,t_l}(N_l \mid N)\, L_\Theta(N_l)\Big] \Big[\sum_{N_r} P_{\Theta,t_r}(N_r \mid N)\, L_\Theta(N_r)\Big] \qquad (6)$$
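where $N_l$ and $N_r$ are the left and right descendants of the node $N$, and $t_l$ and $t_r$ are the lengths of the branches leading to them. The likelihood of the complete tree $T$ is then given as

$$L_\Theta(T) = \sum_{N_{\text{root}}} P_\Theta(N_{\text{root}})\, L_\Theta(N_{\text{root}}) \qquad (7)$$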
where $P_\Theta(N_{\text{root}})$ is the marginal probability of observing the root and $L_\Theta(N_{\text{root}})$ is given by Equation 6. Evaluating Equations 5 and 7 requires an algorithm to systematically and efficiently sample networks at the internal nodes of a tree, and a method to calculate the pairwise likelihood of network evolution. Mithani et al. [9] described a Metropolis–Hastings algorithm that calculates the pairwise likelihood by sampling paths between network pairs and summing over those paths. To sample networks at the internal nodes of a tree, a Markov chain can be constructed whose states correspond to the networks at the internal nodes. The networks can then be sampled using a Gibbs sampler [15,16], as described in the next section.
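Equations 5 to 7 can be evaluated exactly in one special case. Under the independent edge model ($\delta = 0$), each hyperedge evolves as its own two-state chain, so the sum over internal networks factorizes over hyperedges and Felsenstein's pruning recursion (Equation 6) can be run hyperedge by hyperedge. The sketch below does this for the three-network tree of Figure 1B. It does not cover $\delta > 0$, where neighbor dependence couples the hyperedges and the paper instead relies on path sampling [9] and Gibbs sampling. The stationary root distribution, the tuple encoding of the tree and the branch lengths in `TOY_TREE` are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

# A tree is either a leaf name (str) or (left_subtree, t_left, right_subtree, t_right).
Tree = Union[str, Tuple["Tree", float, "Tree", float]]


def transition_matrix(lam: float, mu: float, t: float) -> List[List[float]]:
    """exp(t Q) for Q = [[-lam, lam], [mu, -mu]]; entry [a][b] = P(b at end | a at start)."""
    s = lam + mu
    e = math.exp(-s * t)
    return [[(mu + lam * e) / s, lam * (1 - e) / s],
            [mu * (1 - e) / s, (lam + mu * e) / s]]


def partial_likelihood(node: Tree, leaf_states: Dict[str, int],
                       lam: float, mu: float) -> List[float]:
    """[L(node | state 0), L(node | state 1)] for one hyperedge (Equation 6)."""
    if isinstance(node, str):                  # leaf: the observed state is fixed
        s = leaf_states[node]
        return [1.0 if s == 0 else 0.0, 1.0 if s == 1 else 0.0]
    left, t_l, right, t_r = node
    L_l = partial_likelihood(left, leaf_states, lam, mu)
    L_r = partial_likelihood(right, leaf_states, lam, mu)
    P_l, P_r = transition_matrix(lam, mu, t_l), transition_matrix(lam, mu, t_r)
    return [sum(P_l[s][c] * L_l[c] for c in (0, 1)) *
            sum(P_r[s][c] * L_r[c] for c in (0, 1)) for s in (0, 1)]


def tree_likelihood(tree: Tree, leaves: Dict[str, Sequence[int]],
                    lam: float, mu: float) -> float:
    """Equation 7 with delta = 0: product over hyperedges of sum_root pi(root) L(root)."""
    pi = [mu / (lam + mu), lam / (lam + mu)]   # assumed root distribution: stationary
    M = len(next(iter(leaves.values())))
    total = 1.0
    for i in range(M):
        states = {name: vec[i] for name, vec in leaves.items()}
        L_root = partial_likelihood(tree, states, lam, mu)
        total *= pi[0] * L_root[0] + pi[1] * L_root[1]
    return total


# Figure 1B topology ((H1, H2), H3) with hypothetical branch lengths t1, t2, t12, t3.
TOY_TREE: Tree = (("H1", 0.3, "H2", 0.5), 0.2, "H3", 0.8)
```

For a single hyperedge, summing `tree_likelihood` over all eight possible leaf patterns gives 1, which is a quick check that the recursion marginalizes the internal states correctly.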
Figure 1. Toy networks connected by a phylogeny.
(A) Toy networks consisting of 13 nodes. The nodes are labeled from A to M (blue) and the hyperedges are labeled from 1 to 10 (red). The reference network consists of all allowed hyperedges for this example system. Networks $H_1$, $H_2$ and $H_3$ consist of subsets of the hyperedges from the reference network. (B) A phylogeny connecting the networks $H_1$, $H_2$ and $H_3$.
doi:10.1371/journal.pcbi.1000868.g001
Sampling internal nodes
Given a set of networks related by a phylogenetic tree, the networks at the internal nodes of the tree can be sampled using a Gibbs sampler. The general idea is to sample each internal network by conditioning on its three neighbors (one parent and two children). This approach for sampling internal networks is similar to the one used by Holmes and Bruno [18] for DNA sequence alignment. However, instead of operating on linear sequences, the sampler takes the network structure into account when calculating the new state. The procedure is described below. Consider a network $X$ with its three neighbors $Y_k$ and branch lengths $t_k$, $k = 1, \dots, 3$. The new network $X'$ is selected as follows.

1. For each hyperedge $i$, calculate the $2 \times 2$ rate matrix
$$\Omega_X = Q_X\left[\delta \cdot \Phi_X(i, \Psi(i)) + (1 - \delta)\right]$$
where $\delta$ is the neighbor-dependence probability, $Q$ is the rate matrix given by Equation 2 and the function $\Phi$ corresponds to the neighborhood component given by Equation 3.
2. Calculate, for each neighbor $Y_k$ ($k = 1, \dots, 3$), the transition probabilities $P_{\Theta,t_k}(Y_k(i) \mid X(i)) = \exp(t_k \Omega_X)$.
3. Sample the new state $s'_i \in \{0, 1\}$ for hyperedge $i$ from the distribution
$$P(s_i) \propto \pi(s_i) \prod_{k=1}^{3} P_{\Theta,t_k}(Y_k(i) \mid s_i) \qquad (8)$$
where $\pi$ is the vector of equilibrium probabilities, which can be obtained by solving the equation $\pi\, \Omega_X = 0$.
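The three steps above translate almost line by line into code. The sketch below assumes 0-based hyperedges, evaluates $\Phi$ on the current network $X$ for every hyperedge (so $X$ is not updated while $X'$ is being drawn), and uses the closed-form transition matrix of a two-state chain in place of a general matrix exponential. Writing every factor of Equation 8 as $P(Y_k(i) \mid s_i)$, including the factor for the parent, is consistent because a two-state chain is time-reversible: $\pi(y)P_t(y \to s) = \pi(s)P_t(s \to y)$. The function names and the `rng` argument are hypothetical.

```python
import math
import random
from typing import List, Sequence, Set


def two_state_P(lam: float, mu: float, t: float) -> List[List[float]]:
    """exp(t * [[-lam, lam], [mu, -mu]]); entry [a][b] = P(b at the other end | a)."""
    s = lam + mu
    e = math.exp(-s * t)
    return [[(mu + lam * e) / s, lam * (1 - e) / s],
            [mu * (1 - e) / s, (lam + mu * e) / s]]


def phi(x: Sequence[int], i: int, neighbors: Sequence[Set[int]]) -> float:
    """Neighborhood component of Equation 3 for hyperedge i of network x."""
    k = sum(x[j] for j in neighbors[i])
    return k / (sum(x) - x[i]) if k > 0 else 1.0 / (len(x) + 1)


def gibbs_update(x: Sequence[int], ys: Sequence[Sequence[int]], ts: Sequence[float],
                 neighbors: Sequence[Set[int]], lam: float, mu: float, delta: float,
                 rng: random.Random) -> List[int]:
    """Draw X' hyperedge by hyperedge from Equation 8, given neighbor networks ys
    and the branch lengths ts that connect them to X."""
    new_x = []
    for i in range(len(x)):
        scale = delta * phi(x, i, neighbors) + (1.0 - delta)   # step 1: Omega_X = scale * Q
        lam_i, mu_i = scale * lam, scale * mu
        pi = [mu_i / (lam_i + mu_i), lam_i / (lam_i + mu_i)]   # solves pi Omega_X = 0
        Ps = [two_state_P(lam_i, mu_i, t) for t in ts]         # step 2
        weights = []
        for s in (0, 1):                                       # step 3: Equation 8
            w = pi[s]
            for y, P in zip(ys, Ps):
                w *= P[s][y[i]]
            weights.append(w)
        p_one = weights[1] / (weights[0] + weights[1])
        new_x.append(1 if rng.random() < p_one else 0)
    return new_x
```

Because `ys` and `ts` are plain lists, the same function could be applied to the root by passing only its two children, under the additional assumption that the root is drawn from the stationary distribution.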
Example
Consider the network $H_{1,2}$ in Figure 2, for which a new state is to be calculated. Denote the network by $X$. The three neighboring networks of $H_{1,2}$ are the networks $H_1$, $H_2$ and $H_{1,2,3}$, labeled $Y_1$, $Y_2$ and $Y_3$ respectively. If $\phi_i$ denotes the neighborhood component for hyperedge $i$, then for the given rate parameters $\lambda$ (insertion) and $\mu$ (deletion) and the neighbor-dependence probability $\delta$, the rate matrix $\Omega_X$ is written as

$$\Omega_X = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix} \left(\delta \phi_i + (1 - \delta)\right).$$
For simplicity, assume that $\delta = 1$. The system then behaves under the neighbor-dependent model and the rate matrix simplifies to

$$\Omega_X = \phi_i \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}$$
The transition probability matrix for transforming $X(i)$ into $Y_1(i)$ is then given as

$$P_{\Theta,t_1}(Y_1(i) \mid X(i)) = \exp(t_1 \Omega_X) = \frac{1}{\lambda + \mu} \begin{bmatrix} \mu + \lambda e^{-t_1 \phi_i (\lambda + \mu)} & \lambda\left(1 - e^{-t_1 \phi_i (\lambda + \mu)}\right) \\ \mu\left(1 - e^{-t_1 \phi_i (\lambda + \mu)}\right) & \lambda + \mu e^{-t_1 \phi_i (\lambda + \mu)} \end{bmatrix}.$$
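The closed form above can be checked numerically against a truncated Taylor series for $\exp(t_1 \Omega_X)$. The sketch below uses hypothetical function names and arbitrary test values for $\lambda$, $\mu$, $\phi_i$ and $t_1$; it is a sanity check of the algebra, not part of the sampler.

```python
import math
from typing import List

Matrix = List[List[float]]


def omega_matrix(lam: float, mu: float, phi_i: float, delta: float = 1.0) -> Matrix:
    """Omega_X = Q * (delta * phi_i + (1 - delta)); delta = 1 gives phi_i * Q."""
    scale = delta * phi_i + (1.0 - delta)
    return [[-lam * scale, lam * scale], [mu * scale, -mu * scale]]


def closed_form_P(lam: float, mu: float, phi_i: float, t: float) -> Matrix:
    """Closed-form exp(t * Omega_X) for delta = 1, as displayed above."""
    s = lam + mu
    e = math.exp(-t * phi_i * s)
    return [[(mu + lam * e) / s, lam * (1 - e) / s],
            [mu * (1 - e) / s, (lam + mu * e) / s]]


def expm_series(A: Matrix, terms: int = 60) -> Matrix:
    """exp(A) for a 2x2 matrix by truncated Taylor series (adequate for small ||A||)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term <- term @ A / n, so that term equals A^n / n!
        term = [[sum(term[r][k] * A[k][c] for k in range(2)) / n for c in range(2)]
                for r in range(2)]
        result = [[result[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    return result
```

For example, with $\lambda = 0.7$, $\mu = 1.3$, $\phi_i = 0.4$ and $t_1 = 2.5$, `closed_form_P(0.7, 1.3, 0.4, 2.5)` and `expm_series` applied to $t_1 \Omega_X$ agree to within floating-point error, and each row of the closed form sums to one.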
The transition probability matrices $P_{\Theta,t_2}(Y_2(i) \mid X(i))$ and $P_{\Theta,t_3}(Y_3(i) \mid X(i))$ can be calculated in a similar fashion. Once the transition probability matrices have been obtained, the new network $X'$ can be sampled using Equation 8. For example, if the current configuration of the networks is as shown in Figure 2, then the new state $s_1$ for hyperedge 1 is drawn from the following distribution:

$$P(s_1) \propto \pi(s_1) \prod_{k=1}^{3} P_{\Theta,t_k}(Y_k(1) \mid s_1) = \pi(s_1)\, P_{\Theta,t_1}(1 \mid s_1)\, P_{\Theta,t_2}(1 \mid s_1)\, P_{\Theta,t_3}(0 \mid s_1).$$
The samples for hyperedges labeled 2 to 10 can be drawn in a similar fashion to obtain the new network.
Estimation of parameters
The Gibbs sampler described above samples the internal networks on a phylogenetic tree for given parameter values. It can be extended to estimate the parameters $\Theta$ of evolution, where $\Theta$ equals $(\lambda, \mu)$ in the case of the neighbor-dependent model and $(\lambda, \mu, \delta)$ in the case of the hybrid model. One way is to nest it within another Gibbs sampler that iteratively samples the internal networks and the parameters from the distributions $P(T \mid \Theta)$ and $P(\Theta \mid T)$, respectively. The general outline of the Gibbs sampler is as follows:
• Choose initial values for the parameters $\Theta^{(0)}$.
• Generate $T^{(0)}$ using the procedure described in Section 'Sampling internal nodes' with $\Theta^{(0)}$.
• Use $T^{(0)}$ to generate $\Theta^{(1)}$ by drawing from the distribution $P(\Theta \mid T)$.
• Repeat $n$ times to obtain a set of points $(T^{(i)}, \Theta^{(i)})$, $1 \le i \le n$, which are simulated draws from the joint distribution $P(T, \Theta)$.

Figure 2. A sample phylogenetic tree for the toy networks shown in Figure 1.
The tree contains arbitrary networks assigned at the internal nodes. Also shown are the proportions of insertion and deletion events and the proportions of allowed insertion and deletion events when going from various ancestral networks to descendant networks.
doi:10.1371/journal.pcbi.1000868.g002

The parameters can be sampled using a Metropolis–Hastings algorithm [19,20], as described next. Since the Metropolis–Hastings algorithm is a well-established method, it suffices here to describe how a proposal for new parameters can be generated. Readers interested in the general details of the algorithm are referred to Chapter 1 of Gilks et al. [21]. The performance of the Gibbs sampler is discussed in Text S1.
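The outer sampler alternates two conditional draws. The sketch below shows only that control flow, with a standard Metropolis–Hastings acceptance step for $\Theta$. The callables `sample_internal`, `log_post`, `propose` and `log_q` are hypothetical placeholders a caller would supply (for example, an internal-node Gibbs sweep and the gamma/beta proposals of the next section); their exact form is not specified here. Each stored pair holds the internal networks used for the update together with the updated parameters.

```python
import math
import random
from typing import Any, Callable, List, Tuple


def mh_step(theta: Any, tree: Any,
            log_post: Callable[[Any, Any], float],
            propose: Callable[[Any, Any, random.Random], Any],
            log_q: Callable[[Any, Any, Any], float],
            rng: random.Random) -> Any:
    """One Metropolis-Hastings update of Theta given the internal networks `tree`.

    log_post(theta, tree) is log P(Theta | T) up to a constant, and
    log_q(a, b, tree) is log q(a | b), the log proposal density of a given b."""
    cand = propose(theta, tree, rng)
    log_ratio = (log_post(cand, tree) - log_post(theta, tree)
                 + log_q(theta, cand, tree) - log_q(cand, theta, tree))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return cand      # accept the proposal
    return theta         # reject: keep the current parameters


def run_sampler(theta0: Any,
                sample_internal: Callable[[Any, random.Random], Any],
                log_post, propose, log_q, n: int, seed: int = 0) -> List[Tuple[Any, Any]]:
    """Alternate T ~ P(T | Theta) and a Metropolis-Hastings draw of Theta | T, n times."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n):
        tree = sample_internal(theta, rng)                            # internal networks
        theta = mh_step(theta, tree, log_post, propose, log_q, rng)   # parameters
        samples.append((tree, theta))
    return samples
```

With a `sample_internal` that returns `None` and a standard normal `log_post`, the loop reduces to an ordinary independence Metropolis–Hastings sampler, which is a convenient way to sanity-check the control flow before plugging in network samplers.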
Parameter proposal
Rates proposal.
For a given tree $T$, a proposal for each of the rate parameters $\lambda$ and $\mu$ can be generated from a gamma distribution $\mathrm{Gamma}(k, \theta)$, where $k$ is the shape parameter and $\theta$ is the scale parameter. The hyperparameters $k$ and $\theta$ can be calculated from the given tree as described next.

Starting from the root, calculate the proportion of insertion events $I_{H_i \to H_j}$ and the proportion of deletion events $D_{H_i \to H_j}$ between the parent network $H_i$ and the child network $H_j$ in the given tree $T$ by dividing the number of insertion and deletion events by the total number of alterable hyperedges $|E'|$ in the system. Also, calculate the proportions of allowed insertion and deletion events between these pairs, and denote them by $\tilde{I}_{H_i \to H_j}$ and $\tilde{D}_{H_i \to H_j}$. With the sums running over all parent–child pairs $(H_i, H_j)$ in $T$, the hyperparameters $k_\lambda$ and $\theta_\lambda$ for sampling the insertion rate are then given as

$$k_\lambda = \sum_{i,j} I_{H_i \to H_j} + 1, \qquad (9)$$
$$\theta_\lambda = \sum_{i,j} \tilde{I}_{H_i \to H_j}. \qquad (10)$$

Similarly, the hyperparameters $k_\mu$ and $\theta_\mu$ for sampling the deletion rate are given as

$$k_\mu = \sum_{i,j} D_{H_i \to H_j} + 1, \qquad (11)$$
$$\theta_\mu = \sum_{i,j} \tilde{D}_{H_i \to H_j}. \qquad (12)$$
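Equations 9 to 12 only need event counts along each parent–child branch. This sketch assumes networks are 0/1 lists, that the hypothetical argument `alterable` lists the indices of the hyperedges in $E'$, and that an allowed insertion (deletion) is an alterable hyperedge absent from (present in) the parent network, which is consistent with the 2 and 8 allowed events in the worked example that follows. Following the text, $\theta$ is used as a scale parameter, which matches the `(alpha, beta)` = (shape, scale) convention of Python's `random.gammavariate`. All function names are hypothetical.

```python
import random
from typing import Iterable, Sequence, Tuple


def event_proportions(parent: Sequence[int], child: Sequence[int],
                      alterable: Sequence[int]) -> Tuple[float, float, float, float]:
    """(I, D, I~, D~) for one parent -> child pair, each divided by |E'|."""
    n = len(alterable)                                                  # |E'|
    ins = sum(1 for i in alterable if parent[i] == 0 and child[i] == 1)
    dels = sum(1 for i in alterable if parent[i] == 1 and child[i] == 0)
    allowed_ins = sum(1 for i in alterable if parent[i] == 0)          # could be inserted
    allowed_dels = sum(1 for i in alterable if parent[i] == 1)         # could be deleted
    return ins / n, dels / n, allowed_ins / n, allowed_dels / n


def gamma_hyperparameters(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]],
                          alterable: Sequence[int]) -> Tuple[float, float, float, float]:
    """Equations 9-12: returns (k_lambda, theta_lambda, k_mu, theta_mu)."""
    I = D = I_allowed = D_allowed = 0.0
    for parent, child in pairs:
        a, b, c, d = event_proportions(parent, child, alterable)
        I, D, I_allowed, D_allowed = I + a, D + b, I_allowed + c, D_allowed + d
    return I + 1.0, I_allowed, D + 1.0, D_allowed


def propose_rates(pairs, alterable, rng: random.Random) -> Tuple[float, float]:
    """Draw (lambda', mu') from Gamma(k, theta), theta used as a scale parameter."""
    k_l, th_l, k_m, th_m = gamma_hyperparameters(pairs, alterable)
    # random.gammavariate(alpha, beta): shape alpha, scale beta (both must be > 0)
    return rng.gammavariate(k_l, th_l), rng.gammavariate(k_m, th_m)
```

A hypothetical parent with 8 of 10 hyperedges present, and a child that differs from it by one insertion and one deletion, gives `event_proportions(...) == (0.1, 0.1, 0.2, 0.8)` up to rounding, the same values reported for $H_{1,2} \to H_1$ in the example.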
Example. The calculation of the hyperparameters $k$ and $\theta$ is demonstrated on the tree shown in Figure 2, which connects the toy networks shown in Figure 1. The number of hyperedges in the reference network is 10. If no core or prohibited hyperedges are assumed, then the number of alterable hyperedges is also 10, i.e. $|E'| = 10$. Going from the network $H_{1,2}$ to the network $H_1$, there is one insertion event and one deletion event out of 2 and 8 allowed insertion and deletion events, respectively, resulting in the following values:

$$I_{H_{1,2} \to H_1} = \tfrac{1}{10} = 0.1, \quad D_{H_{1,2} \to H_1} = \tfrac{1}{10} = 0.1, \quad \tilde{I}_{H_{1,2} \to H_1} = \tfrac{2}{10} = 0.2, \quad \tilde{D}_{H_{1,2} \to H_1} = \tfrac{8}{10} = 0.8.$$

The same values hold for the network pair $H_{1,2} \to H_2$. Values for the other network pairs can be calculated in a similar fashion. The values of $I$, $\tilde{I}$, $D$ and $\tilde{D}$ for all parent–child pairs in the example tree are listed in Figure 2. Using Equations 9 and 10, the hyperparameters for sampling the insertion rate are $k_\lambda = 0.3 + 1 = 1.3$ and $\theta_\lambda = 1.0$. Similarly, using Equations 11 and 12, the hyperparameters for sampling the deletion rate are $k_\mu = 0.3 + 1 = 1.3$ and $\theta_\mu = 3.0$.
Dependence probability proposal. The hybrid model for metabolic network evolution described above allows estimation of the neighborhood effect shaping the evolution of a given set of networks. A proposal for the parameter $\delta$, which measures the probability of being in the neighbor-dependent model, can be generated from a beta distribution, $\delta \sim \mathrm{Beta}(\alpha, \beta)$, where the hyperparameters $\alpha$ and $\beta$ are the shape parameters and are calculated as follows.

First, calculate the average number of neighbors present in each of the networks at the leaves of the phylogeny. For example, if the network $x(j)$ is a leaf network, i.e. it occurs at a tip of the given phylogenetic tree, then calculate

$$N_{x(j)} = \frac{1}{M} \sum_i |\Psi(x_i(j))|.$$

The parameter $\alpha$ is then given as the mean of these averages over all the networks at the leaves of the given phylogenetic tree. For a tree $T$ with $l$ leaves, this can be written as follows:

$$\alpha = \frac{1}{l} \sum_{j} N_{x(j)}, \quad j \in \mathrm{leaves}(T) \qquad (13)$$

The shape parameter $\beta$ corresponds to the average number of neighbors in the reference network (REF) and is given as

$$\beta = \frac{1}{M} \sum_i |\Psi(x_i(\mathrm{REF}))|. \qquad (14)$$
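Equations 13 and 14 depend only on the leaf networks and on the neighbor structure of the reference network. The sketch below assumes that the sum in $N_{x(j)}$ runs over all $M$ hyperedges of the reference network, whether or not hyperedge $i$ is itself present in $x(j)$, as the $1/M$ factor suggests; the 0-based `neighbors` list and the function names are hypothetical. Python's `random.betavariate` requires both shape parameters to be positive, so leaves with no present neighbors at all would need special handling.

```python
import random
from typing import Sequence, Set, Tuple


def mean_present_neighbors(x: Sequence[int], neighbors: Sequence[Set[int]]) -> float:
    """N_x = (1/M) * sum_i |Psi(x_i)|, summed over all M reference hyperedges."""
    M = len(neighbors)
    return sum(sum(x[j] for j in neighbors[i]) for i in range(M)) / M


def beta_hyperparameters(leaves: Sequence[Sequence[int]],
                         neighbors: Sequence[Set[int]]) -> Tuple[float, float]:
    """Equations 13 and 14: (alpha, beta) for the Beta proposal of delta."""
    alpha = sum(mean_present_neighbors(x, neighbors) for x in leaves) / len(leaves)
    reference = [1] * len(neighbors)      # the reference network contains every hyperedge
    beta = mean_present_neighbors(reference, neighbors)
    return alpha, beta


def propose_delta(leaves: Sequence[Sequence[int]], neighbors: Sequence[Set[int]],
                  rng: random.Random) -> float:
    """Draw delta ~ Beta(alpha, beta); requires alpha > 0 and beta > 0."""
    alpha, beta = beta_hyperparameters(leaves, neighbors)
    return rng.betavariate(alpha, beta)
```

For a hypothetical chain of four hyperedges, `neighbors = [{1}, {0, 2}, {1, 3}, {2}]`, with leaf networks `[1, 1, 0, 0]` and `[0, 0, 1, 1]`, this gives $\alpha = 0.75$ and $\beta = 1.5$.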
Proposal probability
Rates proposal. The proposal probability $q(\lambda', \mu' \mid \lambda, \mu)$ for the rate parameters is given as

$$q(\lambda', \mu' \mid \lambda, \mu) = \prod_{c \in \{\lambda, \mu\}} q(c' \mid c)$$

such that