Syst. Biol. 53(1):47–67, 2004
Copyright © Society of Systematic Biologists
ISSN: 1063-5157 print / 1076-836X online
DOI: 10.1080/10635150490264699
Bayesian Phylogenetic Analysis of Combined Data
Johan A. A. Nylander,¹ Fredrik Ronquist,¹ John P. Huelsenbeck,² and José Luis Nieves-Aldrey³
¹Department of Systematic Zoology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, SE-752 36 Uppsala, Sweden; E-mail: johan.nylander@ebc.uu.se (J.A.A.N.)
²Section of Ecology, Behavior and Evolution, Division of Biological Sciences, University of California–San Diego, La Jolla, California 92093-0116, USA
³Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales, José Gutiérrez Abascal 2, 28006 Madrid, Spain

Abstract.
The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein-coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex, and Bayes factors do not support exclusion of apparently weak parameters from this model. Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.
[Bayes factors; Bayesian analysis; combined data; Cynipidae; gall wasps; MCMC; model heterogeneity; model selection.]
Increasingly, phylogenetic problems are being addressed using data from several different sources: morphology and molecules, DNA and protein, mitochondrial and nuclear genes, coding and noncoding sequences. Previously, it has been common to address such mixed data sets using the parsimony method. Where parametric methods have been applied, they have typically excluded some data (such as morphology) because of a lack of appropriate stochastic models, and they have often ignored obvious heterogeneity across data partitions because of the computational complexity of the maximum likelihood (ML) approach (for exceptions, see Yang, 1996b; DeBry, 1999; Pupko et al., 2002; Thorne and Kishino, 2002).

The recent development of Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) estimation of posterior probability distributions has made it easier to address complex, parameter-rich stochastic models within a statistical framework, opening up the possibility for combined data analysis recognizing among-partition heterogeneity in data source and in properties of the evolutionary process. Recent stochastic models developed for new types of data, such as morphology (Lewis, 2001a; Ronquist and Huelsenbeck, in prep.), now make it possible to include virtually any kind of character used today to infer phylogeny in such analyses, and the computational efficiency of the Bayesian MCMC approach allows each data partition to be treated using more realistic evolutionary models. However, combined statistical analysis using Bayesian MCMC techniques introduces a whole range of questions that have not been addressed previously, while providing a new perspective on others. Here, we describe a Bayesian MCMC approach to combined data analysis, using empirical results from one combined data set to address some of these questions.
Bayesian MCMC Approach to Combined Data
Bayesian phylogenetic inference based on heterogeneous data is a straightforward extension of the methods already described for homogeneous data (see recent reviews by Huelsenbeck et al., 2001; Lewis, 2001b; Holder and Lewis, 2003). Assume that the data set $X$ consists of two distinct partitions, $X_a$ and $X_b$, and allow the substitution model parameters, $\theta_a$ and $\theta_b$, respectively, to be completely different for the two partitions. In the models we explored, we further assumed that the two data subsets evolve on the same topology, $\tau$, with the same set of branch lengths, $\nu$, but that the overall rate differs across partitions according to a rate multiplier, denoted $m_a$ and $m_b$ for the two partitions. In other words, effective branch lengths are potentially different but proportional across partitions, as in the ML model proposed by Yang (1996; note that Yang used $c$ instead of $m$ for the multiplier).

Using Bayes's rule (see for instance Huelsenbeck et al., 2001), the joint posterior probability distribution for this model becomes
$$f(\tau, \nu, \theta_a, \theta_b, m_a, m_b \mid X) = \frac{f(\tau, \nu, \theta_a, \theta_b, m_a, m_b)\, f(X \mid \tau, \nu, \theta_a, \theta_b, m_a, m_b)}{f(X)},$$
where $f(\tau, \nu, \theta_a, \theta_b, m_a, m_b)$ is the prior probability of the model parameters, $f(X \mid \tau, \nu, \theta_a, \theta_b, m_a, m_b)$ is the probability of the data given the model parameters (the likelihood function), and
$f(X)$ is the model likelihood (also called the integrated or predictive likelihood), which is a multidimensional sum (over topologies) and integral (over continuous parameters) of the probability of the data over all parameter values.

The posterior probability distribution, which is the central quantity in Bayesian inference, is typically estimated using MCMC techniques instead of being derived analytically. The procedure is started with an arbitrary set of parameter values. In each cycle (generation) of the Markov chain, one parameter or a block of parameters is updated using a stochastic proposal mechanism. The most common mechanism used in Bayesian phylogenetic inference, Metropolis–Hastings sampling, involves the proposal of a new state based on an arbitrary proposal distribution, $q$, and then acceptance of this state with a probability determined by the product of three ratios: the prior ratio, the likelihood ratio, and the proposal ratio. Assume, for instance, that we wish to update the substitution model parameters for partition $a$ from $\theta_a$ to $\theta_a^*$. The acceptance probability $r$ would then become
$$r = \min\left(1,\; \frac{f(\theta_a^*)}{f(\theta_a)} \times \frac{f(X_a \mid \tau, \nu, m_a, \theta_a^*)}{f(X_a \mid \tau, \nu, m_a, \theta_a)} \times \frac{q(\theta_a \mid \theta_a^*)}{q(\theta_a^* \mid \theta_a)}\right).$$
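The acceptance rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it works in log space to avoid numerical underflow, and it updates a single positive parameter with a multiplier proposal (a common choice for rate-like parameters, for which the proposal ratio $q(\theta_a \mid \theta_a^*)/q(\theta_a^* \mid \theta_a)$ equals $\theta_a^*/\theta_a$). The Exp(1) prior, the tuning constant, and the function names are hypothetical. The `log_lik` callback stands for the likelihood of the affected partition $X_a$ only.

```python
import math
import random
from typing import Callable, Tuple

def log_acceptance(log_prior_ratio: float, log_lik_ratio: float,
                   log_proposal_ratio: float) -> float:
    """log r, where r = min(1, prior ratio x likelihood ratio x proposal ratio)."""
    return min(0.0, log_prior_ratio + log_lik_ratio + log_proposal_ratio)

def multiplier_update(theta: float,
                      log_prior: Callable[[float], float],
                      log_lik: Callable[[float], float],
                      rng: random.Random,
                      tuning: float = 1.0) -> Tuple[float, bool]:
    """One Metropolis-Hastings update of a positive parameter (e.g., one element of theta_a).

    Assumed proposal: theta* = theta * exp(tuning * (u - 0.5)), u ~ Uniform(0, 1).
    For this proposal, q(theta | theta*) / q(theta* | theta) = theta* / theta.
    """
    c = math.exp(tuning * (rng.random() - 0.5))
    theta_new = theta * c
    log_r = log_acceptance(
        log_prior(theta_new) - log_prior(theta),  # prior ratio
        log_lik(theta_new) - log_lik(theta),      # likelihood ratio (partition X_a only)
        math.log(c),                              # proposal (Hastings) ratio
    )
    if rng.random() < math.exp(log_r):
        return theta_new, True   # accept theta*
    return theta, False          # keep the current value

if __name__ == "__main__":
    # Hypothetical check: with a flat likelihood the chain should sample the prior.
    rng = random.Random(42)
    exp1_log_prior = lambda t: -t   # Exp(1) prior, up to a constant (assumption)
    flat_log_lik = lambda t: 0.0
    theta, total, n = 1.0, 0.0, 50_000
    for _ in range(n):
        theta, _ = multiplier_update(theta, exp1_log_prior, flat_log_lik, rng)
        total += theta
    print(f"mean of sampled theta: {total / n:.3f} (prior mean is 1)")
```

Omitting the `math.log(c)` term would silently bias the sampler toward smaller values, which is why the proposal ratio appears as a separate factor in $r$.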
When updating a homogeneous model or a parameter shared across all partitions, the calculation of the likelihood ratio (the second ratio in the product) always involves the entire data set. However, updating a partitioned parameter only requires consideration of the affected data partition, $X_a$ in this case. The calculation of the likelihood ratio is by far the most computationally demanding operation in MCMC analysis, and the time it takes is roughly proportional to the size of the data set. Thus, the increase in the number of parameters in a partitioned model over that in a similar homogeneous model is largely offset by the speed gained in each cycle of the chain. The net result is that the time required for updating all model parameters a given number of times will remain roughly constant regardless of model partitioning. However, more complex models will of course have more dimensions in their parameter spaces, which might cause difficulties for the MCMC sampling procedure.
Convergence and Mixing
Theory predicts that a properly constructed Markov chain, if run long enough, will produce a valid sample from the posterior probability distribution (Tierney, 1994). However, the greatest practical problem in MCMC analysis is to determine when the chain is sufficiently close to its target distribution (the posterior distribution of interest) for the samples to provide a good approximation of this distribution. One of the most powerful approaches used to address this question is comparison of the results from independent runs started from different points in parameter space. In the phylogenetic context, we expect integration over topology to be particularly difficult; therefore, starting the independent runs from different, randomly chosen topologies should provide a good test of whether the chains are providing valid samples from the posterior probability distribution (Huelsenbeck et al., 2002).

It is useful to distinguish two potential sources of problems with MCMC estimation of a target distribution: convergence and mixing. The difference between them is best explained if we consider a posterior distribution with two separate regions, each containing roughly half of the total probability. Typically, an MCMC run starts sampling from a region with extremely low posterior probability because starting values are set arbitrarily or chosen randomly. When the chain has settled into the high-density regions of the distribution, it can be said to have converged, and the overall likelihood will tend to vary less than during the initial burn-in period. However, we still do not know how long it will take the chain to adequately sample both regions of high density in the posterior distribution; this is determined by the mixing behavior of the chain. The slower the mixing, the longer it will take the chain to move from one high-density region to the other.
Whereas the generation plot of the overall likelihood gives a preliminary idea of whether convergence might have occurred, assessment of the mixing behavior requires examination of the plots of all model parameters. This is particularly true when Metropolis coupling is used, because this technique allows the chain to jump between different regions in parameter space with little effect on overall likelihood (Huelsenbeck et al., 2001).
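The distinction between convergence and mixing can be illustrated with a toy example. The sketch below runs random-walk Metropolis on a hypothetical one-dimensional target with two equally probable, well-separated modes, a stand-in for the two high-density regions described above rather than a phylogenetic posterior. Two independent runs are started in different modes, mirroring the comparison of independent runs. With a small proposal step, each run settles quickly into its starting mode but never crosses to the other, so the runs disagree completely about the probability of the region $x > 0$. With a larger step, the chains switch modes often and the runs agree. The mode positions, step sizes, run length, and seeds are all arbitrary assumptions.

```python
import math
import random
from typing import Tuple

MODE = 6.0  # hypothetical: equal-weight modes at -MODE and +MODE, unit variance

def log_target(x: float) -> float:
    """Unnormalized log density of an equal mixture of N(-MODE, 1) and N(+MODE, 1)."""
    a = -0.5 * (x - MODE) ** 2
    b = -0.5 * (x + MODE) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp avoids underflow

def run_chain(start: float, step_sd: float, n_gen: int, seed: int) -> Tuple[float, int]:
    """Random-walk Metropolis; returns (fraction of samples with x > 0, number of mode switches)."""
    rng = random.Random(seed)
    x, lp = start, log_target(start)
    n_pos = switches = 0
    for _ in range(n_gen):
        prop = x + rng.gauss(0.0, step_sd)  # symmetric proposal, so the proposal ratio is 1
        lp_prop = log_target(prop)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            switches += (prop > 0) != (x > 0)  # count moves between the two regions
            x, lp = prop, lp_prop
        n_pos += x > 0
    return n_pos / n_gen, switches

def compare_runs(step_sd: float, n_gen: int = 50_000) -> Tuple[float, int]:
    """Two runs started in different modes; returns (|difference in P(x > 0)|, total switches)."""
    frac_a, sw_a = run_chain(-MODE, step_sd, n_gen, seed=1)
    frac_b, sw_b = run_chain(+MODE, step_sd, n_gen, seed=2)
    return abs(frac_a - frac_b), sw_a + sw_b

if __name__ == "__main__":
    for step in (0.5, 6.0):
        diff, switches = compare_runs(step)
        print(f"step sd {step}: runs differ by {diff:.2f} in P(x > 0); {switches} mode switches")
```

In the slow-mixing case the log-density trace of each run looks perfectly stable, so only the comparison between runs (or inspection of the parameter itself) reveals the problem, which is the point made in the text about not relying on the likelihood plot alone.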
Bayesian Model Selection
Analyzing combined data using Bayesian MCMC methods allows us to specify partition-specific substitution models. As more partitions are considered, the complexity of the joint model increases, as does the complexity of the issue of model selection. One strategy for model selection for Bayesian MCMC analysis is to fit a substitution model to each partition prior to the analysis using, for example, a hierarchical likelihood ratio test (hLRT; Huelsenbeck and Crandall, 1997; Posada and Crandall, 2001), the Akaike information criterion (AIC; Akaike, 1973), or the Bayesian information criterion (BIC; Schwarz, 1978), all of which are based on ML estimates. The Markov chain is then run using a composite 'supermodel' that consists of several submodels.

It is not self-evident, however, that such an approach will necessarily lead to an optimal composite model. Most importantly, the selection of an optimal model for one partition should not ignore information from other partitions. For example, the methods mentioned above depend on point estimates of the topology and other parameters, and it is well known that different topologies might rank models differently (Sanderson and Kim, 2000; Posada and Crandall, 2001). Thus, selecting an optimal model for each partition separately, on the best tree implied by the data from that partition, might result in a
combination of models that could not be optimal on the same topology. Furthermore, considering each partition separately may result in overparameterization, because such an approach makes it difficult to discover when it is appropriate for two partitions to share parameters.

Unfortunately, computational problems make it difficult to apply the methods described above directly to parameter-rich partitioned models. Furthermore, Bayesian statisticians often object in general to model testing based on point estimates, because such methods do not take the uncertainty of the topology and other parameters into account. The argument is that a model with substantial posterior probability for a large range of parameter values could have a higher marginal (total) likelihood than a model with a narrow peak in its likelihood, even though the latter model may have the highest ML value. In such situations, Bayesian statisticians have argued, it would be unwise to compare models based only on the merits of a single point; instead, we should consider the entire parameter space and prefer the model with the largest total likelihood (Bollback, 2002; Holder and Lewis, 2003). An additional problem with the ML approach is that it favors the more parameter-rich model in comparisons of nested models unless the parameter-rich models are penalized, as in AIC or BIC. That is, the favored model might contain parameters that have little or no explanatory value (Burnham and Anderson, 2002). The Bayesian approach does not always favor the more parameter-rich of two nested models; on the contrary, there is some concern that Bayesian methods may, under some circumstances, put too much emphasis on the simpler model. This phenomenon is known as Lindley's paradox, and it can occur with large data sets when the parameter estimate under the complex model is close to the value fixed in the simpler model (Bartlett, 1957; Lindley, 1957).

Because of the problems with the likelihood approach, we explored Bayesian model comparison based on Bayes factors. Assume that we wish to compare how well two models, $M_0$ and $M_1$, describe the processes generating a data set $X$. The Bayes factor in favor of model 1 over model 0, $B_{10}$, is calculated as the ratio of the model likelihoods $f(X \mid M_i)$:
$$B_{10} = \frac{f(X \mid M_1)}{f(X \mid M_0)}.$$
The model likelihoods, $f(X \mid M_i)$, are the same as the $f(X)$ denominator of Bayes's rule; the conditioning on a model is implicit in the latter.

The Bayes factor can be interpreted as the posterior odds of model 1 to model 0 in a Bayesian inference problem where we start with equal probability of the two models being true (Kass and Raftery, 1995; Wasserman, 2000). Alternatively, the Bayes factor can be viewed simply as a comparison of the predictive likelihoods of the models (Gelfand and Dey, 1994; Kass and Raftery, 1995; Wasserman, 2000) or a comparison of the ability of the models to update the priors (Lavine and Schervish, 1999; Wasserman, 2000). Both of the latter comparisons would be valid even though, strictly speaking, none of the models is likely to be an exact (true) description of the process under study. The Bayes factor comparison can be applied to any set of models, regardless of whether they are nested or not (as can AIC and BIC but not hLRT), and it is based on integration over the uncertainty in all parameter values rather than on ML point estimates (as opposed to AIC, BIC, and hLRT).

The Bayes factor is not used in a normal statistical test of whether a hypothesis should be rejected or accepted given some subjective cutoff value. Instead, the Bayes factor evaluates the relative merits of competing models, and the interpretation is left to the scientist. Jeffreys (1961) originally provided some guidelines for this interpretation, which have been modified by other workers. We use a version originally presented by Kass and Raftery (1995) (Table 1).

Table 1. Interpretation of the Bayes factor ($B_{10}$) (taken from Kass and Raftery, 1995).

| $2\log_e(B_{10})$ | $B_{10}$ | Evidence against $M_0$ |
|---|---|---|
| 0 to 2 | 1 to 3 | Not worth more than a bare mention |
| 2 to 6 | 3 to 20 | Positive |
| 6 to 10 | 20 to 150 | Strong |
| >10 | >150 | Very strong |
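Applying Table 1 is a small calculation once the model likelihoods are available. The sketch below takes natural-log model likelihoods $\ln f(X \mid M_1)$ and $\ln f(X \mid M_0)$ as given (estimating them is a separate problem), forms $2\log_e(B_{10})$, and reports the Table 1 category. Two choices are assumptions not stated in the table: negative values are read as evidence for $M_0$ on the same scale, and values falling exactly on a boundary go to the lower category. The input numbers are hypothetical, not results from this study.

```python
from typing import Tuple

def two_ln_bayes_factor(log_ml_1: float, log_ml_0: float) -> float:
    """2 ln B10 from the natural-log model likelihoods ln f(X | M1) and ln f(X | M0)."""
    return 2.0 * (log_ml_1 - log_ml_0)

def kass_raftery(two_ln_b10: float) -> Tuple[str, str]:
    """Favored model and strength of evidence on the Table 1 scale.

    Assumptions: negative values count as evidence for M0 on the same scale;
    values exactly on a category boundary are assigned to the lower category.
    """
    favored = "M1" if two_ln_b10 >= 0 else "M0"
    x = abs(two_ln_b10)
    if x <= 2:
        strength = "not worth more than a bare mention"
    elif x <= 6:
        strength = "positive"
    elif x <= 10:
        strength = "strong"
    else:
        strength = "very strong"
    return favored, strength

if __name__ == "__main__":
    # Hypothetical log model likelihoods, not values from this study.
    stat = two_ln_bayes_factor(-12000.0, -12004.5)
    print(stat, kass_raftery(stat))  # 9.0 ('M1', 'strong')
```

Working with $2\log_e(B_{10})$ rather than $B_{10}$ avoids exponentiating large log-likelihood differences; here $2\log_e(B_{10}) = 9$ corresponds to $B_{10} = e^{4.5} \approx 90$, which falls in the 20 to 150 column of Table 1.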
Questions Regarding Combined Phylogenetic Analysis
We applied combined Bayesian MCMC analysis to an empirical data set consisting of morphological and nucleotide data for 32 exemplar species of gall wasps (Hymenoptera: Cynipidae) and outgroups. The exemplars span the entire diversity of the family and include phytophagous guests in galls (inquilines) and gall inducers on a variety of both herbaceous and woody host plants (Table 2; Ronquist, 1999).

The morphological data consisted of 166 characters, which have previously been shown to partly resolve the phylogeny with strong support values using parsimony methods (Liljeblad and Ronquist, 1998). The nucleotide data are almost entirely original to this study and consisted of a total of 3,080 aligned base pairs (bp) from four genes: two nuclear protein-coding genes (elongation factor 1α F1 copy [EF1α] and long-wavelength opsin [LWRh]), one mitochondrial protein-coding gene (cytochrome c oxidase subunit I [COI]), and nuclear 28S ribosomal DNA (rDNA). We analyzed the data using a range of models of varying complexity (dimensionality) and explored the following questions.
What is the relationship between model complexity and computational complexity?
It is difficult to predict how MCMC estimation of the posterior probability distribution is affected by an increase in model complexity. The chain can be updated faster in those generations where model parameters affecting only some of the partitions are changed; however, more parameters also mean that each parameter will be visited more rarely. More parameters will also affect the complexity and the shape of the posterior distribution, which might slow convergence and mixing. However, more realistic models may lead to posterior distributions that are easier to traverse using MCMC, despite the increase in the number of parameters. We examined the computational speed, time to convergence, and mixing over the entire range of models to examine these questions empirically.

Table 2. Taxa of gall wasps (Cynipidae) and outgroups (Figitidae, Liopteridae, Ibaliidae) used in the analysis. Brief biological data are given for each exemplar genus. GenBank accession numbers are given for all sequences; a dash indicates missing data.

| Taxon | Morphology^a | Host plant^b | Biology^c | COI | 28S | EF1α | LWRh |
|---|---|---|---|---|---|---|---|
| Cynipidae: Synergini | | | | | | | |
| Synergus crassicornis | | Quercus (Fg) | inquiline | AY368909 | AY368936 | AY368962 | AY371051 |
| Ceroptres cerri | C. clavicornis | Quercus (Fg) | inquiline | AY368910 | AY368935 | - | AY371052 |
| Periclistus brandtii | | Rosa (Ro) | inquiline | AF395181 | AF395152 | AF395173 | AF395189 |
| Synophromorpha sylvestris | S. rubi | Rubus (Ro) | inquiline | AY368911 | AY368937 | AY368961 | - |
| Cynipidae: "Aylacini" | | | | | | | |
| Xestophanes potentillae | | Potentilla (Ro) | galler | AY368912 | AY368938 | AY368963 | - |
| Diastrophus turgidus | | Rosaceae | galler | AY368913 | AY368939 | AY368964 | - |
| Gonaspis potentillae | | Potentilla (Ro) | galler | AY368914 | AY368940 | AY368965 | - |
| Liposthenes glechomae | | Glechoma (La) | galler | AY368915 | AY368941 | AY368966 | AY371053 |
| Liposthenes kerneri | | Nepeta (La) | galler | AY368916 | AY368942 | AY368967 | AY371054 |
| Antistrophus silphii | A. pisum | Asteraceae | galler | AY368917 | AY368943 | AY368968 | AY371055 |
| Rhodus oriundus | | Salvia (La) | galler | AY368918 | AY368944 | AY368969 | AY371056 |
| Hedickiana levantina | | Salvia (La) | galler | AY368919 | AY368945 | AY368970 | AY371057 |
| Neaylax verbenaca | N. salviae | Salvia (La) | galler | AY368920 | AY368946 | AY368971 | AY371058 |
| Isocolus rogenhoferi | | Asteraceae | galler | AY368921 | AY368947 | AY368972 | AY371059 |
| Aulacidea tragopogonis | | Asteraceae | galler | AY368922 | AY368948 | AY368973 | AY371060 |
| Panteliella bicolor | P. fedtschenkoi | Phlomis (La) | galler | AF395180 | AF395153 | AF395172 | AF395188 |
| Barbotinia oraniensis | | Papaver (Pa) | galler | AF395179 | AF395150 | AF395171 | AF395187 |
| Aylax papaveris | | Papaver (Pa) | galler | AY368923 | AY368949 | AY368974 | AY371061 |
| Iraella luteipes | | Papaver (Pa) | galler | AY368924 | AY368950 | AY368975 | - |
| Timaspis phoenixopodos | | Asteraceae | galler | AY368925 | AY368951 | AY368976 | AY371062 |
| Phanacis hypochoeridis | | Asteraceae | galler | AY368926 | AY368952 | AY368977 | - |
| Phanacis centaureae | | Asteraceae | galler | AY368927 | AY368953 | AY368978 | - |
| Cynipidae: Eschatocerini | | | | | | | |
| Eschatocerus acaciae | | Acacia (Fb) | galler | AY368928 | AY368954 | AY368979 | AY371063 |
| Cynipidae: Diplolepidini | | | | | | | |
| Diplolepis rosae | | Rosa (Ro) | galler | AF395174 | AF395157 | AF395166 | AF395182 |
| Cynipidae: Pediaspidini | | | | | | | |
| Pediaspis aceris | | Acer (Sa) | galler | AY368929 | AY368955 | AY368980 | AY371064 |
| Cynipidae: Cynipini | | | | | | | |
| Plagiotrochus quercusilicis^d | | Quercus (Fg) | galler | AF395178 | AF395154 | AF395162 | AF395186 |
| Andricus kollari | A. quercusradicis | Quercus (Fg) | galler | AF395176 | AF395156 | AF395168 | AF395184 |
| Neuroterus numismalis | | Quercus (Fg) | galler | AY368930 | AY368956 | AY368981 | - |
| Biorhiza pallida | | Quercus (Fg) | galler | AY368931 | AY368957 | AY368982 | AY371065 |
| Figitidae | | | | | | | |
| Parnips nigripes | | - | parasitoid | AY368932 | AY368958 | AY368983 | AY371066 |
| Liopteridae | | | | | | | |
| Paramblynotus virginianus | P. zonatus | - | parasitoid | AY368933 | AY368959 | AY368984 | - |
| Ibaliidae | | | | | | | |
| Ibalia rufipes | | - | parasitoid | AY368934 | AY368960 | AY368985 | - |

^a Species coded for morphology if different from the species sequenced.
^b Genus or family of host plant attacked by the exemplar genus if phytophagous. A few rarely used host plants have been omitted; see Ronquist and Liljeblad (2001) for more information. If all members of the genus attack the same host-plant genus, then the family to which that host genus belongs is indicated in parentheses: Fb = Fabaceae; Fg = Fagaceae; La = Lamiaceae; Pa = Papaveraceae; Ro = Rosaceae; Sa = Sapindaceae.
^c Cynipidae are either inquilines (phytophagous guests) in galls or gall inducers. The outgroups are endoparasitoids attacking various insect larvae.
^d Species name recently designated a senior synonym of P. fusifex.
Do morphological data influence multigene analyses?
Morphological data are potentially important in phylogenetic inference for many reasons. For instance, morphological characters are crucial in placing fossils in phylogenies and thus in dating branching events. However, the ability to combine morphological and molecular data in a single analysis is particularly important if it can be shown that morphology has significant influence on the phylogenetic estimate even when combined with multigene data sets. This question has remained largely unexplored with parametric methods, because only recently were stochastic models seriously considered for morphological data (Lewis, 2001a). We used an extended version of Lewis's models (Ronquist and Huelsenbeck, in prep.) in evaluating whether the 166 morphological characters in our data set significantly affected the phylogenetic estimate when combined with the 3,080 nucleotide characters from the four different genes.
Are composite models better?
When it becomes possible to analyze partitioned models easily, an obvious question is how important it is to recognize across-partition heterogeneity in evolutionary processes. To examine this question, we used Bayes factor comparisons to look at the increase in model likelihood associated with the introduction of different model components accounting for within-partition or across-partition heterogeneity in the molecular portion of the data set.
Are complex models associated with increased variance of topology estimates?
Complex models are generally associated with more error variance in parameter estimates. If the error variance is excessive, it becomes a problem known as overparameterization or overfitting (Burnham and Anderson, 2002). However, overly simple models can also be problematic. In particular, oversimplified evolutionary models might lead to dramatically lowered topological variance and exaggerated clade probability values in Bayesian phylogenetic inference (Suzuki et al., 2002). To examine the relationship between model complexity and the precision of parameter estimates, we compared topology and tree-length estimates across models. We also looked at the effect of model complexity on the conflict between the morphological and molecular partitions.
Is the Bayesian MCMC approach sensitive to the inclusion of superfluous parameters in a complex model?
It may be difficult to design complex models that adequately explain a process under study without including one or a few parameters that are superfluous in the sense that (1) the data are not powerful enough to significantly alter their prior probability distribution or (2) the posterior probability distribution coincides with a less parameter-rich submodel. Such "superfluous" parameters might cause problems with MCMC estimation of the posterior distribution. We searched the posterior distributions of more complex models for such parameters to see whether they were present and, if so, whether there was any apparent effect on convergence or on the posterior distributions of other model parameters. If the Bayesian MCMC approach were sensitive to superfluous parameters, it might be difficult to design appropriate composite models that would result in successful combined analysis.
Do Bayes factors strike a reasonable balance between model complexity and error variance?
The ability to allow heterogeneity across data partitions in model parameters opens up a Pandora's box of model choice problems, which are difficult to address without good model selection criteria and procedures. Standard likelihood ratio tests have a tendency to prefer complex models (Gelfand and Dey, 1994; Burnham and Anderson, 2002), and various procedures have been developed to penalize parameter-rich models (Akaike, 1973; Schwarz, 1978). In theory, the Bayes factor comparison does not suffer from this problem; a simple model can be favored over a more parameter-rich model even if the models are nested. We looked for instances of simple models winning over more complex ones and for cases where the Bayes factor would favor model reduction by supporting the exclusion of weak parameters.

MATERIALS AND METHODS
Data
We assembled DNA and morphological data for 29 gall wasp exemplars and three outgroup exemplars, the latter representing the families Figitidae, Liopteridae, and Ibaliidae (Table 2). Previous phylogenetic analyses indicate that Figitidae is the sister group to Cynipidae and that the Liopteridae and Ibaliidae are successively more distant outgroups (Ronquist, 1999). The gall wasp sample included representatives of all described tribes of the only extant subfamily. All major wasp genera of phytophagous guests in cynipid galls, also known as inquilines, were represented except for the genus Saphonecrus, which is considered close to if not embedded within Synergus (Ronquist, 1994, 1999; Nieves-Aldrey, 2001; Ronquist and Liljeblad, 2001). A broad selection of gall inducers attacking herbaceous and woody host plants was also included. At least half the described genera were included for all tribes except the Cynipini, or the oak gallers. This tribe, comprising more than 40 described genera, was represented by only four genera but is widely thought to be monophyletic (Kinsey, 1920; Askew, 1984; Ronquist, 1994, 1999; Liljeblad and Ronquist, 1998; Nieves-Aldrey, 2001; Ronquist and Liljeblad, 2001; Stone et al., 2002).

The morphological data were taken from Liljeblad and Ronquist (1998) and consist of 166 parsimony-informative discrete characters: 164 external morphological characters and two ecological characters (alternation of sexual/asexual generations, and host-plant choice) (Liljeblad and Ronquist, 1998: appendix 1). Some multistate characters were treated as ordered and others as unordered, as specified by Liljeblad and Ronquist (1998).

As far as possible, DNA data were collected from the same species for which we had morphological data. In a few cases, an exact match could not be obtained, but DNA sequences were obtained from a close relative, and these taxa were combined into a single terminal in the final analyses (Table 2). We sequenced parts of four genes: COI (1,078 bp), the nuclear protein-coding genes LWRh (481 bp) and EF1α (367 bp), and the nuclear 28S rDNA (1,154 bp) (GenBank accession numbers in Table 2). Details of the DNA amplification protocols and primers were given by Rokas et al. (2002). The protein-coding genes (COI, LWRh, and EF1α) were easily aligned by eye. The ribosomal sequences (28S) differed in length, and some of the more variable regions were difficult to align manually. We used ClustalW 1.81 (Thompson et al., 1994) for this alignment. We applied a range of costs for the gap opening and gap extension penalties, and the individual alignments were subjected to parsimony bootstrap (Felsenstein, 1985) analyses using PAUP* (Swofford, 1998). Supported groups were largely congruent among the resulting trees. The alignment resulting from the use of the default settings in ClustalW is available from TreeBase (http://www.treebase.org, accession S970).
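A partitioned analysis needs every column of the concatenated matrix assigned to a partition. The sketch below lays out the character counts reported above as contiguous, 1-based character ranges and checks the arithmetic: the four gene regions sum to 3,080 bp, and the 166 morphological characters make up about 5% of all characters. The concatenation order and the partition labels are assumptions for illustration, not the layout of the authors' matrix.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

# Character counts reported in the text; the concatenation order is an assumption.
PARTITIONS: List[Tuple[str, int]] = [
    ("morphology", 166), ("COI", 1078), ("LWRh", 481), ("EF1alpha", 367), ("28S", 1154),
]

def character_ranges(parts: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """1-based inclusive (first, last) character positions of each partition."""
    ends = accumulate(n for _, n in parts)  # running totals give each partition's last column
    return {name: (end - n + 1, end) for (name, n), end in zip(parts, ends)}

if __name__ == "__main__":
    ranges = character_ranges(PARTITIONS)
    n_total = sum(n for _, n in PARTITIONS)                            # 3246 characters
    n_dna = sum(n for name, n in PARTITIONS if name != "morphology")   # 3080 bp
    print(ranges)
    print(n_dna, f"{166 / n_total:.1%} morphology")  # 3080 5.1% morphology
```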