Development 1994 Supplement, 15-25 (1994)
Printed in Great Britain © The Company of Biologists Limited 1994

Can the Cambrian explosion be inferred through molecular phylogeny?

Hervé Philippe, Anne Chenuil* and André Adoutte†

Laboratoire de Biologie Cellulaire 4, URA 1134 CNRS-Bâtiment 444, Université Paris-Sud, 91405 ORSAY-CEDEX, France

†Author for correspondence
*Present address: Laboratoire Génome et Populations, Université Montpellier 2, Case 063, 34095 Montpellier cedex 05, France

SUMMARY

Most of the major invertebrate phyla appear in the fossil record during a relatively short time interval, not exceeding 20 million years (Myr), 540-520 Myr ago. This rapid diversification is known as the 'Cambrian explosion'. In the present paper, we ask whether molecular phylogenetic reconstruction provides confirmation for such an evolutionary burst. The expectation is that the molecular phylogenetic trees should take the form of a large unresolved multifurcation of the various animal lineages. Complete 18S rRNA sequences of 69 extant representatives of 15 animal phyla were obtained from data banks. After eliminating a major source of artefact leading to lack of resolution in phylogenetic trees (mutational saturation of sequences), we indeed observe that the major lines of triploblast coelomates (arthropods, molluscs, echinoderms, chordates...) are very poorly resolved, i.e. the nodes defining the various clades are not supported by high bootstrap values. Using a previously developed procedure consisting of calculating bootstrap proportions of each node of the tree as a function of increasing amount of nucleotides (Lecointre, G., Philippe, H., Le, H. L. V. and Le Guyader, H. (1994) Mol. Phyl. Evol., in press) we obtain a more informative indication of the robustness of each node.
In addition, this procedure allows us to estimate the number of additional nucleotides that would be required to resolve confidently the currently uncertain nodes; this number turns out to be extremely high and experimentally unfeasible. We then take this approach one step further: using parameters derived from the above analysis, assuming a molecular clock and using palaeontological dates for calibration, we establish a relationship between the number of sites contained in a given data set and the time interval that this data set can confidently resolve (with 95% bootstrap support). Under these assumptions, the presently available 18S rRNA database cannot confidently resolve cladogenetic events separated by less than about 40 Myr. Thus, at the present time, the potential resolution by the palaeontological approach is higher than that by the molecular one.

Key words: evolution, metazoa, rRNA

INTRODUCTION

The notion that most lines of metazoans appear rather abruptly in the fossil record, during the Cambrian, unpreceded by identifiable forerunners, dates back to the 19th century. In fact, this observation was a point of serious concern to Darwin who devoted several pages in 'The origin of species' (1859) to discuss the possible reasons for the absence of identifiable Precambrian fossils. The notion of a 'Cambrian explosion' of animal life is now amply documented and is particularly striking in the richness of the Burgess Shale fauna, which dates from the mid-Cambrian (approx. 520 Myr ago) but also in its slightly earlier (530-535 Myr ago) close strata of Sirius Passet (Greenland) and Chengjiang (China).
Briefly, the present view of the palaeontological evidence, as summarised by Conway Morris (1993 and this volume; see also Bowring et al., 1993), recognises three early episodes of animal life: (i) the pre-Cambrian Ediacara fauna (570-555 Myr ago), probably mostly of diploblastic 'grade' of organisation, and which may be unrelated to (ii) the later Cambrian fauna, largely dominated by triploblasts, which appears at the base of the Cambrian (approx. 540 Myr ago) and then radiates explosively during (iii) the third episode, yielding representatives of most of the 35 major metazoan phyla within an interval of probably less than 20 million years. Thus, the major types of body plans of metazoans may have originated during a relatively short time interval.

These observations are of considerable interest in understanding the mechanisms of large scale evolution and the role that developmental innovations may have had in shaping animal diversity. It is therefore important that they be substantiated by independent lines of evidence. The purpose of the present paper is to inquire as to whether phylogenetic analysis of gene sequences from extant metazoans might provide confirmation for the occurrence of such an evolutionary burst of triploblasts. The central argument runs as follows: if the split between the various animal phyla took place within a short time interval as compared to the length of time elapsed since its occurrence (say 10 Myr as compared to 500 Myr), one expects that determination of the order of emergence of the various lineages using sequence data will be almost impossible, i.e. that the various animal phyla will emerge as an unresolved 'bush' in the molecular phylogeny. The basic reason for this expectation is that the molecular events allowing one to establish the order of emergence of the various clades on a tree are the mutations that occur on the 'internal branches' of the tree, in between the points of emergence of the clades under analysis: these are the synapomorphies (shared derived characters) uniting the successive clades into a series of nested groups. The longer the time interval between two cladogenetic events the higher the probability that mutations will have accumulated within the corresponding branch in the tree and therefore the clearer the kinship of the taxa located after the branch will be. The idea, then, is to reverse the argument and to assume that if we cannot satisfactorily resolve a multifurcation in a molecular phylogeny, it is because the time interval separating the emergence of the various clades involved has been too short with respect to the time elapsed since the event; thus, unresolved nodes in a molecular phylogeny would be interpreted as corresponding to an evolutionary radiation. This is indeed what has repeatedly been observed in the molecular phylogeny of Metazoa.

However, it should be realised at the onset that the use of this argument rests on two essential parameters, a true historical one, the duration of time separating two cladogenesis events, and a methodological one, namely our ability, through our tree construction methods, to discriminate 'well resolved' nodes from 'unresolved' ones. From the start, then, we see that this approach to the question of the Cambrian radiation is intimately linked to the problem of evaluating the reliability of nodes in molecular phylogenies. The short history of the molecular phylogeny of Metazoa provides a good illustration of how these evaluations started in a rather intuitive and non-quantitative way to become increasingly rigorous.

Starting with the pioneering study of Field et al. (1988) for example, which was based on partial 18S ribosomal RNA sequences analysed by a distance method, the authors stressed that the order of emergence of the four major groups of coelomates they analysed, Chordata, Echinodermata, Arthropoda and a set of 'eucoelomate protostomes' could not be confidently resolved and suggested that this reflected a rapid phyletic splitting, i.e. a rapid radiation of all coelomate phyla. Their arguments were that the internal branches separating the points of emergence of the various taxa were short and, more importantly, that the topology was unstable, i.e. that it changed depending on the actual species sampled. These data were reanalysed by Patterson (1989) using a variety of methods, in particular parsimony, and by Lake (1990) using his evolutionary parsimony method (see Erwin, 1991 for a review of these papers). Contradictions and uncertainties in the results suggested both that a rapid radiation of eucoelomates may indeed have occurred and that the data were 'noisy'. 28S rRNA partial sequences of a broad sample of invertebrate species covering ten triploblastic coelomate phyla, three pseudocoelomate ones and one acoelomate were also obtained and analysed by one of us (Chenuil, 1993) with very similar results. Later, two of us (Adoutte and Philippe, 1993) reanalysed a broader 18S rRNA partial sequence data set, using parsimony methods and bootstrap testing and clearly confirmed three points: (1) diploblasts were deeply split from triploblasts (as had been inferred by Raff et al. (1989) and by Christen et al. (1991)); (2) platyhelminths (acoelomate triploblasts) were the sister group of coelomates and (3) the major coelomate phyla were very poorly resolved. A 'giant' multifurcation, comprising annelids, arthropods, molluscs, echinoderms, chordates and many more minor groups provided a fair representation of the results.
Even the separation into the two major lineages of coelomates, protostomes and deuterostomes, although apparent in the tree, was not supported by significant bootstrap values. We suggested at that time that the latter point could be a reflection of the Cambrian explosion.

In the past ten years, much use has been made of the bootstrap value (or bootstrap proportion, BP) to estimate the reliability of nodes in phylogenetic trees (Felsenstein, 1985). The BP (the number of times a given node is obtained over the total number of nucleotide resamplings carried out) is indeed a convenient value to estimate the strength of the phylogenetic signal, within the framework of a given tree reconstruction method: values above 95% indicate that the data contain a strong signal in favour of this node while low values indicate that the node is poorly supported and thus in fact may not exist (see Zharkikh and Li, 1992a,b; Hillis and Bull, 1993 and Felsenstein and Kishino, 1993 for recent detailed analyses of the significance and statistical properties of the bootstrap). Thus, an expectation of a radiation process is that bootstrap proportions should be low in all the internal branches surrounding the radiation point in the tree. A corollary of this criterion is the instability of the node: when poorly supported by the bootstrap, nodes often display instability in the face of variations in the length of sequence analysed and, more significantly, in the face of modifications of species sampling (as systematically analysed in Lecointre et al., 1993). That point is strikingly illustrated in the paper by Adoutte and Philippe (1993) where the addition of a single new species to the tree transforms the Metazoa from monophyletic (diploblasts + triploblasts) to biphyletic (diploblasts on one branch and triploblasts on another), in agreement with the low BP (60%) of the corresponding node.
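The definition of the bootstrap proportion given above (the number of resamplings in which a node is recovered, over the total number of resamplings) can be illustrated with a minimal sketch. It assumes that each bootstrap tree has already been reduced to the set of clades it contains; the function name `bootstrap_proportion`, the taxa A to D and the five toy replicates are hypothetical and are not taken from the paper or from the MUST package.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]  # a node is identified by the set of species it groups


def bootstrap_proportion(node: Clade, replicate_trees: Iterable[Set[Clade]]) -> float:
    """Percentage of bootstrap replicate trees in which `node` is recovered."""
    trees = list(replicate_trees)
    if not trees:
        raise ValueError("at least one replicate tree is required")
    hits = sum(1 for clades in trees if node in clades)
    return 100.0 * hits / len(trees)


# Hypothetical toy data: five replicate trees on taxa A, B, C, D,
# each written as the set of its non-trivial clades.
node_ab = frozenset({"A", "B"})
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"B", "C"}), frozenset({"B", "C", "D"})},
]

bp = bootstrap_proportion(node_ab, replicates)
print(f"BP = {bp:.0f}%")  # the node occurs in 3 of 5 replicates, so BP = 60%
```

With 1000 resamplings, as used in this study, the same ratio is simply taken over 1000 replicate trees.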
This provided an indication of the difficulty of solving this question and was interpreted as confirming the depth of the split between diploblasts and triploblasts.

In the present work, we have carried out a new analysis of an 'edited' database of 18S rRNA, eliminating fast evolving species and analysing the significance of the nodes in the trees by a procedure recently developed within our group. This involves calculating not only the bootstrap proportion at each node on a tree, but also determining how this value changes when different lengths of sequence are included in the analysis (Lecointre et al., 1994). This allows us to obtain a curve of the BP as a function of the number of nucleotides used, which is more informative than the mere BP value based simply on a single sequence length. In particular, this procedure allows one to estimate the number of additional nucleotides that would be required to transform a 'moderately supported' node into a strongly supported one. This is then combined with palaeontological data to establish a rough relationship between the length of time separating two cladogenesis events, the length of the corresponding branch and the amount of sequence information required to support it in a statistically significant way. When applied to the rRNA dataset of Metazoa, this provides an estimate of the sequencing effort that would be required to resolve closely spaced branching points in the phylogeny, an effort that turns out to be enormous and unrealistic in most cases.

MATERIAL AND METHODS

Sequences used

Only species for which the full 18S rRNA sequence is available have been used in the present study. As of December 1993, this corresponded to 69 species of Metazoa in the EMBL and GenBank data banks. All the sequences were handled and further analysed through the MUST package (Philippe, 1993), and are available upon request.
Alignment and tree construction

The sequences were aligned manually using the editing functions of the MUST package (Philippe, 1993). Only confidently aligned domains were used, using stringent criteria to eliminate all doubtful portions. The boundaries of the domains thus selected are as follows, using mouse 18S rRNA nucleotide numbering as a reference: 82-125, 137-180, 187-194, 208-243, 289-307, 311-539, 548-689, 798-834, 841-1112, 1122-1407, 1440-1551, 1558-1737. For the 69 species, this yielded a total of 1615 aligned sites of which 1010 were variable and 690 informative under the parsimony criterion. When fast-evolving species are eliminated (see Results), 55 species remain, yielding 1474 aligned sites of which 708 are variable and 486 informative.

Trees were constructed using the Neighbour Joining method (Saitou and Nei, 1987) and were submitted to bootstrapping (Felsenstein, 1985) using the NJBOOT program of the MUST package set at 1000 resamplings. All the bootstrap calculations were carried out on a Sun-Sparc 10 computer.

Calculation and display of the 'pattern of resolved nodes' (PRN)

The method described by Lecointre et al. (1994) was used throughout, under the following conditions. The full aligned sequences of the 55 species were each submitted to random sampling of a given number of sites (a 'Jack-knifing of sites') through the use of a new program, PRN, running on UNIX platforms. Thirteen different sequence lengths were chosen (25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500 and 600 sites) and for each, 200 samples were drawn. Thus, a total of 2600 subsets of sequence alignments were obtained, each including all 55 species. Each of these subsets was used to construct an NJ tree, which was submitted to 1000 bootstrap replicates. All the combinations of species appearing in more than 1% of the replicates were 'stored' in a file (of about 60 Mb). This yields several tens of thousands of nodes.
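The site sampling scheme described above (13 sequence lengths, 200 samples each, 2600 subsets in total) can be sketched as follows. This is an illustration only and not the PRN program: it assumes that sites are drawn without replacement, as the term 'Jack-knifing' suggests, and that the alignment is held as a mapping from species name to aligned sequence. The function names and the three-species random alignment are hypothetical.

```python
import random
from typing import Dict, List

# Sequence lengths and number of samples per length, as stated in the text.
SEQUENCE_LENGTHS = [25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600]
SAMPLES_PER_LENGTH = 200


def jackknife_sites(alignment: Dict[str, str], n_sites: int,
                    rng: random.Random) -> Dict[str, str]:
    """Keep n_sites columns drawn without replacement (assumption), the same
    columns for every species, so that the subset is still an alignment."""
    total = len(next(iter(alignment.values())))
    if any(len(seq) != total for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    if n_sites > total:
        raise ValueError("cannot sample more sites than the alignment contains")
    columns = sorted(rng.sample(range(total), n_sites))
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def all_subsets(alignment: Dict[str, str], rng: random.Random) -> List[Dict[str, str]]:
    """One subset per (length, sample) pair: 13 x 200 = 2600 subsets."""
    return [jackknife_sites(alignment, n, rng)
            for n in SEQUENCE_LENGTHS
            for _ in range(SAMPLES_PER_LENGTH)]


# Hypothetical toy alignment: 3 species, 1474 random sites.
rng = random.Random(0)
toy = {name: "".join(rng.choice("ACGT") for _ in range(1474))
       for name in ("sp1", "sp2", "sp3")}
subsets = all_subsets(toy, rng)
print(len(subsets))  # 2600
```

In the procedure described in the text, each such subset is then used to build an NJ tree that is itself bootstrapped; that step is not reproduced in this sketch.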
Selection of the nodes was then carried out using the new program AFT-PRN according to the following criteria: the node should correspond to a BP with an ascending tendency and it should be present in more than 2000 of the subsets of sequences. This is a rather stringent criterion allowing us to keep only nodes that appear frequently. At a given node, one could therefore display graphically the evolution of BP as a function of the number of nucleotides that were used to generate the tree (COMP-BOO program of the MUST package or DISPLAY-PRN under UNIX). Lecointre et al. (1994) have shown that the mean of BP can be related to the number of nucleotides, x, through the function BP = 100 (1 - e^(-b(x - x'))). The parameters b and x' are estimated by non-linear regression using the GENSTAT package.

Relationship of 'b' to branch length and to time

Branch lengths were directly provided by the NJ program and displayed using the TREEPLOT program. These lengths were plotted as a function of the value of the b parameter for the corresponding branch. The following palaeontological dates were used to establish the relationship between the time elapsed between two cladogeneses and the b value: 300 Myr between the point of origin of Pectinidae (as represented by Placopecten) and that of Mactridae (as represented by Spisula; Rice et al., 1993), 50 Myr between the point of origin of gnathostomes and that of all vertebrates and 300 Myr between the point of origin of amniotes and that of eutherians (see Benton, 1990).

Table 1. Results of the relative rate test for a selected sample of species

Species                      Gaps   All substitutions
Moliniformis moliniformis    1.58   19.67
Herdmania momus              3.48   19.71
Drosophila melanogaster      1.69   20.36
Oedignathus inermis          3.52   20.36
Eptatretus stoutii           1.26   21.21
Myxine glutinosa             1.26   21.21
Pugettia quadridens          3.99   21.64
Sagitta elegans              1.97   22.99
Haemonchus contortus         2.23   23.47
Haemonchus placei            2.23   23.47
Haemonchus similis           2.23   23.47
Strongyloides stercoralis    2.11   23.89
Aedes albopictus             3.10   24.50
Caenorhabditis elegans       2.10   25.74

The relative rate was calculated by establishing the mean number of either the gaps (1st column) or all types of substitutions (2nd column) between the sequences of the species considered and the sequences of 8 diploblast species taken as an outgroup. The values for the complete set of species ranged from 14.6 to 25.74 when all types of substitutions are computed. The species displayed in this table are those at the highest extreme of the distribution, which have all been eliminated, the limit for inclusion in the dataset having been set at 18.3. In the column corresponding to gaps, there is much greater homogeneity, even for fast evolving species, except for a few species that display a number of gaps much higher than that of others (Herdmania momus, Oedignathus inermis and Pugettia quadridens) and which correspond to species whose sequence determination appears to have been problematic.

Relative rate test

The mean values of the distances between each of the triploblast species and the 8 diploblast ones were computed using either all types of nucleotide differences or only the gaps (Table 1).

Saturation curve

The number of inferred substitutions was calculated using the program PAUP 3 (Swofford, 1991), courtesy of H. Récipon. The corresponding matrix between all pairs of species was obtained through the TREEPLOT program, and the COMP-MAT program allowed the visualisation of the results.

RESULTS

Global 18S rRNA Neighbour-Joining tree of Metazoa

Fig. 1 shows the Neighbour-Joining (NJ; Saitou and Nei, 1987) tree of all the metazoan species for which complete 18S rRNA sequences were available in data banks as of December 1993. Partial sequences were excluded in order to maximise the amount of information for each species.
This database contains representatives of 13 metazoan phyla including the most numerically important ones, with only one unfortunate omission, that of annelids, for which complete sequences are not available. The tree is arbitrarily rooted between diploblasts (Porifera, Ctenophora, Placozoa and Cnidaria) and triploblasts to avoid the use of a non-metazoan outgroup (which would decrease the length of alignable sequences) and because diploblasts constitute a clear outgroup to triploblasts on the basis of multiple previous evidence. The tree displayed is that directly obtained by the NJ method, a distance method that does not assume equality of evolutionary rates among branches and whose efficiency at recovering the actual tree has been shown to be reasonably good (Tateno et al., 1994). It is used here because of its great rapidity in terms of computer time, a critical advantage for the extensive calculations carried out in this paper.

The tree displays a number of interesting features and also immediately illustrates one major source of artefact: several species or groups of species display much longer branches, indicating either a two- or three-fold higher rate of evolution in those sequences or inaccurately determined sequences. Such inequalities are known to generate topological errors in the positioning of the corresponding taxa (Felsenstein, 1978; Hendy and Penny, 1989; Swofford and Olsen, 1990). A clear example is provided by two insects, Drosophila and Aedes, which emerge as a sister group to nematodes (the latter also all having long branches) and separate from the five other arthropods, which have smaller branches and are more traditionally positioned in the tree. Both for Drosophila and for Caenorhabditis it is known that the problem lies not in the quality of the sequences but in their rapid rate of evolution, as has been pointed out in several previous papers.

In spite of these sources of errors, Fig. 1 displays a number of features that deserve a brief comment. (1) The distance between diploblasts and triploblasts is indeed the largest measured in the tree between high level taxa, supporting the validity of the rooting. (2) Except for the nematodes and an acanthocephalan (Moliniformis), all the other major triploblast lineages emerge very close to each other, i.e. separated by very short internal branches and correspondingly low BPs. (3) Nematodes and the acanthocephalan, traditionally grouped within the pseudocoelomates, seem to emerge between the diploblasts and the platyhelminths, i.e. acoelomate triploblasts. This is contrary to the usual view which places pseudocoelomate emergence between acoelomates and eucoelomates. However, because of the great inequalities in rates of evolution and the very poor BPs in this portion of the tree, this result should not be overemphasised. (4) Contrary to our previous study, based on a parsimony analysis of partial 18S rRNA sequences (Adoutte and Philippe, 1993), the monophyly of coelomates is not strongly supported (33% BP) and the positioning of the platyhelminths as a sister group to coelomates is correspondingly weakened. In summary, the view emerging from this initial tree is one of a large burst of all triploblastic metazoans, including platyhelminths. But this view should be considered with caution because of the inequalities in evolutionary rates observed within the tree.

In a second step, the relative-rate test (Sarich and Wilson, 1973) was systematically carried out on all the species of the tree (Table 1). The distribution of distances to the outgroup for the 55 remaining species is roughly gaussian and ranges from 14.6 to 18.3 whereas the distances for the fourteen discarded species vary from 19.7 to 25.7, noticeably far from gaussian. All fourteen taxa with too high a rate were discarded, yielding the more restricted set appearing in Fig. 2, which was used in all further calculations.

When the rapidly evolving species are eliminated from the database, according to this criterion, several discrepancies of Fig. 1 disappear, and the BPs rise (Fig. 2). However, some

[Figure 1: Neighbour-Joining tree of the 69 species. Group labels: Porifera, Ctenophora, Placozoa, Cnidaria, Acanthocephala, Nematoda, Insecta, Chaetognatha, Platyhelminthes, Arachnida, Crustacea, Mollusca, Echinodermata, Hemichordata, Urochordata, Cephalochordata, Myxinoidea, Vertebrata. Scale bar: 0.02.]

Fig. 1. Phylogeny of Metazoa based on complete 18S rRNA sequences treated by the Neighbour-Joining method. Numbers below internal branches indicate the bootstrap proportion of the corresponding node (1000 resamplings). Note the length of the branches of all the species of Nematoda and of two Insecta (Drosophila melanogaster and Aedes albopictus).
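The function given in Material and Methods, BP = 100 (1 - e^(-b(x - x'))), can be inverted to estimate how many nucleotides a node would need to reach a given bootstrap support, which is the calculation the Summary refers to. The sketch below is only an illustration of that algebra: the function names are hypothetical, negative values are clamped to zero as an added assumption, and the values b = 0.002 and x' = 20 are made up for the example and are not estimates from the paper.

```python
import math


def expected_bp(x: float, b: float, x_prime: float) -> float:
    """Mean bootstrap proportion (in %) expected with x nucleotides,
    following BP = 100 (1 - exp(-b (x - x'))). Clamped at 0 for x < x'
    (an assumption; the source gives only the formula)."""
    return max(0.0, 100.0 * (1.0 - math.exp(-b * (x - x_prime))))


def sites_needed(target_bp: float, b: float, x_prime: float) -> float:
    """Number of nucleotides x at which the expected BP reaches target_bp,
    obtained by solving the formula above for x."""
    if not 0.0 <= target_bp < 100.0:
        raise ValueError("target_bp must be in [0, 100)")
    if b <= 0.0:
        raise ValueError("b must be positive for BP to increase with x")
    return x_prime - math.log(1.0 - target_bp / 100.0) / b


# Hypothetical parameter values, chosen only for illustration.
b_example, x_prime_example = 0.002, 20.0
x95 = sites_needed(95.0, b_example, x_prime_example)
print(f"about {x95:.0f} sites for an expected BP of 95%")
```

Because x grows as 1/b, a node with a small b (a short internal branch) requires a very large number of sites, which is the sense in which the required sequencing effort is described as unrealistic.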