A statistical test of a neutral model using the dynamics of cytonuclear disequilibria

In this paper we use cytonuclear disequilibria to test the neutrality of mtDNA markers. The data considered here involve sample frequencies of cytonuclear genotypes subject to both statistical sampling variation as well as genetic sampling variation.
of 8
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Related Documents
  Copyright zyxwvutsrqpon   1996 by the Genetics Society zyxwvutsrqpo f zyxwvutsr merica A Statistical Test of a Neutral Model zy sing the Dynamics of Cytonuclear Disequilibria Susmita Datta,* Mike Kipar~ky,~ avid M. Randt and onathan Arnoldf zy Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, Geor ‘a 30322, Department z f Ecology and Evolutionary Biology, Brown University, Providence, mode Island 02912 and gt epartment of Genetics, University of Georgia, Athens, Georgia 30602 Manuscript received February 26, 1996 Accepted for publication September 11, 1996 ABSTRACT In this paper we use cytonuclear disequilibria to test the neutrality of mtDNA markers. The data considered here involve sample frequencies f cytonuclear genotypes subject o both statistical sampling variation as well as genetic sampling variation. First, weobtain the dynamics of the sample cytonuclear disequilibria assuming random drift alone s the source of genetic sampling variation. Next, wedevelop a test statistic using cytonuclear disequilibria via the theory of generalized least squares to test the random drift model. The null distribution f the test statistic is shown to be approximately chi-squared using an asymptotic argument as well as computer simulation. Power of the test statistic is investigated under an alternative model with drift and selection. The method is illustrated using data from cage experiments utilizing different cytonuclear genotypes of Drosophila melunogaster. A program for imple- menting the neutrality test is available upon request. C TONUCLEAR disequilibrium measures provide new inferential tools in analyzing hybrid zone data. In recent years, there has been increasing atten- tion on studying the association or interaction between a nuclear gene or genotype and maternally inherited cytoplasmic components such as mitochondria see LAMB and AVISE 1986; ASMUSSEN et al. 1987; ARNOLD 1993). Questions have arisen concerning whether or not these cytonuclear associations can be explained without invoking natural selection. Since a significant amount of work in the last decade is based on neutrality of mitochondrial DNA (mtDNA) markers, a number of researchers have recently de- signed experiments o test the neutrality of mtDNA markers (CLARK nd LYCKEGAARD 988; MACRAE and ANDERSON 1988; FOS et al. 1990; POLLAK 991; SCRIBNER and AVISE 1994a,b; S. T. KILPATRICK and D. M. RAND unpublished results; HUTTER nd RAND 1995). We have taken a different approach to test the neutrality of a mtDNA marker with measures of association of mtDNA markers with nuclear genes or genotypes termed qyte nuclear disequilibria. Stochastic behavior of allelic disequilibria was ad- dressed by FU and ARNOLD (1992). The dynamics of cytonuclear genotypic disequilibria over different gen- erations have been described by DATTA t al. (1996). Stochastic trajectories of cytonuclear disequilibria cal- culated in Fu and ARNOLD (1992) and DATTA t al. (1996) were used to test the neutrality of mitochondrial Cmesponding author: Jonathan Arnold, Department of Genetics, University of Georgia, Athens, GA 30602. E-mail: Genetics 144: 1985-1992 (December, 1996) DNA markers in a vertebrate age experiment involving two species of mosquito fish (SCRIBNER nd AVISE 994). In this experiment an artificial hybrid zone composed of two competing species of mosquito fish that inter- breed was established. Frequencies of cytonuclear geno- types and associated cytonuclear disequilibria were monitored over time and compared with their expecta- tions under drift. In such an xperiment here re two potential sources of variation in cytonuclear frequencies, statisti- cal sampling variation and genetic sampling ariation (WEIR, 1990). Statistical sampling variation arises from sampling individuals from a population to estimate fre- quencies of cytonuclear genotypes (ASMUSSEN et aZ., 1987). Genetic sampling variation arises from genetic drift, the sampling of gametes from a finite breeding pool of individuals in nature o constitute the next gen- eration (Fu and ARNOLD 1992; DATTA z t al. 1996). In the experiment of SCRIBNER nd AVISE 1994) in every generation, all individuals from an entire population were sampled. As a consequence statistical sampling variation was eliminated, and only genetic sampling variation (i.e., drift) remained. This unusual design to a cage experiment permitted DATTA and ARNOLD (1996) to develop a simple neutrality test of a mtDNA marker using the observed trajectories of cytonuclear disequilibria. It is the purpose of this report to extend the domain of applicability of our neutrality test of an mtDNA marker to include cage experiments with both statistical and genetic sampling variation in cytonuclear disequilibria. In Figure 1, we present a schematic diagram for the design of a more general sampling scheme in a cage  S. zyxwvu atta zyxwvu t al. 986 Generation 0 Generation 1 Generation 2 zyxwvutsr b b b experiment Initial Population zyxwvut   zyxwvutsrqpon   . Population j Sample . __ ____ _. Population i Sample Population FIGURE 1.-Format of experiment. than in DATTA and ARNOLD (1996). We note that a statistical test of neutrality of a single genetic locus was carried out by SCHAFFER t zyxwvutsrq l. (1977) under a similar sampling scheme. We follow a similar ap- proach in testing whether or not the dynamics (over time) of the observed cytonuclear disequilibria are con- sistent with those expected under a random drift model. Tests of hypotheses about cytonuclear disequi- libria in the absence of genetic sampling variation have been considered by ASMUSSEN et al. (1987), FU and ARNOLD (1992), and ASMUSSEN and BASTEN (1994). The focus of this article is on a test of fit to a hypothe- sized drift process using sample estimates of cytonuclear disequilibria over time. THE METHOD Cytonuclear disequilibria: Let us suppose we observe a hybrid population at a nuclear locus with two alleles A and a, and at a cytoplasmic locus with alleles M and m simultaneously as in SCRIBNER nd AVISE (1994a). The frequencies of the respective cytonuclear genotypic classes are denoted by PI . . . s, respectively. For exam- ple, pl is the frequency of the class AA/M (see Table 1). Note that the same population can be represented in terms of the frequencies f their allelic combinations (A/M, A/m, a/M, a/m) in Table 2. TABLE I Frequencies of cytonuclear genotypes Nuclear genotype Cytoplasm AA Aa aa Total M PI P2 ps zyxwvutsrqp   m 4 P5 ps 1-4 Total U U zyxwvutsrq   1 zyxwvu TABLE 2 Frequencies of allelic combinations Nuclear allele Cytoplasm A U Total M el = pl + p/2 e3 = p, + p/2 Q m Q = p4 + p/2 e4 = ps + p/2 1 q Total P 1-P 1 Cytonuclear disequilibrium is defined to be the asso- ciation of nuclear genotypes or alleles with cytoplasmic alleles (ASMUSSEN et al. 1987). The following are he cytonuclear disequilibria corresponding to the omozy- gote AA/M and the heterozygote Aa/M, respectively: 4 P, Uq, (1) 4 = p, vq, (2) where u = P, + p4 and v = p, ps. Similarly, the allelic disequilibrium is given by D = e1   pq, 3) where p = el + , and q = el e3, and the e,'s are the frequencies of the allelic combinations (Table 3). For detailed definitions and their moment dynamics we re- fer to DATTA t al. (1996). All the above quantities can be defined at the sample level, as well. We will supplement all of these variables with carets to denote their sample counterpart. For ex- ample, l = il ia denotes the sample disequilibrium corresponding to AA/M. A statistical test based on disequilibrium dynamics: From now on, we will consider a cage experiment with the sampling scheme in Figure 1. The experiment starts with an initial base population with nonzero allelic dis- equilibrium D = 4. enerations are discrete and non- overlapping, and only a fixed number of randomly se- lected offspring of the previous generation re introduced into the cage, leading to genetic drift (Le., genetic sampling variation). The offspring are allowed to mature, mate, and have offspring, and are removed from the cage to make room for the next generation. From the adults of generation t removed from the cage, a total of a ) adults are randomly sampled to estimate frequencies of cytonuclear genotypes in Table 1 (see Figure 1 , eading to statistical sampling variation. Con- TABLE 3 Gametic (allelic) disequilibrium Nuclear allele Cytoplasm A a Total M e, = pq D es = (1 p)q D Q m a=p(l-q)-D ~=(l-p)(l-q)+D 1-q Total P 1-P 1  Disequilibrium zyxwv est 1987 sequently, the sample genotypic counts are multinomial with parameters equal to the ytonuclear genotypic fre- quencies zyxwvutsrq l (t), zyxwvutsr   . . , p6(t) of the tth generation of the population. n each generation these genotypic fre- quencies are he realization of a drift process. The model used here to describe the genetic sampling varia- tion is termed he random union of zygotes (RUZ) model (WATTERSON 970, 1972). Genetic sampling variation: Let X@( t) be the number of individuals receiving gamete zyxwvu   rom the father and gamete m from the mother at time t, and let x(t) = (X11( t), . . . , zyxwv 4( t)) be the vector of such counts. The probability distribution of the counts X(t) given the gametic combination counts at time t 1, X(t 1) is multinomial and is given by Pr(X(t) = zyxwvuts 0 1x0 1)) where N( t) = XJm @ t), m = 1, 2, 3, 4 (WATTEFSON, 1970) and efand e, are frequencies of allelic combina- tions in fathers and mothers. It is not hard to see that the cytonuclear genotypic frequencies are linear ombi- nations of the gametic combination counts Xfm (see e.g., DATTA et al. 1996). Therefore, one can find the condi- tional moment generating function f pl, . . . , & at time t given the frequencies at time t 1, which in turn extends to the calculation of the moments of the cyto- nuclear disequilibria, Dl and 4. See DATTA t al. (1996) for details. Statistical sampling variation: The estimated disequi- libria based on the statistical sample from the popula- tion at time t are given by B 4 = 1;dS + = nj(t)/n(t), zyxwvut   = 1, . . . , 6, (7) and nl (t) zyx   ) are the genotypic counts in the sample from generation t = 0, 1, * . Here the estimates are subject to multinomial sam- pling variation, conditional on the genotypic frequen- cies in Table 1, at time t generated by the drift process. Total sampling variation: The total (statistical ge- netic) variance and covariances of the cytonuclear dis- equilibria can now be calculated from the above model. These expectations are necessary to construct a neutral- ity test comparing observed and expected disequilibria. The (unconditional) variance of &t), i = 1, 2, can be calculated by the formula where and 5 stand for the genetic and the statistical sampling, respectively. Now ignoring smaller terms of the order O(n (t)Ar'(t)) and O(n-'(t)), (i.e., terms like l/n(t)N(t) nd l/n2(t)), we can obtain an approxi- mation for the first term in (8): Varf,(E5(Dj(t))) Var,;(D,(t)), (9) which can be calculated from the formulae given in DATTA t al. (1996). Note that E,@,( t)) s calculated by differentiating the moment generating function of a multinomial distribution with parameters n(t) and pl( ), . . . , p6( ). Direct calculation using the moment generating unction and gnoring he higher order terms shows that E4Var5(fi1(t))) = -(p + 2dl-1$-$q0 40 1 n( ) + d:-1$ dt-lh) + pO(-d;-,& + 2d:-1g dl-1 + 2dt-1&0 + o   3d?-1&0 + 4dt-i&qo &qo + Po& 2$& 3dt-l&& + 2&& && 2d:-1& + dt-lf$qo + dt-1poqo + zy   dt-1&70 )I, (10) where po = p(0) qo = q 0) are the initial gene frequencies and dl-] = Ec,(D(t 1)). In deriving the above formula, Var,(G (t) ) is calculated by arguments similar to those for E$(& t)) from the multinomial statistical sampling. For calculating the second term, E,(Var,(a(t))), n (8), we used the 6-method and the fact that expectations of p and q remain constant over generations. Using a result of Fu and ARNOLD (1992), for the random drift model we find where Nj = N i), 1. In the same way, one obtains the second term in (8)  1988 zyxwvutsrq . Datta zyxwvu t al. zyxwv   48dt-,&& + zyxwvu 66 + 24dt-1p&$ 16&$ 56d,-16$ + 32&$ + 32d,-1&$ 1664 + 8&& 16&& + 8&$)}. (12) For a complete understanding of the system we also need to calculate the total covariance between the sam- ple disequilibria as well. Covariances can occur between disequilibria in the same generation or disequilibria in different generations. Both are now calculated. The to- tal covariance between Bl (t) and zyxwvut (t) (Cov(Bl( ), I ( t) ) in the same generation is given by cov~j(~~(Bl(t)), \(&(O)) + ECj(Covs(B1(t), (t))). (13) Using similar calculations as before and ignoring higher order terms, we find cov(~l(t),&(t)) COV,,(Dl(t), t)) 1 + pO(8d;-lP, l6d;-,& 24d;-,& 32d;-,& + l6d;-& l6d;-& 3d,-,pOqO 8d;-lpOqo + 2&qo 4d?-l&qo + 16d:-1&q0 + 64d,3_1&q0 zyx 640 8Od:-&o 120d:-l&qo + 112d:-,&qO + 64d;-l&qo 48d:-l&O 4&& 32&& 32d:-1&& + 4&$ + 32d,-,&$ + 152d,'-1&$ 4 ) 112d,-,&& 216d:-1p&$ + 128dt-,&& 96d:-1&& 48dt-1&& 40d,-,p&$ 16&$ + 144dt-,&$ 4864 l68dt-,g$ + 48&$ + 64d,-lP:$ 16&$ 16&& + 48&& 4863 + 16p$$)}, (14) where the first term Cov,(D,(t), 4 )) can be calculated following DATTA t al. (1996). The covariance between the estimates based on sam- ples from different generations are calculated next. For 1 zyxwvut   j 5 2, t 0, zyxwvut   1, given 5, the composition of the population at time t, Bi(t) and Bj(t s are independent. Also up to higher order terms, E(Bi( ) Tt = Di( ) (15) and E(B;(t s) 1s = E(E(Bj(t + s) Yt+J 3,) = E(D,(t + s) 1s . (16) It can be shown using the one-step conditional moment generating function ( cJ: DATTA,  Fu, and ARNOLD 1996) that the above quantity equals ((N,+,r l)/N,+,)E(D(t s l)p(t + s 1)(7,), fj = 1 and ((N,+, 1)/ N,+,)E D t s 1)(1 2p(t + s 1))17,), fj = 2, which in turn, by the &method, s approximately equal to (follows from the dynamics of D, ee ARNOLD and Fu 1992), where rp = p,,, if j = 1 and = 1 2p0 if j = 2. Note that the above product needs to be interpreted as 1 for s = 1. Consequently one has, modulo higher order terms, C0V(Bi(t), Bj( + s = (N+s 1) N+ zy   (17) Once again, in the above equation Cov(Di( ), D )) can be found using the recursions of the first two moments of Di( ) in DATTA t al. (1996) and D(t) in Fu and AR- NOLD (1992). Note that in obtaining the first term of the above expression we have used the independence of the samples at times t and t + s given the population at time t. Let the vector Y = (B1(1), . , Bl(4, &(I), . . . , &(k)) (18) be the trajectory of sample disequilibria, fil and I . Note that E(Bi(t)) = Etr(E$(Bi(t))) S(Di(t)), (19) ignoring higher order terms, which is equal to a,&& if i = 1 and a,(l 2pO)& if i = 2. Thus, the trajectory of the sample disequilibria Y s approximately normally distributed with mean vector p = Xp and variance- covariance matrix E, where X = [X, X,], X1 = ( al, . . . , ~k, , . . . , O) , X2 = (0, . . . , 0, ai, . . . , Q)', with P = PI, Pd, I = A, PZ = (1 2po)&. To recapitu- late, the trajectory of sample cytonuclear disequilibria is denoted by Y. The expectation of this trajectory is p. The estimates of the trajectory are approximatelyjointly normal, and the variancecovariance matrix of the sam- ple cytonuclear disequilibria, Bl and 4, along the tra- jectory is X. The vector p is determined by the initial state of the cage. The calculation of the variance-covari- ance matrix of the disequilibria X was explained in  Disequilibrium Test zyxwv 989 the earlier paragraphs. The normal approximation is just-ified when both zyxwvutsr ( zyxwvut ) and zyxwvuts ( ) are large. ivPutmlit?l test: With the mean vector p and the vari- ance-covariance matrix C of the estimated trajectory for Bl and zyxwvut 2, we can now construct a neutrality test by comparing the inferred trajectory of cytonuclear dis- equilibria zyxwvut Y) ith its expectation over time p). If the initial values fi), qo and Do n the cage are known, one can use the following statistics to test the null hy- pothesis of a random drift model: T= Y zyxwvut   P)~Z”(Y p), 20) which will have an approximate chi-square distribution with 2k degrees of freedom. One would reject a random drift model if T > ~:(2k), here ~2(2k) s the upper ath quantile of a chi-square distribution with 2k degrees of freedom. However, often in practice, the composi- tion of the initial population (summarized by p is not known and only a simple random sample from it is available. In that case, we suggest (consistently) estimat- ing the variance covariance matrix C by substituting in the sample estimates fi(O), q 0) nd B(0) in places of b, o and do, respectively, in the expression for E rom (9) 14) and (17). Next, an overall estimate of p is obtained by the method of weighted least squares: fi = (xrg- X) 1xtg 1y. Finally, we could use the test statistic T = Y @ r e-’ Y @), which will have an approximately chi- square distribution with 2(k 1) freedom under the random drift model. This proposition is validated with a simulation study for k = 2. In the simulation we con- sidered that the initial generation had the frequencies p, = 0.5 and p6 = 0.5, and all the subsequent genera- tions had constant generation size N = 500. In each simulation, random samples of constant sample size n = 50 are generated for + 1 = 3 generations. The value of the test statistic is calculated for each simulation, nd the ntire process is repeated 5000 times. Conse- quently, we have 5000 independent realizations of the test statistic. A histogram wa5 drawn with these values, which is in good agreement with an overlaid true chi- square distribution with 2 degrees of freedom (Figure 2). The agreement is emphasized with a Q Q plot also. Note that for this neutrality test it is not necessary to have sample data from all consecutive generations. It is easy to adjust and reinterpret the est statistic in such cases. Software for carrying out this neutrality test is available from the author ( Instead of using the chi-square percentiles in con- structing the approximate rejection region of the test, one may resort to Monte Carlo simulation to find the exact Pvalue given a sample. This may be preferable if the sample sizes are small and one is afraid that the large sample distribution may not be adequate. Since 0 5 zyxwv 0 1s 20 TS w 0 S io 5 20 Qventlles 01 Chi Squmre disWbullon with 2 d.f. FIGURE .-Histogram and chisquared Q Q plot based on 5000 Monte-Carlo replications of the test statistic T. The overlaid graph in solid line is the density of chi-squared distri- bution with 2 degrees of freedom. the Pvalue calculation using present day computing is not expensive in terms of computing time, we anticipate that this approach may be preferred by many users. If, in practice, the entire history of the generation sizes is not known, then one may use simple interpola- tion. We found that the test statistic is not too sensitive with respective to misspecification of the generation sizes. RESULTS An example: To illustrate our test procedure, we now consider a real cage experiment constituting artificial hybrid zone data on Drosophila melanogmter conducted by M. PARSKY. The experiment uses controlled envi- ronmental conditions o study the effects of ytonuclear interaction in a population genetic context sing genet- ically manipulated strains of D. melanogmtpr. The hy- brids were formed with the crosses of initial stocks of flies collected in Ega, Denmark and Death Valley, CA. Cages were maintained in a cycle of discrete genera- tions. In each generation, flies were allowed to mate, and the adult flies were taken out and kept frozen for genotyping by PCR analysis. Eggs were allowed o form the next generation. A random sample of adult flies were taken from the frozen adult population and using PCR-based techniques, the cytonuclear genotypes were observed at two nuclear loci (60A and DPP) and at a cytoplasmic locus simultaneously. The experiment was continued for four generations. Data were not initially collected on the third eneration except for the opula- tion size. For carrying out our test procedure we have ignored the data on their initial generation as the ex- periment started with a purely hetero7ygous population in which all disequilibrium measures are zero. Thus, in
Similar documents
View more...
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks