A General Model for the Genetic Analysis of Pedigree Data

Human Heredity 21: 523-542 (1971)

A General Model for the Genetic Analysis of Pedigree Data¹

R. C. ELSTON and J. STEWART
Department of Biostatistics and the Genetics Curriculum, University of North Carolina, Chapel Hill, N.C., and Department of Genetics, Milton Road, Cambridge

Abstract. Assuming random mating and random sampling of pedigrees, the likelihood of a set of pedigree data is developed in terms of: (1) the population distribution of the different genotypes; (2) the phenotypic distributions for the different genotypes, and (3) the genotypic distribution of offspring given the parents' genotypes. This last is given for any number of unlinked autosomal loci, two linked autosomal loci, an X-linked locus, and combinations of these possibilities. Methods are given for using this likelihood to test specific genetic hypotheses and for genetic counselling.

Key Words: Pedigree data; Genetic analysis

I. Introduction

The purpose of analysing pedigree data is to establish the presence or absence of a genetic mechanism for the manifestation of a particular trait or set of traits; to elucidate such a mechanism, if it is present; and to classify individuals for their genotypes. By pedigree data we mean data collected on one or more groups of related individuals, a group being more extensive than just parents and children (families): thus more than two generations will be involved. Whereas it is possible to examine genetic mechanisms without such data, using families, pairs of relatives, or even unrelated individuals, pedigree data provide the most genetic information.
A typical pedigree may comprise a hundred or so individuals covering four or more generations, and from a genetic point of view such a group of individuals is capable of yielding far more information than can be obtained from the same number of individuals divided up into small unrelated groups. Yet, except for the very specialized purpose of studying genetic linkage, informative statistical analysis of such data seems to have been completely ignored.

It is a common practice among geneticists, when wishing to establish a certain genetic hypothesis from pedigree data, to divide the pedigree up into families (i.e. groups of parents and children) and analyze the data as though the families were independently sampled. This practice 'wastes' information, even though it is not always statistically invalid. Such analyses have probably done little harm in the past, since so far they have usually been used to study dichotomous traits with a view to establishing a 'dominant' or 'recessive' mode of inheritance. The families are of course not independent, but under a simple null hypothesis there is independent segregation in each of the families. However, if we are to examine quantitative traits, or traits whose manifestation is influenced to a large degree by the environment, a more powerful analysis becomes necessary. It is the purpose of this paper to indicate, in broad outline and with a fair amount of generality, an approach to this problem.

¹ This investigation was supported by a Public Health Service Research Career Development Award (1-K3-GM-31,732), training grant (GM 00685) and research grant (GM HD 16697) from the National Institute of General Medical Sciences.

II. The Basic Probability Model: Likelihood of a Set of Data

In this section we derive, under very general assumptions as to the underlying genetic model, the likelihood that a particular set of pedigree data should be observed.
We shall elaborate in section III certain special cases, and indicate in section V how this likelihood formulation can be used to test specific genetic hypotheses. If the data are distributed discretely this likelihood is in fact the probability that the particular data should be observed; if the data are distributed continuously the likelihood is no longer a true probability, but it is intuitively helpful to think of it as such.

A. Notation for Data

We need first some notation to identify the measures on each member of the pedigree. We consider here only the case in which there are no consanguineous marriages, so that each member of the pedigree is one of two types: either he is related to someone in the previous generation, or, if not, he is an unrelated person 'marrying into' the pedigree. Measures on the first type of person will be denoted by $x$, on the second type by $y$. In the case of the original parents of a pedigree, it is arbitrary as to which is termed $x$ and which is termed $y$. The use of subscripted subscripts, though clumsy typographically, will be helpful in the sequel to keep track of the different generations.

Fig. 1. Hypothetical examples to illustrate the notation for the beginnings of the first 2 pedigrees.

In general, the data will consist of separate pedigrees, numbered $1, 2, \ldots, i_0, \ldots$ Let the measures on the original parents of the $i_0$-th pedigree be $x_{i_0}$ and $y_{i_0}$ (fig. 1). Let the measure on the $i_1$-th child of these original parents be $x_{i_0 i_1}$, and that on his or her spouse be $y_{i_0 i_1}$; similarly let the measure on the $i_2$-th child of the $i_1$-th child of the $i_0$-th original parents be $x_{i_0 i_1 i_2}$, and that on his or her spouse be $y_{i_0 i_1 i_2}$; and so on (fig. 1). In general, the measure on an individual in the $j$-th generation of the pedigree, counting the original parents as generation 0, will be of the form $x_{i_0 i_1 i_2 \ldots i_{j-1} i_j}$; this being the $i_j$-th child of the $i_{j-1}$-th child of the ... of the $i_2$-th child of the $i_1$-th child of the $i_0$-th original parents.

B. Relationship between Genotype and Phenotype

Let $k$ be the number of different genotypes that cause variation in the trait measured; in particular it is the smallest number of distinguishable genotypes that must be postulated to exist in the population to account for the segregation occurring in our particular sample of pedigrees. Thus $k$ must be equal to or greater than the number of different genotypes, influencing the trait concerned, that occur in our sample. For example, if in our sample data there is segregation at just one of several loci that can affect the trait, and furthermore only two alleles occur in our data, three genotypes are possible, say AA, Aa and aa. In this case $k$ will be three, whether or not all three of these genotypes occur in the sample data. The genotypes can be arranged in some specified order, and so we can talk of the $u$-th genotype, $u = 1, 2, \ldots, k$.

Frequently a single phenotype is associated with each genotype, but this ignores the possibility of misclassification. More importantly, it does not cater for the case of quantitative inheritance, where each genotype is associated with a range of phenotypic values, the variation within each genotype being due to environmental influences. We, therefore, associate a probability density function with each genotype. If $x$ is the trait measured, denote this function by $g_u(x)$ for the $u$-th genotype. There will thus be $k$ such functions, not necessarily all distinct. For example, dominance at a single locus with two alleles segregating would imply that two of the three density functions are identical. There is no major difficulty in letting $g_u(x)$ be age and/or sex dependent, but, for simplicity of notation, this will not be done here.
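As a minimal sketch of this idea, the $k$ density functions can be held as a list indexed by genotype. The choice of normal densities, the means and standard deviation, and the names `normal_density`, `g_dominant` and `g_recessive` are illustrative assumptions and do not come from the paper. Dominance is shown by letting AA and Aa share one function.

```python
import math

def normal_density(mean, sd):
    """Return a function g(x): the normal density with this mean and sd."""
    def g(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return g

# Hypothetical ordering of the k = 3 genotypes at one locus: AA, Aa, aa.
# The means and sd below are made-up illustrative values.
g_dominant = normal_density(mean=10.0, sd=1.0)   # shared by AA and Aa
g_recessive = normal_density(mean=13.0, sd=1.0)  # aa only

# g[u](x) plays the role of g_u(x); two of the three entries are identical.
g = [g_dominant, g_dominant, g_recessive]

x = 12.0
densities = [g_u(x) for g_u in g]  # density of x under each genotype
```

Here `densities[0]` and `densities[1]` coincide because of the assumed dominance, so only two distinct functions need to be specified or estimated.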
In the discrete case $g_u(x)$ is a multinomial distribution (binomial for the special case of a dichotomous trait); in the continuous case it will usually be reasonable to assume it is a normal distribution ($x$ being a suitable transformation, if necessary, of the scale of measurement). Thus, $g_u(x)$ is the conditional probability (density), given the $u$-th genotype, that $x$ should be observed. The $g_u(x)$ ($u = 1, 2, \ldots, k$) may be known independently, but usually they will need to be estimated from the data. For example, $x$ might be a continuous character whose distribution in the population suggests trimodality, and the genetic model it is desired to test is that of a single locus with two alleles resulting in three genotypes. We could then fit a mixture of three normal distributions to the overall data, estimating the three means, relative proportions, and a common variance by maximum likelihood [MURPHY and BOLLING, 1967]. These estimates are then used as starting values in the maximization of the likelihood that will now be developed.

C. Construction of the Likelihood

We shall start by considering the likelihood for a single sibship. The key here is that, given the genotypes of both parents, the genotypes of all the offspring are independent of each other. Let $p_{stu}$ be the probability that an individual has genotype $u$, given that his parents' genotypes are $s$ and $t$ ($s$, $t$ and $u$ can each have the values $1, 2, \ldots, k$). Then if the values of $x$ for a sibship of size $n$ are $x_1, x_2, \ldots, x_n$, the likelihood of the sibship given that the parents have genotypes $s$ and $t$ is:

$$\prod_{i=1}^{n} \sum_{u=1}^{k} p_{stu}\, g_u(x_i). \tag{1}$$

Now let $\psi_v$ be the probability that an individual should be of the $v$-th genotype, i.e. $\psi_v$ is the proportion of individuals in the population who have the $v$-th genotype. Then the likelihood of observing the spouse of the $i$-th member of the sibship, whose measure is $y_i$, is simply

$$\sum_{v=1}^{k} \psi_v\, g_v(y_i). \tag{2}$$

Thus the likelihood of observing the sibship and their spouses, given that the parents have genotypes $s$ and $t$, can be written:

$$\prod_{i=1}^{n} \sum_{u=1}^{k} p_{stu}\, g_u(x_i) \sum_{v=1}^{k} \psi_v\, g_v(y_i). \tag{3}$$

The expression (3) is of course a function of $s$ and $t$. However, the $s$ and $t$ in this expression correspond to the $u$ and $v$ in a similar expression for the previous generation. This relationship between the generations can be expressed by rewriting (3) as the likelihood of observing a sibship and their spouses of the $j$-th generation, using the notation of section A above. We have:

$$r_j = \prod_{i_j} \sum_{s_j=1}^{k} p_{s_{j-1} t_{j-1} s_j}\, g_{s_j}(x_{i_0 i_1 \ldots i_j}) \sum_{t_j=1}^{k} \psi_{t_j}\, g_{t_j}(y_{i_0 i_1 \ldots i_j}). \tag{4}$$

$r_j$ is thus an operator which is a function of $s_{j-1}$ and $t_{j-1}$, since it is the likelihood of observing a sibship and their spouses in generation $j$ conditional on the sibs' parents being of genotypes $s_{j-1}$ and $t_{j-1}$, respectively. But the likelihood of the parents, if they are of genotypes $s_{j-1}$ and $t_{j-1}$, is the term

$$p_{s_{j-2} t_{j-2} s_{j-1}}\, g_{s_{j-1}}(x_{i_0 i_1 \ldots i_{j-1}})\, \psi_{t_{j-1}}\, g_{t_{j-1}}(y_{i_0 i_1 \ldots i_{j-1}}) \tag{5}$$

in the likelihood of observing a $(j-1)$-th generation sibship and spouses, except that this is conditional on their parents being of genotypes $s_{j-2}$ and $t_{j-2}$. Thus

$$r_{j-1} = \prod_{i_{j-1}} \sum_{s_{j-1}=1}^{k} p_{s_{j-2} t_{j-2} s_{j-1}}\, g_{s_{j-1}}(x_{i_0 i_1 \ldots i_{j-1}}) \sum_{t_{j-1}=1}^{k} \psi_{t_{j-1}}\, g_{t_{j-1}}(y_{i_0 i_1 \ldots i_{j-1}}) \tag{6}$$
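The conditional likelihood of a sibship and their spouses, expression (3), can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a single autosomal locus with two alleles (so $k = 3$), Mendelian transmission for $p_{stu}$, Hardy-Weinberg proportions for $\psi_v$, and normal $g_u$ with a common standard deviation. The function names, the allele frequency and the trait values are hypothetical, and using `None` for a sib with no observed spouse is a convenience of this sketch.

```python
import math

# Genotype order (hypothetical): index 0 = AA, 1 = Aa, 2 = aa.
PROB_TRANSMIT_A = (1.0, 0.5, 0.0)  # chance each genotype passes on allele A

def transmission(s, t):
    """p_{stu} for u = 0, 1, 2 under Mendelian segregation at one locus."""
    a, b = PROB_TRANSMIT_A[s], PROB_TRANSMIT_A[t]
    return [a * b, a * (1 - b) + (1 - a) * b, (1 - a) * (1 - b)]

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sibship_likelihood(s, t, xs, ys, means, sd, psi):
    """Expression (3): likelihood of sibs xs and spouses ys, given that
    the parents have genotypes s and t. A spouse value of None is skipped."""
    p = transmission(s, t)
    likelihood = 1.0
    for x, y in zip(xs, ys):
        # Sum over the sib's possible genotypes u, as in expression (1).
        sib = sum(p[u] * normal_pdf(x, means[u], sd) for u in range(3))
        # Sum over the spouse's possible genotypes v, as in expression (2).
        spouse = 1.0
        if y is not None:
            spouse = sum(psi[v] * normal_pdf(y, means[v], sd) for v in range(3))
        likelihood *= sib * spouse
    return likelihood

# Illustrative inputs: allele A has frequency q and is dominant.
q = 0.3
psi = [q * q, 2 * q * (1 - q), (1 - q) * (1 - q)]
means = [10.0, 10.0, 13.0]

# Two sibs (the second unmarried) whose parents are both Aa.
value = sibship_likelihood(1, 1, [10.4, 12.7], [11.0, None], means, 1.0, psi)
```

Evaluating this function for every pair $(s, t)$ gives the quantities that the operator in (4) attaches to each possible pair of parental genotypes.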