A General Model for the Genetic Analysis of Pedigree Data

Human Heredity 21: 523-542 (1971)

R. C. Elston and J. Stewart

Department of Biostatistics and the Genetics Curriculum, University of North Carolina, Chapel Hill, N.C., and Department of Genetics, Milton Road, Cambridge
Abstract. Assuming random mating and random sampling of pedigrees, the likelihood of a set of pedigree data is developed in terms of: (1) the population distribution of the different genotypes; (2) the phenotypic distributions for the different genotypes; and (3) the genotypic distribution of offspring given the parents' genotypes. This last is given for any number of unlinked autosomal loci, two linked autosomal loci, an X-linked locus, and combinations of these possibilities. Methods are given for using this likelihood to test specific genetic hypotheses and for genetic counselling.

Key Words: Pedigree data; Genetic analysis
I. Introduction
The purpose of analysing pedigree data is to establish the presence or absence of a genetic mechanism for the manifestation of a particular trait or set of traits; to elucidate such a mechanism, if it is present; and to classify individuals for their genotypes. By pedigree data we mean data collected on one or more groups of related individuals, a group being more extensive than just parents and children (families): thus more than two generations will be involved. Although it is possible to examine genetic mechanisms without such data, using families, pairs of relatives, or even unrelated individuals, pedigree data provide the most genetic information. A typical pedigree may comprise a hundred or so individuals covering four or more generations, and from a genetic point of view such a group of individuals is capable of yielding far more information than can be obtained from the same number of individuals divided up into small unrelated groups. Yet, except for the very specialized purpose of studying genetic linkage, informative statistical analysis of such data seems to have been completely ignored.

¹ This investigation was supported by a Public Health Service Research Career Development Award (1-K3-GM-31,732), training grant (GM 00685) and research grant (GM HD 16697) from the National Institute of General Medical Sciences.
It is a common practice among geneticists, when wishing to establish a certain genetic hypothesis from pedigree data, to divide the pedigree up into families (i.e. groups of parents and children) and analyze the data as though the families were independently sampled. This practice 'wastes' information, even though it is not always statistically invalid. Such analyses have probably done little harm in the past, since so far they have usually been used to study dichotomous traits with a view to establishing a 'dominant' or 'recessive' mode of inheritance. The families are of course not independent, but under a simple null hypothesis there is independent segregation in each of the families. However, if we are to examine quantitative traits, or traits whose manifestation is influenced to a large degree by the environment, a more powerful analysis becomes necessary.

It is the purpose of this paper to indicate, in broad outline and with a fair amount of generality, an approach to this problem.
II. The Basic Probability Model: Likelihood of a Set of Data
In this section we derive, under very general assumptions as to the underlying genetic model, the likelihood that a particular set of pedigree data should be observed. We shall elaborate certain special cases in section III, and indicate in section V how this likelihood formulation can be used to test specific genetic hypotheses. If the data are distributed discretely, this likelihood is in fact the probability that the particular data should be observed; if the data are distributed continuously, the likelihood is no longer a true probability, but it is intuitively helpful to think of it as such.
A. Notation for Data
We need first some notation to identify the measures on each member of the pedigree. We consider here only the case in which there are no consanguineous marriages, so that each member of the pedigree is one of two types: either he is related to someone in the previous generation, or, if not, he is an unrelated person 'marrying into' the pedigree. Measures on the first type of person will be denoted by $x$, on the second type by $y$. In the case of the original parents of a pedigree, it is arbitrary which is termed $x$ and which is termed $y$. The use of subscripted subscripts, though clumsy typographically, will be helpful in the sequel to keep track of the different generations.
Fig. 1. Hypothetical examples to illustrate the notation for the beginnings of the first 2 pedigrees.
In general, the data will consist of separate pedigrees, numbered $1, 2, \ldots, i_0, \ldots$. Let the measures on the original parents of the $i_0$-th pedigree be $x_{i_0}$ and $y_{i_0}$ (fig. 1). Let the measure on the $i_1$-th child of these original parents be $x_{i_0 i_1}$, and that on his or her spouse be $y_{i_0 i_1}$; similarly, let the measure on the $i_2$-th child of the $i_1$-th child of the $i_0$-th original parents be $x_{i_0 i_1 i_2}$, and that on his or her spouse be $y_{i_0 i_1 i_2}$; and so on (fig. 1). In general, the measure on an individual in the $j$-th generation of the pedigree, counting the original parents as generation 0, will be of the form $x_{i_0 i_1 i_2 \ldots i_{j-1} i_j}$, this being the $i_j$-th child of the $i_{j-1}$-th child of the ... of the $i_2$-th child of the $i_1$-th child of the $i_0$-th original parents.
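The subscript scheme amounts to addressing each person by the path $(i_0, i_1, \ldots, i_j)$ leading down from the original parents. The sketch below stores the measures in dictionaries keyed by these tuples; the data layout, the example values and the helper names (`generation`, `parents_index`, `children`) are hypothetical, chosen only to illustrate the notation.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, ...]   # (i0, i1, ..., ij), 1-based as in the text

def generation(index: Index) -> int:
    """Generation j of x_{i0 i1 ... ij}; the original parents (i0,) are generation 0."""
    return len(index) - 1

def parents_index(index: Index) -> Index:
    """Index shared by this person's parents x_{i0...i(j-1)} and y_{i0...i(j-1)}."""
    if len(index) < 2:
        raise ValueError("the original parents have no parents in the pedigree")
    return index[:-1]

def children(index: Index, x: Dict[Index, float]) -> List[Index]:
    """Indices of all recorded children of the couple x_index, y_index."""
    n = len(index)
    return sorted(key for key in x if len(key) == n + 1 and key[:n] == index)

# Pedigree i0 = 1 (made-up measures): original parents, two children,
# and one grandchild through the first child and his or her spouse.
x = {(1,): 10.2, (1, 1): 11.0, (1, 2): 9.7, (1, 1, 1): 10.5}
y = {(1,): 9.9, (1, 1): 10.8}   # a spouse shares the index of the x he or she married
```

Keying spouses by the same tuple as the pedigree member they married mirrors the paper's pairing of $x_{i_0 \ldots i_j}$ with $y_{i_0 \ldots i_j}$.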
B. Relationship between Genotype and Phenotype
Let $k$ be the number of different genotypes that cause variation in the trait measured; in particular, it is the smallest number of distinguishable genotypes that must be postulated to exist in the population to account for the segregation occurring in our particular sample of pedigrees. Thus $k$ must be equal to or greater than the number of different genotypes influencing the trait concerned that occur in our sample. For example, if in our sample data there is segregation at just one of several loci that can affect the trait, and furthermore only two alleles occur in our data, three genotypes are possible, say AA, Aa and aa. In this case $k$ will be three, whether or not all three of these genotypes occur in the sample data. The genotypes can be arranged in some specified order, and so we can talk of the $u$-th genotype, $u = 1, 2, \ldots, k$.
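As a small illustration of the ordering, the genotypes at a single autosomal locus can be generated in a fixed order from the alleles present, and each genotype's position then serves as its index $u$. For two alleles this gives AA, Aa and aa, so $k = 3$; with $n$ alleles there are $n(n+1)/2$ unordered genotypes. The helper name `ordered_genotypes` is hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence

def ordered_genotypes(alleles: Sequence[str]) -> List[str]:
    """List the unordered genotypes at one autosomal locus in a fixed order.

    The position of a genotype in the returned list (plus 1) plays the role
    of the index u = 1, 2, ..., k used in the text.
    """
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]

genotypes = ordered_genotypes(["A", "a"])
k = len(genotypes)
for u, genotype in enumerate(genotypes, start=1):
    print(u, genotype)   # prints 1 AA, then 2 Aa, then 3 aa
```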
Frequently a single phenotype is associated with each genotype, but this ignores the possibility of misclassification. More importantly, it does not cater for the case of quantitative inheritance, where each genotype is associated with a range of phenotypic values, the variation within each genotype being due to environmental influences. We therefore associate a probability density function with each genotype. If $x$ is the trait measured, denote this function by $g_u(x)$ for the $u$-th genotype. There will thus be $k$ such functions, not necessarily all distinct. For example, dominance at a single locus with two alleles segregating would imply that two of the three density functions are identical. There is no major difficulty in letting $g_u(x)$ be age- and/or sex-dependent, but, for simplicity of notation, this will not be done here.
In the discrete case $g_u(x)$ is a multinomial distribution (binomial for the special case of a dichotomous trait); in the continuous case it will usually be reasonable to assume it is a normal distribution ($x$ being a suitable transformation, if necessary, of the scale of measurement). Thus, $g_u(x)$ is the conditional probability (density), given the $u$-th genotype, that $x$ should be observed. The $g_u(x)$ ($u = 1, 2, \ldots, k$) may be known independently, but usually they will need to be estimated from the data.
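To make the $g_u(x)$ concrete, the sketch below builds one function per genotype for the two cases just described: a normal density for a continuous trait, and a single binomial (Bernoulli) trial for a dichotomous one. The parameter values, the genotype order AA, Aa, aa and the helper names are illustrative assumptions. Complete dominance is represented simply by giving AA and Aa the same function, so only two of the three are distinct.

```python
import math
from typing import Callable, List

Density = Callable[[float], float]

def normal_density(mean: float, sd: float) -> Density:
    """g(x) for a continuous trait: normal density with the given mean and sd."""
    def g(x: float) -> float:
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return g

def dichotomous_probability(penetrance: float) -> Density:
    """g(x) for a dichotomous trait: P(x = 1) = penetrance, P(x = 0) = 1 - penetrance."""
    def g(x: float) -> float:
        return penetrance if x == 1 else 1.0 - penetrance
    return g

# Genotype order u = 1, 2, 3 -> AA, Aa, aa (list positions 0, 1, 2).
# Continuous trait with A dominant: AA and Aa share one density.
g_continuous: List[Density] = [
    normal_density(12.0, 1.5),   # AA
    normal_density(12.0, 1.5),   # Aa
    normal_density(8.0, 1.5),    # aa
]
# Dichotomous trait, fully penetrant recessive: only aa is affected (x = 1).
g_dichotomous: List[Density] = [
    dichotomous_probability(0.0),  # AA
    dichotomous_probability(0.0),  # Aa
    dichotomous_probability(1.0),  # aa
]
```

Allowing for misclassification would only mean using penetrances strictly between 0 and 1 in `g_dichotomous`.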
For example, $x$ might be a continuous character whose distribution in the population suggests trimodality, and the genetic model it is desired to test is that of a single locus with two alleles resulting in three genotypes. We could then fit a mixture of three normal distributions to the overall data, estimating the three means, relative proportions, and a common variance by maximum likelihood [MURPHY and BOLLING, 1967]. These estimates are then used as starting values in the maximization of the likelihood that will now be developed.
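The preliminary mixture fit can be sketched as follows. The text only says that the mixture is fitted by maximum likelihood; the EM algorithm used here is one standard way to reach that maximum for a normal mixture with a common variance, and is an assumption of this sketch (it is not necessarily the procedure of Murphy and Bolling), as are the simulated data, the starting values and the function names.

```python
import numpy as np

def mixture_densities(x, means, props, var):
    """props[c] * N(x_i; means[c], var) for every observation i and component c."""
    return props * np.exp(-0.5 * (x[:, None] - means) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_normal_mixture(x, means, props, var, max_iter=1000, tol=1e-9):
    """Maximum-likelihood fit of a normal mixture with a common variance, by EM."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    props = np.asarray(props, dtype=float)
    prev_loglik = -np.inf
    for _ in range(max_iter):
        dens = mixture_densities(x, means, props, var)
        total = dens.sum(axis=1)
        loglik = np.log(total).sum()
        if loglik - prev_loglik < tol:          # EM never decreases the likelihood
            break
        prev_loglik = loglik
        resp = dens / total[:, None]            # E-step: P(component c | x_i)
        n_c = resp.sum(axis=0)                  # M-step: closed-form updates
        props = n_c / x.size
        means = (resp * x[:, None]).sum(axis=0) / n_c
        var = float((resp * (x[:, None] - means) ** 2).sum() / x.size)
    loglik = np.log(mixture_densities(x, means, props, var).sum(axis=1)).sum()
    return means, props, var, loglik

# Simulated trimodal trait (illustrative values): genotype means 8, 10, 12, sd 0.5.
rng = np.random.default_rng(1)
true_means = np.array([8.0, 10.0, 12.0])
labels = rng.choice(3, size=3000, p=[0.25, 0.5, 0.25])
x = rng.normal(true_means[labels], 0.5)

means, props, var, loglik = fit_normal_mixture(
    x, means=np.quantile(x, [0.1, 0.5, 0.9]), props=np.full(3, 1 / 3), var=1.0)
```

The fitted `means`, `props` and `var` would then serve as starting values for maximizing the full pedigree likelihood.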
C. Construction of the Likelihood
We shall start by considering the likelihood for a single sibship. The key here is that, given the genotypes of both parents, the genotypes of all the offspring are independent of each other. Let $P_{stu}$ be the probability that an individual has genotype $u$, given that his parents' genotypes are $s$ and $t$ ($s$, $t$ and $u$ can each have the values $1, 2, \ldots, k$). Then if the values of $x$ for a sibship of size $n$ are $x_1, x_2, \ldots, x_n$, the likelihood of the sibship given that the parents have genotypes $s$ and $t$ is:

$$\prod_{i=1}^{n} \sum_{u=1}^{k} P_{stu}\, g_u(x_i). \qquad (1)$$
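Equation (1) can be evaluated directly once the $P_{stu}$ and $g_u$ are specified. The sketch below assumes a single autosomal locus with two alleles and Mendelian segregation for $P_{stu}$, and a fully penetrant recessive dichotomous trait for the $g_u$; genotypes are indexed from 0 rather than 1, and the names `p_stu` and `sibship_likelihood` are hypothetical.

```python
from typing import Callable, Sequence

# Genotype order assumed for this sketch: 0 -> AA, 1 -> Aa, 2 -> aa
K = 3
# Probability that a parent of each genotype transmits allele A
TRANSMIT_A = (1.0, 0.5, 0.0)

def p_stu(s: int, t: int, u: int) -> float:
    """P_stu for one autosomal locus with two alleles under Mendelian segregation."""
    a, b = TRANSMIT_A[s], TRANSMIT_A[t]
    child = (a * b, a * (1.0 - b) + (1.0 - a) * b, (1.0 - a) * (1.0 - b))
    return child[u]

def sibship_likelihood(xs: Sequence[float], s: int, t: int,
                       g: Sequence[Callable[[float], float]]) -> float:
    """Equation (1): product over sibs of the sum over u of P_stu * g_u(x_i)."""
    likelihood = 1.0
    for x in xs:
        likelihood *= sum(p_stu(s, t, u) * g[u](x) for u in range(K))
    return likelihood

# Fully penetrant recessive trait: x = 1 (affected) only for genotype aa.
g = [lambda x: float(x == 0), lambda x: float(x == 0), lambda x: float(x == 1)]
# Two heterozygous parents (s = t = Aa), one affected and one unaffected child:
print(sibship_likelihood([1, 0], 1, 1, g))   # 0.25 * 0.75 = 0.1875
```

Because the sibs are conditionally independent given $s$ and $t$, the sibship likelihood is simply a product of per-sib sums.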
Now let $\psi_v$ be the probability that an individual should be of the $v$-th genotype, i.e. $\psi_v$ is the proportion of individuals in the population who have the $v$-th genotype. Then the likelihood of observing the spouse of the $i$-th member of the sibship, whose measure is $y_i$, is simply

$$\sum_{v=1}^{k} \psi_v\, g_v(y_i). \qquad (2)$$
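Expression (2) is a $\psi$-weighted sum of the genotype-specific functions. In the sketch below the genotype frequencies $\psi_v$ are taken, as an assumption, from Hardy-Weinberg proportions $(p^2, 2pq, q^2)$ for a frequency $q$ of allele a; the text itself leaves the $\psi_v$ general. The trait model and function names are illustrative.

```python
from typing import Callable, Sequence, Tuple

def hardy_weinberg(q: float) -> Tuple[float, float, float]:
    """Genotype frequencies (psi_AA, psi_Aa, psi_aa) for frequency q of allele a."""
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)

def spouse_likelihood(y: float, psi: Sequence[float],
                      g: Sequence[Callable[[float], float]]) -> float:
    """Expression (2): sum over v of psi_v * g_v(y)."""
    return sum(psi_v * g_v(y) for psi_v, g_v in zip(psi, g))

# Fully penetrant recessive trait (order AA, Aa, aa); y = 0 means unaffected.
g = [lambda y: float(y == 0), lambda y: float(y == 0), lambda y: float(y == 1)]
psi = hardy_weinberg(0.1)
unaffected = spouse_likelihood(0, psi, g)   # 1 - q**2, about 0.99
```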
Thus the likelihood of observing the sibship and their spouses, given that the parents have genotypes $s$ and $t$, can be written:

$$\prod_{i=1}^{n} \sum_{u=1}^{k} P_{stu}\, g_u(x_i) \sum_{v=1}^{k} \psi_v\, g_v(y_i). \qquad (3)$$
The expression (3) is of course a function of $s$ and $t$. However, the $s$ and $t$ in this expression correspond to the $u$ and $v$ in a similar expression for the previous generation. This relationship between the generations can be expressed by rewriting (3) as the likelihood of observing a sibship and their spouses of the $j$-th generation, using the notation of section A above. We have:
$$r_j = \prod_{i_j} \sum_{s_j=1}^{k} P_{s_{j-1} t_{j-1} s_j}\, g_{s_j}(x_{i_0 i_1 \ldots i_j}) \sum_{t_j=1}^{k} \psi_{t_j}\, g_{t_j}(y_{i_0 i_1 \ldots i_j}). \qquad (4)$$
Thus $r_j$ is an operator that is a function of $s_{j-1}$ and $t_{j-1}$, since it is the likelihood of observing a sibship and their spouses in generation $j$ conditional on the sibs' parents being of genotypes $s_{j-1}$ and $t_{j-1}$, respectively.
respectively. But the likelihood
of
the parents,
if
they are
of
genotypes
Sj-l
and t
j -1,
is the term
(5)
in the likelihood
of
observing a (j-l)th generation sibship and spouses, except
that
this is conditional on
their
parents being
of
genotypes
Sj-2
and t
j-
2•
Thus

$$r_{j-1} = \prod_{i_{j-1}} \sum_{s_{j-1}=1}^{k} P_{s_{j-2} t_{j-2} s_{j-1}}\, g_{s_{j-1}}(x_{i_0 i_1 \ldots i_{j-1}}) \sum_{t_{j-1}=1}^{k} \psi_{t_{j-1}}\, g_{t_{j-1}}(y_{i_0 i_1 \ldots i_{j-1}}) \qquad (6)$$
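Equations (4) to (6) show that each generation's operator has the same form as the one before it. A standard way to turn this into a complete pedigree likelihood, the recursive scheme usually known as the Elston-Stewart algorithm, is to multiply each sib's term by the operator for that sib's own sibship and spouses, and finally to sum over the genotypes of the original parents weighted by $\psi$. The equations shown here stop before that step, so the nesting in the sketch below is an assumption about where the argument leads, not a transcription of the paper. Pedigrees are represented as nested dictionaries (a hypothetical layout), genotypes are ordered AA, Aa, aa, and transmission is Mendelian at one locus.

```python
import numpy as np
from typing import List, Optional, TypedDict

class Sib(TypedDict):
    gx: np.ndarray                # g_u(x) for this pedigree member, u = 0..k-1
    gy: Optional[np.ndarray]      # g_v(y) for his or her spouse, or None if unobserved
    children: List["Sib"]         # this couple's own sibship (possibly empty)

def mendelian_p(transmit_a=(1.0, 0.5, 0.0)) -> np.ndarray:
    """P[s, t, u] for one autosomal locus, genotypes ordered AA, Aa, aa."""
    a = np.asarray(transmit_a)[:, None]
    b = np.asarray(transmit_a)[None, :]
    return np.stack([a * b, a * (1 - b) + (1 - a) * b, (1 - a) * (1 - b)], axis=-1)

def r_operator(sibs: List[Sib], P: np.ndarray, psi: np.ndarray) -> np.ndarray:
    """r[s, t]: likelihood of a sibship, their spouses and all their descendants,
    given parental genotypes s and t (equation (4) with each sib's term
    multiplied by the operator for the next generation)."""
    k = psi.size
    r = np.ones((k, k))
    for sib in sibs:
        gy = sib["gy"] if sib["gy"] is not None else np.ones(k)
        below = r_operator(sib["children"], P, psi)   # indexed [s_j, t_j]
        # inner[s_j] = sum over t_j of psi[t_j] * g_{t_j}(y) * r_{j+1}(s_j, t_j)
        inner = below @ (psi * gy)
        # term[s, t] = sum over s_j of P[s, t, s_j] * g_{s_j}(x) * inner[s_j]
        r *= P @ (sib["gx"] * inner)
    return r

def pedigree_likelihood(gx0: np.ndarray, gy0: np.ndarray, children: List[Sib],
                        P: np.ndarray, psi: np.ndarray) -> float:
    """Sum over the original parents' genotypes s0, t0, each weighted by psi."""
    return float((psi * gx0) @ r_operator(children, P, psi) @ (psi * gy0))

# Recessive trait, q = 0.1; everyone unaffected except one grandchild (made-up data).
q = 0.1
psi = np.array([(1 - q) ** 2, 2 * q * (1 - q), q ** 2])
unaffected = np.array([1.0, 1.0, 0.0])
affected = np.array([0.0, 0.0, 1.0])
grandchild: Sib = {"gx": affected, "gy": None, "children": []}
child: Sib = {"gx": unaffected, "gy": unaffected, "children": [grandchild]}
L = pedigree_likelihood(unaffected, unaffected, [child], mendelian_p(), psi)
```

The recursion visits each person once and sums over only $k$ genotypes at a time, whereas enumerating every joint genotype assignment would grow as $k$ raised to the number of people in the pedigree.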
