
A robust statistical method for case-control association testing with copy number variation

Chris Barnes¹, Vincent Plagnol², Tomas Fitzgerald¹, Richard Redon¹, Jonathan Marchini³, David Clayton² & Matthew E Hurles¹

Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.

The advent of technologies to probe DNA copy number genome-wide has led to rapid progress in the understanding of how segments of the genome can vary in copy number between individuals¹,².
In addition, there are multiple strands of evidence indicating that this copy number variation (CNV) can have an appreciable biological impact. CNVs frequently have a causal role in severe developmental syndromes and familial diseases³, CNVs can perturb gene expression within and flanking the CNV⁴ and CNVs can confer susceptibility to infectious and complex diseases⁵⁻⁷.

In light of the ever-increasing rate of discovery of CNVs throughout the human genome, and the growing appreciation for their potential role in complex disease, rapid growth in studies investigating associations between CNVs and complex diseases is likely. The development of this nascent field is critically dependent on robust statistical strategies for identifying meaningful associations. The statistical challenges inherent within CNV-association testing are substantially different from those for CNV discovery⁸.

A review of the sparse literature that exists on CNV-disease associations reveals that the underlying data are often substantially noisier than for SNP genotyping, largely as a result of poor discrimination of the underlying discrete copy numbers, and yet the statistical methods being applied are typically less sophisticated. Although some CNV-disease association studies simply assay presence or absence of specific copy number alleles⁹, most published studies rely on quantitative assessments, often crude, of the diploid copy number⁶,⁷,¹⁰. Most frequently, real-time PCR assays for known CNVs are applied to case and control groups and individuals are then binned into copy number classes using pre-defined thresholds. These classes represent diploid copy numbers (that is, the sum of the number of copies on each allele) rather than genotypes. Subsequently, nonparametric statistical tests (for example, χ² test, trend test, Mann-Whitney test) are applied to the frequencies of the different copy number classes in the different groups.
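The binning-and-testing workflow just described can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure of any cited study: the cut points, the integer class scores and the function names (`bin_copy_number`, `class_counts`, `trend_test`) are assumptions made for the example, and the trend statistic uses the common large-sample variance of the Cochran-Armitage test.

```python
import math
from bisect import bisect_right


def bin_copy_number(signal, thresholds):
    """Return the class index of one measurement given sorted cut points."""
    return bisect_right(thresholds, signal)


def class_counts(signals, thresholds):
    """Count how many individuals fall into each copy number class."""
    counts = [0] * (len(thresholds) + 1)
    for s in signals:
        counts[bin_copy_number(s, thresholds)] += 1
    return counts


def trend_test(case_counts, control_counts, scores=None):
    """Cochran-Armitage trend statistic (1 d.f.) and its P value.

    Assumes at least two classes are occupied, so the variance is non-zero.
    """
    k = len(case_counts)
    scores = list(range(k)) if scores is None else scores
    totals = [a + b for a, b in zip(case_counts, control_counts)]
    n_total = sum(totals)
    p_case = sum(case_counts) / n_total
    # Score-weighted excess of cases over the expectation under no association.
    t = sum(s * (r - n * p_case) for s, r, n in zip(scores, case_counts, totals))
    sum_s = sum(s * n for s, n in zip(scores, totals))
    sum_s2 = sum(s * s * n for s, n in zip(scores, totals))
    var_t = p_case * (1.0 - p_case) * (sum_s2 - sum_s ** 2 / n_total)
    chi2 = t * t / var_t
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value


# Hypothetical cut points separating three diploid copy number classes.
cut_points = [0.5, 1.5]
cases = class_counts([0.1, 0.9, 1.1, 2.0], cut_points)
chi2, p_value = trend_test([10, 20, 30], [30, 20, 10])
```

With these cut points a measurement of 0.45 is placed in class 0, whereas the same measurement shifted by 0.1 is placed in class 1. A systematic shift in one group therefore changes its class counts even when the underlying copy numbers are identical, which is the differential misclassification the text goes on to discuss.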
One previous study has shown that, in the context of association with quantitative traits, it has been possible to identify robust associations by simply correlating the trait with the underlying quantitative CNV measurements without inferring copy number genotypes⁴. Although approaches based on direct testing of quantitative CNV measurements will often be appropriate for association with a quantitative trait in a single group of subjects, they are often not robust to the presence of differential errors between groups due to differences in DNA quality or handling. Thus, they will often be inappropriate in a case-control setting. Given a quantitative measurement of copy number, different diploid copy numbers are manifested as peaks, or clusters, in the distribution of measurements; the distribution of measurements will be a mixture of (often overlapping) bell-shaped curves. Direct tests of copy number measurements are sensitive to shifts in the mean and/or variance of the underlying distributions, and scoring copy number by simple binning will, in the presence of such shifts, lead to differential misclassification. Such analyses could generate many false-positive findings, especially in the context of genome-wide studies testing thousands of variants.

It is emerging that such shifts in the distribution of measurements occur widely in practice, even after careful normalization and calibration procedures have been applied to the raw observations. For example, Figure 1 shows examples of differential errors in CNV measurements from three different technologies: SNP genotyping, array comparative genome hybridization (array CGH) and a variant of quantitative PCR known as the paralog ratio test (PRT)¹¹. In each case, shifts in the location of the clusters representing specific copy numbers between groups are readily apparent. In the first example, where both distributions are drawn from control groups from the same population, and no difference in copy number frequencies from either population structure or genetic association is expected, simple nonparametric statistical tests on the CNV measurement distributions show highly significant differences (Mann-Whitney test: P = 1.5 × 10⁻⁶; t-test: P = 4.2 × 10⁻⁶). Similar bias has been shown to affect lower-quality SNP assays within SNP association studies¹² and effective treatment is critical to avoid false-positive associations in poorer-quality assays¹³. CNV data are typically of lower quality and present an additional challenge in that they often have more than two alleles, thus giving rise to more than three possible diploid copy numbers.

Received 27 February; accepted 30 June; published online 7 September 2008; doi:10.1038/ng.206
¹Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. ²Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK. ³Department of Statistics, University of Oxford, Oxford OX1 3TG, UK. Correspondence should be addressed to M.E.H. (meh@sanger.ac.uk).
NATURE GENETICS VOLUME 40 | NUMBER 10 | OCTOBER 2008 1245 TECHNICAL REPORTS

RESULTS
We have considered six methods of increasing sophistication for testing for CNV-disease association. The first method is a nonparametric test (Mann-Whitney U test) for location shift between cases and controls of the distribution of quantitative CNV measurements (‘method 1’). The remaining five methods attempt to classify the individuals into different copy number classes, and to test for differences between the frequencies of these copy number classes between cases and controls (Fig. 2 shows these five methods schematically).
The simplest of these classification methods involves simple binning of individuals into copy number classes on the basis of pre-defined thresholds, together with an association test on the resulting contingency table (‘method 2’).

Our next three methods closely mirror the conventional analyses for SNP genotyping; the quantitative CNV measurement distribution is modeled using a Gaussian mixture model and individuals are then assigned to copy number classes on the basis of their maximum a posteriori probability (see Methods for details). Again, an association test is then applied to the resultant contingency table. In the first of these, the Gaussian mixture model was fitted to the cases and controls combined and each subject was assigned a copy number, irrespective of the confidence of assignment (‘method 3’). This strategy was then adapted to address the problem of differential misclassification by fitting the mixture model independently in cases and controls, thus allowing for shifts in the underlying measurement distributions between groups (‘method 4’). The intuition that differential misclassification bias is removed by simply scoring cases and controls independently (‘method 4’) potentially carries a subtle flaw; in fitting the mixture model, differences in copy number frequency between groups are tacitly assumed, and this could lead to spurious inflation of any differences unless uncertainty implicit in the mixture modelling is correctly propagated into the later association test (discussed in detail in ref. 12). The next strategy was to assign copy numbers to subjects only if the posterior probability for the assignment exceeded a threshold (0.95, ‘method 5’). This last strategy was explored because it is widely used. Its use is probably suggested by the intuition that, by removing the most uncertain data, the bias caused by measurement error is minimized.
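The mixture-model scoring behind methods 3 to 5 can be illustrated with a small one-dimensional Gaussian mixture fitted by expectation-maximization. This is a sketch under stated assumptions rather than the CNVtools implementation: the function names, the starting values, the fixed iteration count and the simulated data are all hypothetical choices for the example. Calling `fit_mixture` once on pooled data corresponds to the idea of method 3, calling it separately on cases and controls to method 4, and passing `min_posterior=0.95` to `call_copy_number` to the thresholded calling of method 5.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, weights, means, sds):
    """Posterior probability of each copy number class for one measurement."""
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]


def fit_mixture(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from rough cluster means."""
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    sds = [statistics.pstdev(data) / k] * k
    for _ in range(n_iter):
        # E-step: class membership probabilities for every measurement.
        resp = [posteriors(x, weights, means, sds) for x in data]
        # M-step: update frequency, mean and s.d. of every cluster.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # guard against collapse
    return weights, means, sds


def call_copy_number(x, params, min_posterior=0.0):
    """Maximum a posteriori class, or None if below the posterior threshold."""
    post = posteriors(x, *params)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= min_posterior else None


# Simulated measurements for three well-separated copy number classes.
rng = random.Random(1)
data = ([rng.gauss(0.0, 0.15) for _ in range(200)]
        + [rng.gauss(1.0, 0.15) for _ in range(500)]
        + [rng.gauss(2.0, 0.15) for _ in range(300)])
params = fit_mixture(data, init_means=[0.0, 1.0, 2.0])
weights, means, sds = params
```

A measurement lying midway between two equally frequent, equally wide clusters has a posterior of 0.5 for each; it still receives a maximum a posteriori call, but it is left uncalled at the 0.95 threshold.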
However, experience of SNP genotyping brings this intuition into question, as application of stringent call quality filters can generate a different sort of bias as a result of nonrandom missingness.

To address the problems associated with fitting mixture models to cases and controls separately or independently, we developed a method to integrate CNV scoring (data model) and association testing (genetic model) into a single statistical model, and test for association using a likelihood ratio test (‘method 6’). Figure 3 illustrates the elements of this integrated model. It allows for direct influences of case or control group (phenotype) on the CNV measurement distribution, as well as the indirect association via copy number frequency. The likelihood ratio test compares maximized likelihoods for the model with or without an association between copy number and phenotype (shown as a broken arrow in the figure).

Method 1 is a test for a simple monotone relationship between disease risk and diploid copy number, and, for comparability, the remaining five methods have been implemented to be maximally sensitive to this type of relationship by computing trend tests (with 1 degree of freedom (d.f.)). Thus, in methods 2 through 5, we use the Cochran-Armitage test for trend in the contingency table obtained by assignment of subjects to diploid copy number classes. Similarly, method 6 yields a 1-d.f. test when the relationship between copy number and phenotype specified in the model has a simple linear-logistic form. These methods could also be adapted to test for nonlinear effects, analogous to ‘dominance’ terms in the classical biometric model. However, as when testing conventional markers, power is lost when such terms are small and the relationship is monotonic. Here we concentrate on 1-d.f. tests, although our integrated model approach can be generalized to other genetic models, and these methods are implemented in our software package.

We also note that systematic errors due to differing measurement distributions are not only related to case or control group membership. Experimental batch effects are often evident. Such effects can also lead to spurious associations, and should also be taken into account in the analysis. Our likelihood ratio testing framework can explicitly model these batch effects when the batches are large enough that the parameters of the mixture model within each batch can be robustly estimated.

False-positive rates of different CNV-association procedures
We carried out simulations to explore how differential errors and clustering quality influence the type 1 error of the six association testing methods outlined above (Supplementary Methods online).

Figure 1 Example of CNV data showing poor clustering quality and differential errors. [Three density plots; x axis: copy number signal; y axis: density. Panel titles and legends: Affymetrix 500K data (UKBS collection, 1958 Birth Cohort); array-CGH data (HapMap, HGDP); paralog-ratio-test data (Controls - Dutch, Controls - German).] (a) Comparison of the distribution of quantitative CNV measurements for a single CNV (W8177) in the two control groups of the WTCCC from Affymetrix 500K SNP genotyping data. (b) Comparison of the distribution of quantitative CNV measurements in array-CGH data (clone Chr15tp-11F12 on the Whole Genome TilePath array¹) between the HapMap panel and the Human Genome Diversity Panel (HGDP). (c) Distribution of quantitative CNV measurements from a paralog-ratio-test assay for the β-defensin locus in Dutch and German control cohorts¹¹.
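The kind of simulation described in this section can be imitated on a small scale. The sketch below is not the authors' simulation design (that is given in their Supplementary Methods); it assumes three copy number classes with frequencies 0.25, 0.5 and 0.25 in both groups, unit spacing between cluster means, a cluster s.d. of 1/Q with Q = 5, and a hypothetical shift of 0.18 applied to every case measurement. Because copy number frequencies are identical in cases and controls, every rejection by the Mann-Whitney test (method 1) is a false positive. The normal approximation without tie correction is adequate here because the simulated signals are continuous.

```python
import math
import random


def simulate_signals(n, freqs, shift, sd, rng):
    """Draw n signals; class k has cluster mean k + shift and s.d. sd."""
    classes = rng.choices(range(len(freqs)), weights=freqs, k=n)
    return [k + shift + rng.gauss(0.0, sd) for k in classes]


def mann_whitney_z(x, y):
    """Normal-approximation z statistic of the rank-sum test (assumes no ties)."""
    labelled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(rank for rank, (_, group) in enumerate(labelled, start=1)
                     if group == 0)
    n, m = len(x), len(y)
    u = rank_sum_x - n * (n + 1) / 2.0
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    return (u - mean_u) / sd_u


def type1_error(shift, q=5.0, n=500, n_sims=100, seed=0):
    """Fraction of null datasets rejected at the two-sided 5% level."""
    rng = random.Random(seed)
    freqs = [0.25, 0.5, 0.25]  # same copy number frequencies in both groups
    sd = 1.0 / q               # cluster s.d. for unit spacing of cluster means
    rejections = 0
    for _ in range(n_sims):
        controls = simulate_signals(n, freqs, 0.0, sd, rng)
        cases = simulate_signals(n, freqs, shift, sd, rng)
        if abs(mann_whitney_z(cases, controls)) > 1.96:
            rejections += 1
    return rejections / n_sims


rate_no_shift = type1_error(0.0)
rate_with_shift = type1_error(0.18)
```

Under these assumptions the rejection rate without a shift stays near the nominal 5%, while a shift that is smaller than one cluster s.d. causes the test to reject in the large majority of simulated datasets.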
We generated datasets that varied in signal-to-noise ratio (clustering quality, denoted by Q), as measured by the ratio of the separation of cluster means to the cluster s.d. (Supplementary Methods). We also explored the sensitivity of the type 1 error rate to small differences in cluster means and variances. Figure 4 shows to what degree the test statistics for all six methods are inflated when compared with their expected distributions. Even small location differences in the CNV measurement distribution between cases and controls can lead to massively inflated type 1 error if Mann-Whitney testing or a priori binning are used. Copy number assignment using mixture models performs better, particularly when cases and controls are scored independently (method 4), but appreciable overdispersion remains. This inflation of the test statistic in method 4 results from overestimating the confidence of copy number assignment through constructing a contingency table, and from overestimating the differences in copy number frequencies between cases and controls through fitting the mixture model to cases and controls separately, which allows the nuisance parameters to vary between the two models. This is effectively equivalent to fitting mixture models under the alternate hypothesis that copy number frequencies do indeed differ between cases and controls. As a result, the true variance of the score test statistic is greater than the naïve estimate. By contrast, the likelihood ratio test (method 6) estimates all parameters under the null (no copy number differences) and alternate (copy number differences exist) hypotheses and thus provides the most robust test. As expected, imposing stringent call thresholds (method 5) does not remove the overdispersion, but rather exacerbates it.

We also investigated the performance of these methods in empirical CNV data in which we expected no true associations to exist.
For this purpose, we analyzed 95 known CNVs (Supplementary Table 1 online) from Affymetrix 500K SNP genotyping data collected on two UK control populations, each of ~1,500 individuals, as part of the Wellcome Trust Case Control Consortium (WTCCC)¹⁴. In one group (the 1958 Birth Cohort sample) DNA was obtained from EBV-transformed cell lines, whereas in the other (UK National Blood Service controls) DNA was from fresh blood. Association tests on these 95 CNVs, which differ in numbers of alleles, clustering quality and allele frequencies, show substantially less overdispersion of χ² statistics using the likelihood ratio trend test (λ = 1.1) as compared to Cochran-Armitage testing of separate mixture model assignment of the same CNVs with (λ = 1.74) or without (λ = 1.58) allowing for differential errors (Supplementary Fig. 1 online). This overdispersion is significantly lower for the likelihood ratio trend test (P < 0.05 for Wilcoxon signed rank test comparing test statistics produced by the likelihood ratio test and either mixture model assignment method), but is not significantly different between the two mixture model assignment methods (P > 0.05). The small degree of overdispersion observed in the likelihood ratio test statistics was not statistically significant.

Within the integrated model framework it is also possible to carry out a likelihood ratio test for difference in cluster means and variances between the two groups (Supplementary Methods). We first showed, in simulations in which there were no CNV measurement distribution differences, that this likelihood ratio test statistic had the expected χ² distribution. In contrast, in the WTCCC data these statistics are considerably overdispersed, demonstrating that differential errors are pervasive in this empirical example (Supplementary Fig. 2 online). The same test also shows highly significant differential bias in the array-CGH and PRT examples shown in Figure 1 (P = 1.2 × 10⁻⁶ and 1.1 × 10⁻¹² for panels b and c, respectively). These observations confirm that the features of copy number data modeled in the simulations are indeed present in empirical data from different platforms, including SNP genotyping, array-CGH and PRT datasets, and that accounting for differential errors is essential for robust CNV-association testing.

Figure 2 Methods for performing CNV-association testing. [Schematic. Labels in a: quantitative CNV signal; copy number; data model (copy number assignment); genetic model (association testing); data model + genetic model. Rows in b: a priori binning; mixture model, assignment combined; mixture model, assignment separately; mixture model, assignment separately if P > 0.95; mixture model, LR trend test (H0: β = 0; H1: β ≠ 0; 1 degree of freedom). Frequency is plotted against copy number for cases and controls.] (a) In association studies, inference of genotypes from data and association testing of genotypic data are generally treated as separate statistical problems; however, the two underlying models can be combined into a single, integrated procedure. (b) Five different case-control association methods are represented schematically on simulated copy number intensity data in case and control groups. The first three methods classify individuals into copy number classes before performing nonparametric testing. Classification is achieved by either a priori binning or assignment on the basis of maximal a posteriori probability from mixture models fitted to the underlying intensity data. The new likelihood ratio test integrates classification and association testing into a single procedure by comparing mixture model fits under nested hypotheses.

Figure 3 Modelling the dependency between copy number and disease. [Diagrams linking phenotype, copy number and quantitative signal. Panel titles: not allowing for differential bias; allowing for differential bias.] (a) Naïve model in which any dependency between disease phenotype and quantitative measurements of copy number is assumed to be due to differences in the distribution of copy number between cases and controls. (b) A more elaborate model that allows for other differences in measurement distribution between cases and controls due, for example, to differences in DNA qualities.

Maximizing information from probes in the same CNV
All the association methods described above require a single measure to discriminate between different copy numbers. However, many CNV assay methods use multiple probes in each CNV (for example, each CNV in the Affymetrix 500K SNP genotyping data is identified by multiple SNP probe sets), so some method for summarizing these measurements is necessary. The obvious approaches are to use the mean or median.
However, these are not optimal, as different probes differ in their informativeness, not least because copy number region boundaries can be uncertain. We developed two improved procedures to weight the information from each probe within each CNV region (Supplementary Methods). The first procedure is to use the first principal component from the intensity data from different probes. This, by downweighting probe intensities uncorrelated with the remainder, generally gives a better separation of different copy numbers than the mean or median of all of the probe intensities. However, we suspected that the weights would still not be optimal, and developed a one-step refinement of these scores: we fitted the Gaussian mixture model to the principal-component scores and then used the estimated CNV assignments to compute an optimal linear discriminant function of probe intensities. We demonstrated that these procedures resulted in improved clustering quality across these 95 CNVs in the Affymetrix 500K data from the WTCCC (Fig. 5a and Supplementary Fig. 3 online). We suggest that such procedures will have general utility for many applications (for example, array CGH) where multiple probes identify the same variant. As expected, the improved summary methods provide considerable protection against overestimation of CNV boundaries (Supplementary Fig. 4 online). Given that the vast majority of known CNVs do not currently have precisely mapped breakpoints, being able to overestimate the extent of CNVs without seriously downgrading measurement quality is a significant advantage.

Power estimation
We then assessed the statistical power of the likelihood ratio trend test using simulated data across a range of signal-to-noise ratios. The statistical power of the likelihood ratio trend test can be estimated by a quadratic approximation of the profile likelihood (see Methods).
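The paper's own derivation is in its Methods and is not reproduced here. As a rough illustration of how a quadratic approximation translates into power, suppose (as an assumption of this sketch) that it yields an effect estimate β with standard error se, so that the 1-d.f. test statistic is approximately noncentral χ² with noncentrality (β/se)². Power at a significance level α then follows from the standard normal distribution. The function names below are hypothetical.

```python
from statistics import NormalDist


def noncentrality_from_effect(beta, se):
    """Noncentrality of the 1-d.f. statistic for effect beta with s.e. se."""
    return (beta / se) ** 2


def power_one_df(noncentrality, alpha):
    """Power of a 1-d.f. chi-square test at significance level alpha.

    Uses the fact that a noncentral chi-square with 1 d.f. is the square of a
    normal variable with mean sqrt(noncentrality) and unit variance.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = noncentrality ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# With no effect, the rejection probability equals the significance level.
size = power_one_df(0.0, alpha=0.05)
# Power at P < 1e-4 for a hypothetical effect of 5.172 standard errors.
power = power_one_df(noncentrality_from_effect(5.172, 1.0), alpha=1e-4)
```

In this sketch, any loss of information from overlapping clusters would enter through a larger standard error, which lowers the noncentrality and hence the power.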
We observed that when copy number clusters are discrete the likelihood ratio trend test achieves the maximum theoretical power, but that power falls off rapidly with decreasing clustering quality (Fig. 5b). The loss of power is much more pronounced when the model allows for different measurement properties between cases and controls, which reflects the increasing difficulty in distinguishing between association and differences in measurement properties as clustering quality declines. This result is replicated in the empirical data on 95 CNVs described above (Fig. 5c). We noted that this marked fall-off in power is even more pronounced when copy number frequencies are low, owing to the increased difficulty of accurately modelling measurement distributions.

Although it could be argued that the more serious loss of power due to the need to model differential errors points to a need for better study design rather than additional statistical sophistication, such effects are very difficult (and often impossible) to exclude. Rarely can cases and controls be approached in strictly comparable circumstances that ensure identical DNA handling. Moreover, prospective group studies will rarely yield sufficient numbers of cases of disease to detect modest effect sizes. Family-based association studies will, perhaps, face fewer difficulties in this respect.

Figure 4 Sensitivity of 1-d.f. association testing methods to clustering quality and differential errors between cases and controls in simulated data. [Three plots of overdispersion (λ) as a function of clustering quality (Q), of differential means (Δ) and of differential variances (Δ²), each with curves for Mann-Whitney, a priori binning, MM-C, MM-S, MM-S95 and LR trend test; density plots below for Q = 7.5, 5.5 and 3.5, for Δ = 0.023, 0.08 and 0.18, and for Δ² = 1.08, 1.4 and 1.8.] Six alternative association methods are considered: (i) Mann-Whitney testing for difference in location of CNV measurement distributions, (ii) χ² trend tests on data binned with a priori thresholds, (iii) χ² trend tests on mixture model assignment of case and controls together (MM-C), (iv) χ² trend tests on mixture model assignment of case and controls separately (MM-S), (v) χ² trend tests on high confidence mixture model assignment of case and controls separately (MM-S95) and (vi) likelihood ratio trend test. Overdispersion (λ) is estimated robustly from a linear fit to the first 90% of quantile-quantile plots from 1,000 simulated datasets. (a) Overdispersion is estimated for alternative association methods at ten different clustering qualities. Density plots for three clustering qualities are shown at the bottom. (b) Overdispersion is estimated for alternative association methods at ten different values of differential shift of means. Density plots for three values of differential shift are shown at the bottom with case and control groups in red and gray. (c) Overdispersion is shown for alternative association methods at ten different values of differential shifts in variance. Density plots for three values of differential shift are shown at the bottom with case and control groups in red and gray.

Quantitative traits
We generalized the likelihood ratio trend test for use in quantitative trait association by replacing the logistic regression for dependency between copy number and phenotype in our model by a simple linear regression (LR-QT test).
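For orientation, the linear-regression ingredient can be shown in its simplest form: regressing the trait directly on the quantitative CNV measurement, without inferring copy number, as in the earlier expression study⁴. This sketch is not the LR-QT test itself, which embeds the regression in the mixture model; the function name and the toy data are assumptions made for illustration.

```python
import math


def regress_trait_on_signal(signal, trait):
    """Ordinary least squares of trait on CNV signal.

    Returns the slope, the intercept and the t statistic of the slope
    (n - 2 degrees of freedom).
    """
    n = len(signal)
    mean_x = sum(signal) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in signal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(signal, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance and the standard error of the slope.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(signal, trait))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope / se_slope


# Toy data: five individuals with a hypothetical CNV signal and trait value.
slope, intercept, t_stat = regress_trait_on_signal(
    [0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 2.2, 2.8, 4.0])
```

Because the regressor here is the raw measurement, noise in the signal is carried directly into the slope estimate, whereas a mixture-based model treats the underlying copy number as the predictor.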
Although studies of quantitative traits are often carried out in a manner that effectively excludes the differential errors that largely concern us here, this may not always be the case; notably, differential errors can be introduced by experimental batches, which may be confounded with the trait (for example, when extremes of the trait distribution are targeted). Although careful study design may control type 1 errors, we have also shown by simulation (Supplementary Fig. 5 online) that our approach, without allowance for measurement distribution differences, is more powerful than simple tests on the CNV measurements based on linear regression⁴. This advantage is maintained over a wide range of signal-to-noise ratios (clustering qualities).

Empirical examples of positive associations
We explored the performance of the likelihood ratio trend test on known CNV associations for both binary disease traits and quantitative traits in empirical data. Type 1 diabetes (T1D) is known to be strongly associated with the MHC class I region, which contains several CNVs and across which there is long-range linkage disequilibrium. Therefore, we should expect to see indirect association

Figure 5 Statistical power of the likelihood ratio trend test. [Panel titles: clustering quality for three probe-set summary methods (WTCCC CNVs, N = 95; LDF, PCA, mean); simulated data, power vs. clustering quality; empirical data, power vs. clustering quality. In the power panels the y axis is the probability that P < 10⁻⁴ and the x axis is clustering quality (Q); curves are shown for MAF = 0.3 and MAF = 0.1, each assuming no differential bias or allowing for differential bias.] (a) Clustering quality resulting from alternative probe summary methods for 95 CNVs: linear discriminant function (LDF), principal components analysis (PCA) and arithmetic mean (mean). (b) Statistical power of the LR trend test in simulated data of varying clustering quality is shown for two minor allele frequencies (MAF) with odds ratios (OR) set to equalize maximal theoretical power at 90%. Power is estimated for 2,000 cases and 2,000 controls under two conditions: (i) a model that assumes no differential errors and (ii) a model allowing for differential errors. (c) Statistical power of the LR trend test in empirical data from 95 CNVs of varying clustering quality. Power is estimated for 2,000 cases and 2,000 controls, with odds ratios (OR) set to equalize maximal theoretical power at 90%. For ease of display, where the clustering quality (Q) of a CNV exceeds a value of 6, it has been set to 6.

Figure 6 Examples of empirical CNV associations. [Panels in a: density of copy number signal for Controls: 1958BC (N = 1,105), Controls: NBS (N = 1,488) and Cases: T1D (N = 1,654). Panels in b: LOC288077 expression vs. log₂ ratio; LOC288077 expression vs. copy number (mixture model assignment); LOC288077 expression, LR trend test.] (a) Association with a binary disease trait, type 1 diabetes (T1D). The red shaded area represents a density plot of copy number measurement in each group. The two WTCCC control groups come from the 1958 Birth Cohort (1958BC) and the National Blood Service (NBS). The colored lines reflect the posterior probability distribution for each mixture in the fitted mixture model. The P value derives from the LR trend test comparing case and control groups. (b) The first panel shows normalized expression of gene LOC288077 against copy number measurement, with a linear regression shown in blue. The second panel shows normalized gene expression against mixture model assignment, with a linear regression shown in blue. The P values in these two plots represent the nominal P values on the regression. The third panel shows a histogram of copy number measurement and the colored lines represent the posterior probability distribution for each of the five copy number classes in the fitted mixture model used in the LR trend test.