3
Yan S, Yuan S, Xu Z, Zhang B, Zhang B, Kang G, Byrnes A, Li Y. Likelihood-based complex trait association testing for arbitrary depth sequencing data. Bioinformatics 2015; 31:2955-62. [PMID: 25979475 PMCID: PMC4668777 DOI: 10.1093/bioinformatics/btv307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/27/2014] [Revised: 05/06/2015] [Accepted: 05/11/2015] [Indexed: 11/14/2022]
Abstract
In next generation sequencing (NGS)-based genetic studies, researchers typically perform genotype calling first and then apply standard genotype-based methods for association testing. However, such a two-step approach ignores genotype calling uncertainty in the association testing step and may incur power loss and/or inflated type-I error. In the recent literature, a few robust and efficient likelihood-based methods, including both the likelihood ratio test (LRT) and the score test, have been proposed to carry out association testing without intermediate genotype calling. These methods take genotype calling uncertainty into account by directly incorporating the genotype likelihood function (GLF) of NGS data into association analysis. However, existing LRT methods are computationally demanding or do not allow covariate adjustment, while existing score tests are not applicable to markers with low minor allele frequency (MAF). We provide an LRT allowing flexible covariate adjustment, develop a statistically more powerful score test and propose a combination strategy (UNC combo) to leverage the advantages of both tests. We have carried out extensive simulations to evaluate the performance of our proposed LRT and score test. Simulations and real data analysis demonstrate the advantages of our proposed combination strategy: it offers a satisfactory trade-off in terms of computational efficiency, applicability (accommodating both common variants and variants with low MAF) and statistical power, particularly for the analysis of quantitative traits, where the power gain can be up to ~60% when the causal variant is of low frequency (MAF < 0.01).
AVAILABILITY AND IMPLEMENTATION: UNC combo and the associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/UNCcombo/
CONTACT: yunli@med.unc.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Song Yan
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Shuai Yuan
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Zheng Xu
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Baqun Zhang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Bo Zhang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Guolian Kang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Andrea Byrnes
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Yun Li
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
4
Derkach A, Chiang T, Gong J, Addis L, Dobbins S, Tomlinson I, Houlston R, Pal DK, Strug LJ. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic. Bioinformatics 2014; 30:2179-88. [PMID: 24733292 DOI: 10.1093/bioinformatics/btu196] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/30/2022]
Abstract
MOTIVATION: Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data.
RESULTS: We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate, using simulated and real NGS data, that the RVS method controls Type I error and has power comparable to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants.
AVAILABILITY AND IMPLEMENTATION: An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS.
CONTACT: lisa.strug@utoronto.ca
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
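The idea of replacing a hard genotype call with its expected value given the observed reads can be illustrated with a minimal sketch. This is not the RVS R script referenced in the abstract. It assumes that per-individual genotype likelihoods P(D | G = g) for g = 0, 1, 2 copies of the minor allele are already available and, as a simplifying assumption of this sketch, it uses Hardy-Weinberg genotype frequencies computed from a supplied minor allele frequency as the prior. The function name `expected_genotype` and the example numbers are hypothetical.

```python
def expected_genotype(likelihoods, maf):
    """Posterior mean genotype E[G | D] for one individual at one variant.

    likelihoods: (P(D|G=0), P(D|G=1), P(D|G=2)), where G counts minor alleles.
    maf: assumed minor allele frequency; the Hardy-Weinberg prior built from
         it is an assumption of this sketch, not taken from the abstract.
    """
    if len(likelihoods) != 3:
        raise ValueError("expected three genotype likelihoods")
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")

    # Hardy-Weinberg prior over genotypes 0, 1, 2.
    prior = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)

    # Bayes' rule: posterior is proportional to likelihood times prior.
    weights = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weights)
    if total == 0:
        raise ValueError("likelihoods and prior have no common support")
    posterior = [w / total for w in weights]

    # Expected minor-allele count under the posterior.
    return sum(g * p for g, p in enumerate(posterior))


# Uninformative reads (equal likelihoods) fall back to the prior mean 2 * maf.
print(expected_genotype((1.0, 1.0, 1.0), 0.25))  # 0.5
```

With few or uninformative reads the expected value is pulled toward the prior mean, whereas with strongly informative reads it approaches the genotype a caller would report.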
Affiliation(s)
- Andriy Derkach
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Theodore Chiang
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Jiafen Gong
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Laura Addis
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Sara Dobbins
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Ian Tomlinson
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Richard Houlston
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Deb K Pal
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Lisa J Strug
- Department of Statistical Science, University of Toronto, Toronto, ON, Canada, Program in Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada, Department of Clinical Neuroscience, Institute of Psychiatry, King's College London, London, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, Molecular and Population Genetics and NIHR Comprehensive Biomedical Research Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK, Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada