1
Zhu L, Yan S, Cao X, Zhang S, Sha Q. Integrating External Controls by Regression Calibration for Genome-Wide Association Study. Genes (Basel) 2024; 15:67. [PMID: 38254957 PMCID: PMC10815702 DOI: 10.3390/genes15010067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 12/30/2023] [Accepted: 01/01/2024] [Indexed: 01/24/2024] Open
Abstract
Genome-wide association studies (GWAS) have successfully revealed many disease-associated genetic variants. For a case-control study, adequate power of an association test can be achieved with a large sample size, although genotyping large samples is expensive. A cost-effective strategy to boost power is to integrate external control samples with publicly available genotyped data. However, naive integration of external controls may inflate type I error rates if the systematic differences (batch effects) between studies are ignored, such as differences in sequencing platforms, genotype-calling procedures, and population stratification. To account for the batch effect, we propose an approach that integrates External Controls into the Association Test by Regression Calibration (iECAT-RC) in case-control association studies. Extensive simulation studies show that iECAT-RC not only controls type I error rates but also boosts statistical power in all models. We also apply iECAT-RC to the UK Biobank data for M72 Fibroblastic disorders by considering genotype calling as the batch effect. Four SNPs associated with fibroblastic disorders were detected by iECAT-RC and the two comparison methods, iECAT-Score and Internal. However, our method has a higher probability of identifying these significant SNPs in the scenario of an unbalanced case-control association study.
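The inflation described in this abstract can be illustrated with a small worked example. The sketch below is not the iECAT-RC procedure; it applies a plain 2x2 allelic chi-square test to hypothetical allele counts in which cases and internal controls share the same minor-allele frequency (no true association), while the external controls carry a slightly shifted frequency standing in for a batch effect. All counts and the helper name `allelic_chi2` are assumptions made for illustration.

```python
def allelic_chi2(case_minor, case_total, ctrl_minor, ctrl_total):
    """Pearson chi-square statistic (1 df) for a 2x2 table of allele counts."""
    a, b = case_minor, case_total - case_minor
    c, d = ctrl_minor, ctrl_total - ctrl_minor
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical allele counts (two alleles per person); no true association.
case_minor, case_total = 600, 2000    # 1,000 cases, MAF 0.30
int_minor, int_total = 600, 2000      # 1,000 internal controls, MAF 0.30
ext_minor, ext_total = 3300, 10000    # 5,000 external controls, MAF 0.33 (assumed batch shift)

chi2_internal = allelic_chi2(case_minor, case_total, int_minor, int_total)
chi2_pooled = allelic_chi2(case_minor, case_total,
                           int_minor + ext_minor, int_total + ext_total)

CRITICAL_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05
print(f"internal controls only: {chi2_internal:.2f}")
print(f"naively pooled controls: {chi2_pooled:.2f}")
```

With these made-up counts the internal-only statistic is zero, while the pooled statistic exceeds 3.841, so the naive test would reject at the 0.05 level even though cases and internal controls are identical.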
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA; (L.Z.); (S.Y.); (X.C.); (S.Z.)
2
Gentry AE, Alexander JC, Ahangari M, Peterson RE, Miles MF, Bettinger JC, Davies AG, Groteweil M, Bacanu SA, Kendler KS, Riley BP, Webb BT. Case-only exome variation analysis of severe alcohol dependence using a multivariate hierarchical gene clustering approach. PLoS One 2023; 18:e0283985. [PMID: 37098020 PMCID: PMC10128939 DOI: 10.1371/journal.pone.0283985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 03/21/2023] [Indexed: 04/26/2023] Open
Abstract
BACKGROUND Variation in genes involved in ethanol metabolism has been shown to influence risk for alcohol dependence (AD) including protective loss of function alleles in ethanol metabolizing genes. We therefore hypothesized that people with severe AD would exhibit different patterns of rare functional variation in genes with strong prior evidence for influencing ethanol metabolism and response when compared to genes not meeting these criteria. OBJECTIVE Leverage a novel case-only design and Whole Exome Sequencing (WES) of severe AD cases from the island of Ireland to quantify differences in functional variation between genes associated with ethanol metabolism and/or response and their matched control genes. METHODS First, three sets of ethanol-related genes were identified, including those a) involved in alcohol metabolism in humans, b) showing altered expression in mouse brain after alcohol exposure, and c) altering ethanol behavioral responses in invertebrate models. These genes of interest (GOI) sets were matched to control gene sets using multivariate hierarchical clustering of gene-level summary features from gnomAD. Using WES data from 190 individuals with severe AD, GOI were compared to matched control genes using logistic regression to detect aggregate differences in abundance of loss of function, missense, and synonymous variants, respectively. RESULTS Three non-independent sets of 10, 117, and 359 genes were queried against control gene sets of 139, 1522, and 3360 matched genes, respectively. Significant differences were not detected in the number of functional variants in the primary set of ethanol-metabolizing genes. In both the mouse expression and invertebrate sets, we observed an increased number of synonymous variants in GOI over matched control genes. Post-hoc simulations showed the estimated effect sizes observed are unlikely to be under-estimated.
CONCLUSION The proposed method demonstrates a computationally viable and statistically appropriate approach for genetic analysis of case-only data for hypothesized gene sets supported by empirical evidence.
Affiliation(s)
- Amanda Elswick Gentry
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Jeffry C. Alexander
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Mohammad Ahangari
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Integrative Life Sciences Ph.D. Program, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Roseann E. Peterson
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Psychiatry and Behavioral Sciences, Institute for Genomics in Health, SUNY Downstate Health Sciences University, Brooklyn, New York, United States of America
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Michael F. Miles
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Jill C. Bettinger
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Andrew G. Davies
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Mike Groteweil
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Silviu A. Bacanu
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, United States of America
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Kenneth S. Kendler
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, United States of America
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Brien P. Riley
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Psychiatry, Virginia Commonwealth University, Richmond, Virginia, United States of America
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, Virginia, United States of America
| | - Bradley T. Webb
- VCU Alcohol Research Center, Virginia Commonwealth University, Richmond, Virginia, United States of America
- GenOmics, Bioinformatics, and Translational Research Center, Biostatistics and Epidemiology Division, RTI International, Research Triangle Park, North Carolina, United States of America
3
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods in utilizing variant annotations or external controls to improve the statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Affiliation(s)
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
| | - Brandon J. Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
| | - Nicholas B. Larson
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
4
Li Y, Lee S. Integrating external controls in case–control studies improves power for rare‐variant tests. Genet Epidemiol 2022; 46:145-158. [PMID: 35170803 PMCID: PMC9393083 DOI: 10.1002/gepi.22444] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/19/2021] [Revised: 12/29/2021] [Accepted: 01/20/2022] [Indexed: 11/08/2022]
Abstract
Large-scale sequencing and genotyping data provide an opportunity to integrate external samples as controls to improve power of association tests. However, due to the systematic differences between genotyped samples from different studies, naively aggregating the controls could lead to inflation in Type I error rates. There has been recent effort to integrate external controls while adjusting for batch effect, such as the integrating External Controls into Association Test (iECAT) and its score-based single variant tests. Building on the original iECAT framework, we propose an iECAT-Score region-based test that increases power for rare-variant tests when integrating external controls. This method assesses the systematic batch effect between internal and external samples at each variant and constructs compound shrinkage score statistics to test for the joint genetic effect within a gene or a region, while adjusting for covariates and population stratification. Through simulation studies, we demonstrate that the proposed method controls for Type I error rates and improves power in rare-variant tests. The application of the proposed method to the association studies of age-related macular degeneration (AMD) from the International AMD Genomics Consortium and UK Biobank revealed novel rare-variant associations in gene DXO. Through the incorporation of external controls, the iECAT methods offer a powerful suite to identify disease-associated genetic variants, further shedding light on future directions to investigate roles of rare variants in human diseases.
Affiliation(s)
- Yatong Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
5
Chen D, Tashman K, Palmer DS, Neale B, Roeder K, Bloemendal A, Churchhouse C, Ke ZT. A data harmonization pipeline to leverage external controls and boost power in GWAS. Hum Mol Genet 2021; 31:481-489. [PMID: 34508597 PMCID: PMC8825237 DOI: 10.1093/hmg/ddab261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2021] [Revised: 09/02/2021] [Accepted: 09/03/2021] [Indexed: 11/12/2022] Open
Abstract
The use of external controls in genome-wide association study (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors, and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control (QC) and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27 517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn's disease. We demonstrate a boost in power over using the cohort samples alone, and that our procedure results in summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual level genotypes, rather than summary statistics, are required.
Affiliation(s)
- Danfeng Chen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544, New Jersey, United States
| | - Katherine Tashman
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, 02114, Massachusetts, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, 02142, Massachusetts, United States
| | - Duncan S Palmer
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, 02114, Massachusetts, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, 02142, Massachusetts, United States
| | - Benjamin Neale
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, 02114, Massachusetts, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, 02142, Massachusetts, United States
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, 02142, Massachusetts, United States
| | - Kathryn Roeder
- Department of Statistics, Carnegie Mellon University, Pittsburgh, 15213, Pennsylvania, United States
| | - Alex Bloemendal
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, 02114, Massachusetts, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, 02142, Massachusetts, United States
| | - Claire Churchhouse
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, 02114, Massachusetts, United States
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, 02142, Massachusetts, United States
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, 02142, Massachusetts, United States
| | - Zheng Tracy Ke
- Department of Statistics, Harvard University, Cambridge, 02138, Massachusetts, United States
6
Li Y, Lee S. Novel score test to increase power in association test by integrating external controls. Genet Epidemiol 2020; 45:293-304. [PMID: 33161601 PMCID: PMC9424128 DOI: 10.1002/gepi.22370] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/17/2020] [Revised: 10/13/2020] [Accepted: 10/20/2020] [Indexed: 12/18/2022]
Abstract
Recent advances in genotyping and sequencing technologies have enabled genetic association studies to leverage high-quality genotyped data to identify variants accounting for a substantial portion of disease risk. The usage of external controls, whose genomes have already been genotyped and are publicly available, could be a cost-effective approach to increase the power of association testing. There has been recent effort to integrate external controls while adjusting for possible batch effects, such as the integrating External Controls into Association Test (iECAT). The original iECAT test, however, cannot adjust for covariates such as age, gender, and so forth. Hence, based on the insight of iECAT, we propose a novel score-based test that allows for covariate adjustment and constructs a shrinkage score statistic that is a weighted sum of the score statistic that uses exclusively internal samples and the score statistic that uses both internal and external control samples. We assess the existence of batch effect at a variant by comparing control samples of internal and external sources. We show by simulation studies that our method has increased power over the original iECAT while controlling for type I error rates. We present the application of our method to the association studies of age-related macular degeneration (AMD) utilizing data from the International AMD Genomics Consortium and Michigan Genomics Initiative. Through the incorporation of the score test approach, we extend the use of iECAT to adjust for covariates and improve power, further honing the statistical methods needed to identify disease-causing variants within the human genome.
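The weighted-sum construction mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' estimator: the weight formula `d**2 / (d**2 + var_d)` is an assumed empirical-Bayes-style choice, and the function names and inputs are hypothetical. Here `d` stands for an estimated difference between internal and external controls at a variant and `var_d` for its (strictly positive) sampling variance.

```python
def batch_weight(d, var_d):
    """Assumed weight on the internal-only statistic; grows with batch-effect evidence."""
    return d ** 2 / (d ** 2 + var_d)

def shrinkage_score(score_internal, score_combined, weight):
    """Weighted sum of the internal-only and internal-plus-external score statistics."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * score_internal + (1.0 - weight) * score_combined

# Hypothetical inputs: no evidence of a batch effect, then strong evidence.
no_batch = shrinkage_score(2.0, 4.0, batch_weight(0.0, 1.0))
strong_batch = shrinkage_score(2.0, 4.0, batch_weight(3.0, 1.0))
print(no_batch, strong_batch)
```

Under this assumed weighting, the statistic leans on the combined sample when internal and external controls agree and falls back toward the internal-only statistic when they disagree.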
Affiliation(s)
- Yatong Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Department of Data Science, Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
7
Chen S, Lin X. Analysis in case-control sequencing association studies with different sequencing depths. Biostatistics 2020; 21:577-593. [PMID: 30590456 PMCID: PMC7308042 DOI: 10.1093/biostatistics/kxy073] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/19/2018] [Revised: 10/17/2018] [Accepted: 10/21/2018] [Indexed: 01/09/2023] Open
Abstract
With the advent of next-generation sequencing, investigators have access to higher quality sequencing data. However, to sequence all samples in a study using next generation sequencing can still be prohibitively expensive. One potential remedy could be to combine next generation sequencing data from cases with publicly available sequencing data for controls, but there could be a systematic difference in quality of sequenced data, such as sequencing depths, between sequenced study cases and publicly available controls. We propose a regression calibration (RC)-based method and a maximum-likelihood method for conducting an association study with such a combined sample by accounting for differential sequencing errors between cases and controls. The methods allow for adjusting for covariates, such as population stratification as confounders. Both methods control type I error and have comparable power to analysis conducted using the true genotype with sufficiently high but different sequencing depths. We show that the RC method allows for analysis using naive variance estimate (closely approximates true variance in practice) and standard software under certain circumstances. We evaluate the performance of the proposed methods using simulation studies and apply our methods to a combined data set of exome sequenced acute lung injury cases and healthy controls from the 1000 Genomes project.
Affiliation(s)
- Sixing Chen
- Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA 02115, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA 02115, USA
- Department of Statistics, Harvard University, One Oxford Street, Suite 400, Cambridge, MA 02138-2901, USA
8
Baskurt Z, Mastromatteo S, Gong J, Wintle RF, Scherer SW, Strug LJ. VikNGS: a C++ variant integration kit for next generation sequencing association analysis. Bioinformatics 2020; 36:1283-1285. [PMID: 31580400 PMCID: PMC7703770 DOI: 10.1093/bioinformatics/btz716] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/28/2019] [Revised: 08/13/2019] [Accepted: 09/25/2019] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Integration of next generation sequencing data (NGS) across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining datasets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several datasets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools. AVAILABILITY AND IMPLEMENTATION The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zeynep Baskurt
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
| | - Scott Mastromatteo
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
| | - Jiafen Gong
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
| | - Richard F Wintle
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
| | - Stephen W Scherer
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, ON M5G 0A4, Canada
| | - Lisa J Strug
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON M5G0A4, Canada
- Division of Biostatistics and Department of Statistical Sciences, University of Toronto, Toronto, ON, M5T3M7, Canada
9
Stahel P, Nahmias A, Sud SK, Lee SJ, Pucci A, Yousseif A, Youseff A, Jackson T, Urbach DR, Okrainec A, Allard JP, Sockalingam S, Yao T, Barua M, Jiao H, Magi R, Bassett AS, Paterson AD, Dahlman I, Batterham RL, Dash S. Evaluation of the Genetic Association Between Adult Obesity and Neuropsychiatric Disease. Diabetes 2019; 68:2235-2246. [PMID: 31506345 DOI: 10.2337/db18-1254] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/27/2018] [Accepted: 08/27/2019] [Indexed: 11/13/2022]
Abstract
Extreme obesity (EO) (BMI >50 kg/m²) is frequently associated with neuropsychiatric disease (NPD). As both EO and NPD are heritable central nervous system disorders, we assessed the prevalence of protein-truncating variants (PTVs) and copy number variants (CNVs) in genes/regions previously implicated in NPD in adults with EO (n = 149) referred for weight loss/bariatric surgery. We also assessed the prevalence of CNVs in patients referred to University College London Hospital (UCLH) with EO (n = 218) and obesity (O) (BMI 35-50 kg/m²; n = 374) and a Swedish cohort of participants from the community with predominantly O (n = 161). The prevalence of variants was compared with control subjects in the Exome Aggregation Consortium/Genome Aggregation Database. In the discovery cohort (high NPD prevalence: 77%), the cumulative PTV/CNV allele frequency (AF) was 7.7% vs. 2.6% in control subjects (odds ratio [OR] 3.1 [95% CI 2-4.1]; P < 0.0001). In the UCLH EO cohort (intermediate NPD prevalence: 47%), CNV AF (1.8% vs. 0.9% in control subjects; OR 1.95 [95% CI 0.96-3.93]; P = 0.06) was lower than the discovery cohort. CNV AF was not increased in the UCLH O cohort (0.8%). No CNVs were identified in the Swedish cohort with no NPD. These findings suggest that PTV/CNVs, in genes/regions previously associated with NPD, may contribute to NPD in patients with EO.
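The odds ratio reported for the discovery cohort can be approximately reproduced from the rounded allele frequencies in this abstract. The sketch below uses only the two published percentages (7.7% and 2.6%); the function name is hypothetical, and because the inputs are rounded and the original analysis presumably worked from allele counts, the result is only an approximation. The confidence interval cannot be recovered without those counts.

```python
def odds_ratio(freq_cases, freq_controls):
    """Odds ratio from two allele frequencies expressed as proportions."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# Cumulative PTV/CNV allele frequency: discovery cohort vs. control subjects.
or_discovery = odds_ratio(0.077, 0.026)
print(round(or_discovery, 1))
```

Rounded to one decimal place, this agrees with the reported OR of 3.1.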
Affiliation(s)
- Priska Stahel
- Department of Medicine, Banting & Best Diabetes Centre, University of Toronto, Toronto, Ontario, Canada
| | - Avital Nahmias
- Department of Medicine, Banting & Best Diabetes Centre, University of Toronto, Toronto, Ontario, Canada
| | - Shawn K Sud
- Department of Medicine, Banting & Best Diabetes Centre, University of Toronto, Toronto, Ontario, Canada
| | - So Jeong Lee
- Department of Medicine, Banting & Best Diabetes Centre, University of Toronto, Toronto, Ontario, Canada
| | - Andrea Pucci
- Centre for Obesity Research, Rayne Institute, Department of Medicine, University College London, London, U.K
- UCLH Bariatric Centre for Weight Management and Metabolic Surgery, University College London Hospital, London, U.K
- NIHR Biomedical Research Centre at University College London Hospitals NHS Foundation Trust and University College London, London, U.K
| | - Ahmed Yousseif
- Centre for Obesity Research, Rayne Institute, Department of Medicine, University College London, London, U.K
- UCLH Bariatric Centre for Weight Management and Metabolic Surgery, University College London Hospital, London, U.K
- NIHR Biomedical Research Centre at University College London Hospitals NHS Foundation Trust and University College London, London, U.K
| | - Alaa Youseff
- Institute of Medical Science, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Timothy Jackson
- Institute of Medical Science, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Division of General Surgery, University Health Network, Toronto, Ontario, Canada
| | - David R Urbach
- Division of General Surgery, University Health Network, Toronto, Ontario, Canada
| | - Allan Okrainec
- Division of General Surgery, University Health Network, Toronto, Ontario, Canada
- Department of Surgery, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Johane P Allard
- Bariatric Surgery Department, Toronto Western Hospital, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Sanjeev Sockalingam
- Department of Surgery, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
- Centre for Mental Health, University Health Network, Toronto, Ontario, Canada
| | - Tony Yao
- Division of Epidemiology and Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Moumita Barua
- Division of Epidemiology and Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Hong Jiao
- Division of Nephrology, Department of Medicine, Toronto General Research Institute, University Health Network, Toronto, Ontario, Canada
| | - Reedik Magi
- Department of Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Anne S Bassett
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- The Dalglish Family 22q Clinic, University Health Network, Toronto, Ontario, Canada
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- Department of Psychiatry, University Health Network, Toronto, Ontario, Canada
- Division of Cardiology, Department of Medicine, University Health Network, Toronto, Ontario, Canada
| | - Andrew D Paterson
- Department of Nutritional Sciences, University of Toronto, Toronto, Ontario, Canada
- Toronto General Research Institute, University Health Network, Toronto, Ontario, Canada
| | - Ingrid Dahlman
- Division of Epidemiology and Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Rachel L Batterham
- Centre for Obesity Research, Rayne Institute, Department of Medicine, University College London, London, U.K
- UCLH Bariatric Centre for Weight Management and Metabolic Surgery, University College London Hospital, London, U.K
- NIHR Biomedical Research Centre at University College London Hospitals NHS Foundation Trust and University College London, London, U.K
| | - Satya Dash
- Department of Medicine, Banting & Best Diabetes Centre, University of Toronto, Toronto, Ontario, Canada
10
Trost B, Walker S, Haider SA, Sung WWL, Pereira S, Phillips CL, Higginbotham EJ, Strug LJ, Nguyen C, Raajkumar A, Szego MJ, Marshall CR, Scherer SW. Impact of DNA source on genetic variant detection from human whole-genome sequencing data. J Med Genet 2019; 56:809-817. [PMID: 31515274 PMCID: PMC6929712 DOI: 10.1136/jmedgenet-2019-106281] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 05/12/2019] [Revised: 07/04/2019] [Accepted: 07/20/2019] [Indexed: 12/28/2022]
Abstract
BACKGROUND: Whole blood is currently the most common DNA source for whole-genome sequencing (WGS), but for studies requiring non-invasive collection, self-collection, greater sample stability or additional tissue references, saliva or buccal samples may be preferred. However, the relative quality of sequencing data and accuracy of genetic variant detection from blood-derived, saliva-derived and buccal-derived DNA need to be thoroughly investigated. METHODS: Matched blood, saliva and buccal samples from four unrelated individuals were used to compare sequencing metrics and variant-detection accuracy among these DNA sources. RESULTS: We observed significant differences among DNA sources for sequencing quality metrics such as percentage of reads aligned and mean read depth (p<0.05). Differences were negligible in the accuracy of detecting short insertions and deletions; however, the false positive rate for single nucleotide variation detection was slightly higher in some saliva and buccal samples. The sensitivity of copy number variant (CNV) detection was up to 25% higher in blood samples, depending on CNV size and type, and appeared to be worse in saliva and buccal samples with high bacterial concentration. We also show that methylation-based enrichment for eukaryotic DNA in saliva and buccal samples increased alignment rates but also reduced read-depth uniformity, hampering CNV detection. CONCLUSION: For WGS, we recommend using DNA extracted from blood rather than saliva or buccal swabs; if saliva or buccal samples are used, we recommend against using methylation-based eukaryotic DNA enrichment. All data used in this study are available for further open-science investigation.
Affiliation(s)
- Brett Trost
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Susan Walker
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Syed A Haider
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Wilson W L Sung
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Sergio Pereira
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Charly L Phillips
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Edward J Higginbotham
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Lisa J Strug
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Charlotte Nguyen
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Akshaya Raajkumar
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
| | - Michael J Szego
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
| | - Christian R Marshall
- Department of Paediatric Laboratory Medicine, Genome Diagnostics, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
| | - Stephen W Scherer
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
|
11
|
Saad M, Wijsman EM. Association score testing for rare variants and binary traits in family data with shared controls. Brief Bioinform 2019; 20:245-253. [PMID: 28968627 DOI: 10.1093/bib/bbx107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2017] [Indexed: 11/12/2022] Open
Abstract
Genome-wide association studies have been an important approach used to localize trait loci, with primary focus on common variants. The multiple rare variant-common disease hypothesis may explain the missing heritability remaining after accounting for identified common variants. Advances in sequencing technologies with their decreasing costs, coupled with methodological advances in the context of association studies in large samples, now make the study of rare variants at a genome-wide scale feasible. The resurgence of family-based association designs because of their advantage in studying rare variants has also stimulated more methods development, mainly based on linear mixed models (LMMs). Other tests such as score tests can have advantages over the LMMs, but to date have mainly been proposed for single-marker association tests. In this article, we extend several score tests ($\chi^2_{\mathrm{corrected}}$, WQLS, and SKAT) to the multiple variant association framework. We evaluate and compare their statistical performance relative to the LMM. Moreover, we show that three tests can be cast as the difference between marker allele frequencies (AFs) estimated in each of the groups of affected and unaffected subjects. We show that these tests are flexible, as they can be based on related, unrelated or both related and unrelated subjects. They also make feasible an increasingly common design that only sequences a subset of affected subjects (related or unrelated) and uses for comparison publicly available AFs estimated in a group of healthy subjects. Finally, we show the great impact of linkage disequilibrium on the performance of all these tests.
Affiliation(s)
- Mohamad Saad
- Department of Biostatistics, University of Washington, Seattle, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA; Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
| | - Ellen M Wijsman
- Department of Biostatistics, University of Washington, Seattle, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA
| |
|
12
|
de Los Campos G, Vazquez AI, Hsu S, Lello L. Complex-Trait Prediction in the Era of Big Data. Trends Genet 2018; 34:746-754. [PMID: 30139641 PMCID: PMC6150788 DOI: 10.1016/j.tig.2018.07.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 04/17/2018] [Revised: 07/09/2018] [Accepted: 07/16/2018] [Indexed: 01/18/2023]
Abstract
Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.
Affiliation(s)
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
| | - Ana Ines Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
| | - Stephen Hsu
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA; Cognitive Genomics Laboratory, BGI, Shenzhen 518083, China
| | - Louis Lello
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
| |
|
13
|
Nicastro E, D'Antiga L. Next generation sequencing in pediatric hepatology and liver transplantation. Liver Transpl 2018; 24:282-293. [PMID: 29080241 DOI: 10.1002/lt.24964] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 07/27/2017] [Revised: 09/04/2017] [Accepted: 10/18/2017] [Indexed: 02/07/2023]
Abstract
Next generation sequencing (NGS) has revolutionized the analysis of human genetic variations, offering a highly cost-effective way to diagnose monogenic diseases (MDs). Because nearly half of the children with chronic liver disorders have a genetic cause and approximately 20% of pediatric liver transplantations are performed in children with MDs, NGS offers the opportunity to significantly improve the diagnostic yield in this field. Among the NGS strategies, the use of targeted gene panels has proven useful to rapidly and reliably confirm a clinical suspicion, whereas whole exome sequencing (WES) with variant filtering has been adopted to assist the diagnostic workup in unclear clinical scenarios. WES is powerful but challenging because it detects a great number of variants of unknown significance that can be misinterpreted and lead to an incorrect diagnosis. In pediatric hepatology, targeted NGS can be very valuable to discriminate neonatal/infantile cholestatic disorders, disclose genetic causes of acute liver failure, and diagnose the subtype of inborn errors of metabolism presenting with a similar phenotype (such as glycogen storage disorders, mitochondrial cytopathies, or nonalcoholic fatty liver disease). The inclusion of NGS in diagnostic processes will lead to a paradigm shift in medicine, changing our approach to the patient as well as our understanding of factors affecting genotype-phenotype match. In this review, we discuss the opportunities and the challenges currently offered by NGS, and we propose a novel algorithm for cholestasis of infancy adopted in our center, including targeted NGS as a pivotal tool for the diagnosis of liver-based MDs. Liver Transplantation 24 282-293 2018 AASLD.
Affiliation(s)
- Emanuele Nicastro
- Pediatric Hepatology, Gastroenterology and Transplantation, Hospital Papa Giovanni XXIII, Bergamo, Italy
| | - Lorenzo D'Antiga
- Pediatric Hepatology, Gastroenterology and Transplantation, Hospital Papa Giovanni XXIII, Bergamo, Italy
| |
|
14
|
Genome-wide linkage and association study implicates the 10q26 region as a major genetic contributor to primary nonsyndromic vesicoureteric reflux. Sci Rep 2017; 7:14595. [PMID: 29097723 PMCID: PMC5668427 DOI: 10.1038/s41598-017-15062-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/06/2017] [Accepted: 10/06/2017] [Indexed: 12/29/2022] Open
Abstract
Vesicoureteric reflux (VUR) is the commonest urological anomaly in children. Despite treatment improvements, associated renal lesions – congenital dysplasia, acquired scarring or both – are a common cause of childhood hypertension and renal failure. Primary VUR is familial, with transmission rate and sibling risk both approaching 50%, and appears highly genetically heterogeneous. It is often associated with other developmental anomalies of the urinary tract, emphasising its etiology as a disorder of urogenital tract development. We conducted a genome-wide linkage and association study in three European populations to search for loci predisposing to VUR. Family-based association analysis of 1098 parent-affected-child trios and case/control association analysis of 1147 cases and 3789 controls did not reveal any compelling associations, but parametric linkage analysis of 460 families (1062 affected individuals) under a dominant model identified a single region, on 10q26, that showed strong linkage (HLOD = 4.90; ZLRLOD = 4.39) to VUR. The ~9Mb region contains 69 genes, including some good biological candidates. Resequencing this region in selected individuals did not clearly implicate any gene but FOXI2, FANK1 and GLRX3 remain candidates for further investigation. This, the largest genetic study of VUR to date, highlights the 10q26 region as a major genetic contributor to VUR in European populations.
|
15
|
Lee S, Kim S, Fuchsberger C. Improving power for rare-variant tests by integrating external controls. Genet Epidemiol 2017; 41:610-619. [PMID: 28657150 DOI: 10.1002/gepi.22057] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/12/2016] [Revised: 03/16/2017] [Accepted: 04/25/2017] [Indexed: 11/07/2022]
Abstract
Due to the drop in sequencing cost, the number of sequenced genomes is increasing rapidly. To improve the power of rare-variant tests, these sequenced samples could be used as external control samples in addition to control samples from the study itself. However, when using external controls, possible batch effects due to the use of different sequencing platforms or genotype calling pipelines can dramatically increase type I error rates. To address this, we propose novel summary-statistics-based single-variant and gene- or region-based rare-variant tests that allow the integration of external controls while controlling for type I error. Our approach is based on the insight that batch effects on a given variant can be assessed by comparing odds ratio estimates using internal controls only vs. using combined control samples of internal and external controls. From simulation experiments and the analysis of data from age-related macular degeneration and type 2 diabetes studies, we demonstrate that our method can substantially improve power while controlling for type I error rate.
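The comparison of odds ratio estimates mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' test: the function names, the 0.5 continuity correction, and all allele counts below are hypothetical, chosen only to show how a batch effect in the external controls moves the combined-control estimate away from the internal-only estimate.

```python
import math

def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio from a 2x2 table, with a 0.5 continuity
    correction (an assumption here) so rare-variant zero cells do not fail."""
    a, b, c, d = (x + 0.5 for x in (case_alt, case_ref, ctrl_alt, ctrl_ref))
    return (a * d) / (b * c)

def log_or_shift(case, internal, external):
    """Each argument is an (alt_count, ref_count) pair.
    Returns the internal-only OR, the combined-control OR, and the
    difference of their logs; a large shift hints at a batch effect."""
    or_internal = odds_ratio(*case, *internal)
    combined = (internal[0] + external[0], internal[1] + external[1])
    or_combined = odds_ratio(*case, *combined)
    return or_internal, or_combined, math.log(or_combined) - math.log(or_internal)

# Hypothetical counts: cases carry the minor allele at 1.5%, internal controls at 1%.
case = (30, 1970)
internal = (20, 1980)

# External controls whose allele frequency matches the internal controls (1%).
_, _, shift_ok = log_or_shift(case, internal, (200, 19800))

# External controls with an inflated frequency (3%), e.g. from a calling artifact.
_, _, shift_biased = log_or_shift(case, internal, (600, 19400))

print(f"log-OR shift, consistent external controls: {shift_ok:+.3f}")
print(f"log-OR shift, biased external controls:     {shift_biased:+.3f}")
```

With consistent external controls the two estimates nearly agree, while the biased set pulls the combined estimate well below the internal-only one. A real analysis would also need a standard error for the shift before deciding how much to trust the external samples.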
Affiliation(s)
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Sehee Kim
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Christian Fuchsberger
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; Center for Biomedicine, European Academy of Bolzano/Bozen, affiliated to the University of Lübeck, Bolzano/Bozen, Italy
| |
|
16
|
Luo Y, de Lange KM, Jostins L, Moutsianas L, Randall J, Kennedy NA, Lamb CA, McCarthy S, Ahmad T, Edwards C, Serra EG, Hart A, Hawkey C, Mansfield JC, Mowat C, Newman WG, Nichols S, Pollard M, Satsangi J, Simmons A, Tremelling M, Uhlig H, Wilson DC, Lee JC, Prescott NJ, Lees CW, Mathew CG, Parkes M, Barrett JC, Anderson CA. Exploring the genetic architecture of inflammatory bowel disease by whole-genome sequencing identifies association at ADCY7. Nat Genet 2017; 49:186-192. [PMID: 28067910 PMCID: PMC5289625 DOI: 10.1038/ng.3761] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Received: 06/09/2016] [Accepted: 12/07/2016] [Indexed: 02/06/2023]
Abstract
To further resolve the genetic architecture of the inflammatory bowel diseases ulcerative colitis and Crohn's disease, we sequenced the whole genomes of 4,280 patients at low coverage and compared them to 3,652 previously sequenced population controls across 73.5 million variants. We then imputed from these sequences into new and existing genome-wide association study cohorts and tested for association at ∼12 million variants in a total of 16,432 cases and 18,843 controls. We discovered a 0.6% frequency missense variant in ADCY7 that doubles the risk of ulcerative colitis. Despite good statistical power, we did not identify any other new low-frequency risk variants and found that such variants explained little heritability. We detected a burden of very rare, damaging missense variants in known Crohn's disease risk genes, suggesting that more comprehensive sequencing studies will continue to improve understanding of the biology of complex diseases.
Affiliation(s)
- Yang Luo
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Division of Genetics and Rheumatology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | | | - Luke Jostins
- Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, UK
- Christ Church, University of Oxford, St Aldates, UK
| | - Loukas Moutsianas
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Joshua Randall
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Nicholas A. Kennedy
- Precision Medicine Exeter, University of Exeter, Exeter, UK
- IBD Pharmacogenetics, Royal Devon and Exeter Foundation Trust, Exeter, UK
| | | | - Shane McCarthy
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Tariq Ahmad
- Precision Medicine Exeter, University of Exeter, Exeter, UK
- IBD Pharmacogenetics, Royal Devon and Exeter Foundation Trust, Exeter, UK
| | - Cathryn Edwards
- Department of Gastroenterology, Torbay Hospital, Torbay, Devon, UK
| | | | - Ailsa Hart
- Department of Medicine, St Mark's Hospital, Harrow, Middlesex, UK
| | - Chris Hawkey
- Nottingham Digestive Diseases Centre, Queens Medical Centre, Nottingham, UK
| | - John C. Mansfield
- Institute of Human Genetics, Newcastle University, Newcastle upon Tyne, UK
| | - Craig Mowat
- Department of Medicine, Ninewells Hospital and Medical School, Dundee, UK
| | - William G. Newman
- Genetic Medicine, Manchester Academic Health Science Centre, Manchester, UK
- The Manchester Centre for Genomic Medicine, University of Manchester, Manchester, UK
| | - Sam Nichols
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Martin Pollard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Jack Satsangi
- Gastrointestinal Unit, Western General Hospital, University of Edinburgh, Edinburgh, UK
| | - Alison Simmons
- Translational Gastroenterology Unit, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
- Human Immunology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
| | - Mark Tremelling
- Gastroenterology & General Medicine, Norfolk and Norwich University Hospital, Norwich, UK
| | - Holm Uhlig
- Translational Gastroenterology Unit and the Department of Paediatrics, University of Oxford, Oxford, United Kingdom
| | - David C. Wilson
- Paediatric Gastroenterology and Nutrition, Royal Hospital for Sick Children, Edinburgh, UK
- Child Life and Health, University of Edinburgh, Edinburgh, Scotland, UK
| | - James C. Lee
- Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, Cambridge, UK
| | - Natalie J. Prescott
- Department of Medical and Molecular Genetics, Faculty of Life Science and Medicine, King's College London, Guy's Hospital, London, UK
| | - Charlie W. Lees
- Gastrointestinal Unit, Western General Hospital, University of Edinburgh, Edinburgh, UK
| | - Christopher G. Mathew
- Department of Medical and Molecular Genetics, Faculty of Life Science and Medicine, King's College London, Guy's Hospital, London, UK
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of Witwatersrand, South Africa
| | - Miles Parkes
- Inflammatory Bowel Disease Research Group, Addenbrooke's Hospital, Cambridge, UK
| | - Jeffrey C. Barrett
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | - Carl A. Anderson
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| |
|
17
|
Panjwani N, Wilson MD, Addis L, Crosbie J, Wirrell E, Auvin S, Caraballo RH, Kinali M, McCormick D, Oren C, Taylor J, Trounce J, Clarke T, Akman CI, Kugler SL, Mandelbaum DE, McGoldrick P, Wolf SM, Arnold P, Schachar R, Pal DK, Strug LJ. A microRNA-328 binding site in PAX6 is associated with centrotemporal spikes of rolandic epilepsy. Ann Clin Transl Neurol 2016; 3:512-22. [PMID: 27386500 PMCID: PMC4931716 DOI: 10.1002/acn3.320] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 04/21/2016] [Accepted: 04/28/2016] [Indexed: 12/21/2022] Open
Abstract
Objective: Rolandic epilepsy is a common genetic focal epilepsy of childhood characterized by centrotemporal sharp waves (CTS) on electroencephalogram. In a previous genome-wide analysis, we reported linkage of CTS to chromosome 11p13, and fine mapping with 44 SNPs identified the ELP4-PAX6 locus in two independent US and Canadian case–control samples. Here, we aimed to find a causative variant for CTS using a larger sample and a higher-resolution genotyping array. Methods: We fine-mapped the ELP4-PAX6 locus in 186 individuals from rolandic epilepsy families and 1000 population controls of European origin using the Illumina HumanCoreExome-12 v1.0 BeadChip. Controls were matched to cases on ethnicity using principal component analysis. We used generalized estimating equations to assess association, followed up with a bioinformatics survey and literature search to evaluate functional significance. Results: Homozygosity for the T allele of SNP rs662702 in the 3′ untranslated region of PAX6 conferred increased risk of CTS: odds ratio = 12.29 (95% CI: 3.20–47.22), $P = 2.6 \times 10^{-4}$; this genotype is seen in 3.9% of cases but only 0.3% of controls. Interpretation: The minor T allele of SNP rs662702 disrupts regulation by microRNA-328, which is known to result in increased PAX6 expression in vitro. This study provides, for the first time, evidence of a noncoding genomic variant contributing to the etiology of a common human epilepsy via a posttranscriptional regulatory mechanism.
Affiliation(s)
- Naim Panjwani
- Program in Genetics and Genome Biology The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada
| | - Michael D Wilson
- Program in Genetics and Genome Biology The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada; Department of Molecular Genetics University of Toronto Toronto Ontario M5S 1A1 Canada
| | - Laura Addis
- Department of Basic and Clinical Neuroscience Institute of Psychiatry, Psychology and Neuroscience King's College London London SE5 9RX United Kingdom; Neuroscience Discovery Research Eli Lilly and Company Erl Wood, Surrey GU20 6PH United Kingdom
| | - Jennifer Crosbie
- Neurosciences and Mental Health Program Research Institute The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada; Department of Psychiatry The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada
| | - Elaine Wirrell
- Division of Child and Adolescent Neurology Mayo Clinic Rochester Minnesota 55905
| | - Stéphane Auvin
- Service de neurologie pédiatrique/Inserm 1141 Hôpital Robert Debré AP-HP, 48 boulevard Sérurier Paris 75019 France
| | - Roberto H Caraballo
- Department of Neurology Hospital de Pediatría "Prof Dr Juan P Garrahan" Combate de los Pozos 1881 C1245AAM Buenos Aires Argentina
| | - Maria Kinali
- Chelsea and Westminster Hospital London SW10 9NH United Kingdom
| | | | - Caroline Oren
- Northwick Park Hospital Middlesex HA1 3UJ United Kingdom
| | - Jacqueline Taylor
- Barnet and Chase Farm Hospitals Enfield, Greater London EN2 8JL United Kingdom
| | - John Trounce
- Brighton and Sussex University Hospitals Brighton BN1 6AG United Kingdom
| | - Tara Clarke
- Department of Epidemiology Columbia University New York New York 10027
| | - Cigdem I Akman
- Neurological Institute Columbia University Medical Centre New York, New York 10032
| | - Steven L Kugler
- Children's Hospital of Philadelphia and University of Pennsylvania School of Medicine Philadelphia Pennsylvania 19104
| | - David E Mandelbaum
- Hasbro Children's Hospital and the Warren Alpert Medical School of Brown University Providence Rhode Island 02903
| | | | | | - Paul Arnold
- Neurosciences and Mental Health Program Research Institute The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada; Department of Psychiatry The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada; Mathison Centre for Mental Health Research and Education University of Calgary Calgary Alberta T2N 4Z6 Canada
| | - Russell Schachar
- Neurosciences and Mental Health Program Research Institute The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada; Department of Psychiatry The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada
| | - Deb K Pal
- Department of Basic and Clinical Neuroscience Institute of Psychiatry, Psychology and Neuroscience King's College London London SE5 9RX United Kingdom; King's College Hospital London SE5 9RS United Kingdom; Evelina London Children's Hospital London SE1 7EH United Kingdom
| | - Lisa J Strug
- Program in Genetics and Genome Biology The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada; Division of Biostatistics Dalla Lana School of Public Health University of Toronto Toronto Ontario M5T 3M7 Canada; The Centre for Applied Genomics The Hospital for Sick Children Toronto Ontario M5G 0A4 Canada
| |
|
18
|
Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls. PLoS Genet 2016; 12:e1006040. [PMID: 27152526 PMCID: PMC4859496 DOI: 10.1371/journal.pgen.1006040] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/27/2015] [Accepted: 04/19/2016] [Indexed: 01/02/2023] Open
Abstract
Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths, on different platforms, or in different batches. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistics for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available.
In next-generation sequencing studies, there are typically systematic differences in sequencing qualities (e.g., depth) between cases and controls, because entire studies are rarely sequenced in exactly the same way. It has long been appreciated that, in the presence of such differences, the standard genotype calling approach to detecting rare variant associations generally leads to excessive false positive findings. To deal with this, the current “state of the art” is to impose quality control procedures so stringent that much of the data is eliminated. We present a method that allows analysis of data with a wide range of differential sequencing qualities between cases and controls. Our method is more powerful than the current practice and can accelerate the search for disease-causing mutations.
|
19
|
Abstract
Empirical studies and evolutionary theory support a role for rare variants in the etiology of complex traits. Given this motivation and increasing affordability of whole-exome and whole-genome sequencing, methods for rare variant association have been an active area of research for the past decade. Here, we provide a survey of the current literature and developments from the Genetic Analysis Workshop 19 (GAW19) Collapsing Rare Variants working group. In particular, we present the generalized linear regression framework and associated score statistic for the 2 major types of methods: burden and variance components methods. We further show that by simply modifying weights within these frameworks we arrive at many of the popular existing methods, for example, the cohort allelic sums test and sequence kernel association test. Meta-analysis techniques are also described. Next, we describe the 6 contributions from the GAW19 Collapsing Rare Variants working group. These included development of new methods, such as a retrospective likelihood for family data, a method using genomic structure to compare cases and controls, a haplotype-based meta-analysis, and a permutation-based method for combining different statistical tests. In addition, one contribution compared a mega-analysis of family-based and population-based data to meta-analysis. Finally, the power of existing family-based methods for binary traits was compared. We conclude with suggestions for open research questions.
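For orientation, the two families of score statistics named in this abstract are commonly written as below. The notation is an assumption made here for illustration and is not taken from the survey itself: $g_{ij}$ is the genotype of subject $i$ at variant $j$, $y_i$ the phenotype, $\hat{\mu}_i$ the fitted mean under the null model, and $w_j$ a per-variant weight.

```latex
\[
S_j = \sum_{i=1}^{n} g_{ij}\,\bigl(y_i - \hat{\mu}_i\bigr), \qquad
Q_{\mathrm{burden}} = \Bigl(\sum_{j=1}^{m} w_j S_j\Bigr)^{2}, \qquad
Q_{\mathrm{VC}} = \sum_{j=1}^{m} w_j^{2} S_j^{2}.
\]
```

In this form the burden statistic sums the weighted per-variant scores before squaring, so effects in opposite directions can cancel, whereas the variance components statistic squares each score first. As the abstract notes, different choices of the weights $w_j$ lead to different named tests within each family.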
Affiliation(s)
- Stephanie A Santorico
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
| | - Audrey E Hendricks
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
| |
|
20
|
Saad M, Brkanac Z, Wijsman EM. Family-based genome scan for age at onset of late-onset Alzheimer's disease in whole exome sequencing data. GENES, BRAIN, AND BEHAVIOR 2015; 14:607-17. [PMID: 26394601 PMCID: PMC4715764 DOI: 10.1111/gbb.12250] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/27/2015] [Revised: 08/08/2015] [Accepted: 08/24/2015] [Indexed: 01/31/2023]
Abstract
Alzheimer's disease (AD) is a common and complex neurodegenerative disease. Age at onset (AAO) of AD is an important component phenotype with a genetic basis, and identification of genes in which variation affects AAO would contribute to identification of factors that affect timing of onset. Increase in AAO through prevention or therapeutic measures would have enormous benefits by delaying AD and its associated morbidities. In this paper, we performed a family-based genome-wide association study for AAO of late-onset AD in whole exome sequence data generated in multigenerational families with multiple AD cases. We conducted single marker and gene-based burden tests for common and rare variants, respectively. We combined association analyses with variance component linkage analysis, and with reference to prior studies, in order to enhance evidence of the identified genes. For variants and genes implicated by the association study, we performed a gene-set enrichment analysis to identify potential novel pathways associated with AAO of AD. We found statistically significant association with AAO for three genes (WRN, NTN4 and LAMC3) with common associated variants, and for four genes (SLC8A3, SLC19A3, MADD and LRRK2) with multiple rare-associated variants that have a plausible biological function related to AD. The genes we have identified are in pathways that are strong candidates for involvement in the development of AD pathology and may lead to a better understanding of AD pathogenesis.
Affiliation(s)
- Mohamad Saad
- Department of Biostatistics, University of Washington, Seattle, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA
- Zoran Brkanac
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, USA
- Ellen M. Wijsman
- Department of Biostatistics, University of Washington, Seattle, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA
|
21
|
Chapman NH, Nato AQ, Bernier R, Ankenman K, Sohi H, Munson J, Patowary A, Archer M, Blue EM, Webb SJ, Coon H, Raskind WH, Brkanac Z, Wijsman EM. Whole exome sequencing in extended families with autism spectrum disorder implicates four candidate genes. Hum Genet 2015; 134:1055-68. [PMID: 26204995 PMCID: PMC4578871 DOI: 10.1007/s00439-015-1585-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 06/12/2015] [Accepted: 07/11/2015] [Indexed: 12/26/2022]
Abstract
Autism spectrum disorders (ASDs) are a group of neurodevelopmental disorders, characterized by impairment in communication and social interactions, and by repetitive behaviors. ASDs are highly heritable, and estimates of the number of risk loci range from hundreds to >1000. We considered 7 extended families (size 12-47 individuals), each with ≥3 individuals affected by ASD. All individuals were genotyped with dense SNP panels. A small subset of each family was typed with whole exome sequence (WES). We used a 3-step approach for variant identification. First, we used family-specific parametric linkage analysis of the SNP data to identify regions of interest. Second, we filtered variants in these regions based on frequency and function, obtaining about 200 candidates. Third, we compared two approaches to narrowing this list further. We used information from the SNP data to impute exome variant dosages into those without WES. We regressed affected status on variant allele dosage, using pedigree-based kinship matrices to account for relationships. The p value for the test of the null hypothesis that variant allele dosage is unrelated to phenotype was used to indicate strength of evidence supporting the variant. A cutoff of p = 0.05 gave 28 variants. As an alternative third filter, we required Mendelian inheritance in those with WES, resulting in 70 variants. The imputation- and association-based approach was effective. We identified four strong candidate genes for ASD (SEZ6L, HISPPD1, FEZF1, SAMD11), all of which have been previously implicated in other studies, or have a strong biological argument for their relevance.
Affiliation(s)
- Nicola H Chapman
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Raphael Bernier
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Katy Ankenman
- Department of Psychiatry, University of California, San Francisco, CA, USA
- Harkirat Sohi
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Jeff Munson
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Center on Child Development and Disability, University of Washington, Seattle, WA, USA
- Ashok Patowary
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Marilyn Archer
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Elizabeth M Blue
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Sara Jane Webb
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Center on Child Development and Disability, University of Washington, Seattle, WA, USA
- Hilary Coon
- Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
- Department of Psychiatry, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Wendy H Raskind
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Zoran Brkanac
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
- Ellen M Wijsman
- Division of Medical Genetics, School of Medicine, University of Washington, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- University of Washington, University of Washington Tower, T15, 4333 Brooklyn Ave, NE, BOX 359460, Seattle, WA, 98195-9460, USA
|
22
|
Yan S, Yuan S, Xu Z, Zhang B, Zhang B, Kang G, Byrnes A, Li Y. Likelihood-based complex trait association testing for arbitrary depth sequencing data. Bioinformatics 2015; 31:2955-62. [PMID: 25979475 PMCID: PMC4668777 DOI: 10.1093/bioinformatics/btv307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/27/2014] [Revised: 05/06/2015] [Accepted: 05/11/2015] [Indexed: 11/14/2022] Open
Abstract
In next generation sequencing (NGS)-based genetic studies, researchers typically perform genotype calling first and then apply standard genotype-based methods for association testing. However, such a two-step approach ignores genotype calling uncertainty in the association testing step and may incur power loss and/or inflated type-I error. In the recent literature, a few robust and efficient likelihood-based methods, including both likelihood ratio test (LRT) and score test, have been proposed to carry out association testing without intermediate genotype calling. These methods take genotype calling uncertainty into account by directly incorporating the genotype likelihood function (GLF) of NGS data into association analysis. However, existing LRT methods are computationally demanding or do not allow covariate adjustment, while existing score tests are not applicable to markers with low minor allele frequency (MAF). We provide an LRT allowing flexible covariate adjustment, develop a statistically more powerful score test and propose a combination strategy (UNC combo) to leverage the advantages of both tests. We have carried out extensive simulations to evaluate the performance of our proposed LRT and score test. Simulations and real data analysis demonstrate the advantages of our proposed combination strategy: it offers a satisfactory trade-off in terms of computational efficiency, applicability (accommodating both common variants and variants with low MAF) and statistical power, particularly for the analysis of quantitative traits, where the power gain can be up to ~60% when the causal variant is of low frequency (MAF < 0.01).
Availability and implementation: UNC combo and the associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/UNCcombo/
Contact: yunli@med.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
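To make the point about genotype calling uncertainty concrete, the following is a minimal sketch, not the LRT or score test described in the abstract. It assumes three genotype likelihoods per individual (for 0, 1, or 2 alternate alleles) and a Hardy-Weinberg prior built from an assumed alternate allele frequency. All function names and the numeric inputs are hypothetical and chosen only for illustration.

```python
def hwe_prior(alt_freq):
    """Hardy-Weinberg prior over genotypes with 0, 1, or 2 alternate alleles (an assumption)."""
    ref_freq = 1.0 - alt_freq
    return (ref_freq ** 2, 2.0 * ref_freq * alt_freq, alt_freq ** 2)


def genotype_posterior(genotype_likelihoods, alt_freq):
    """Combine genotype likelihoods with the prior and normalize to sum to 1."""
    prior = hwe_prior(alt_freq)
    weighted = [gl * p for gl, p in zip(genotype_likelihoods, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


def expected_dosage(genotype_likelihoods, alt_freq):
    """Posterior mean number of alternate alleles; keeps the calling uncertainty."""
    post = genotype_posterior(genotype_likelihoods, alt_freq)
    return post[1] + 2.0 * post[2]


def hard_call(genotype_likelihoods):
    """Most likely genotype from the likelihoods alone; discards the uncertainty."""
    return max(range(3), key=lambda g: genotype_likelihoods[g])


if __name__ == "__main__":
    # Hypothetical low-depth site: the likelihoods barely favor the heterozygote.
    low_depth_gl = (0.4, 0.5, 0.1)
    print("hard call:", hard_call(low_depth_gl))
    print("expected dosage:", round(expected_dosage(low_depth_gl, alt_freq=0.1), 3))
```

With weakly informative likelihoods, the hard call commits to one genotype while the expected dosage stays close to what the allele frequency suggests. A two-step analysis that uses only the hard call loses this information.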
Affiliation(s)
- Song Yan
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Shuai Yuan
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Zheng Xu
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Baqun Zhang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Bo Zhang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Guolian Kang
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Andrea Byrnes
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Yun Li
- Department of Biostatistics, Department of Genetics, Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 USA, Merck Research Laboratories, North Wales, PA, USA, School of Statistics, Renmin University of China, Beijing, People's Republic of China, Department of Statistics, North Carolina State University, Raleigh, NC, 27607 USA, Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN 38105, USA and Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
|
23
|
Exome Sequencing of Phenotypic Extremes Identifies CAV2 and TMC6 as Interacting Modifiers of Chronic Pseudomonas aeruginosa Infection in Cystic Fibrosis. PLoS Genet 2015; 11:e1005273. [PMID: 26047157 PMCID: PMC4457883 DOI: 10.1371/journal.pgen.1005273] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 11/05/2014] [Accepted: 05/12/2015] [Indexed: 12/22/2022] Open
Abstract
Discovery of rare or low frequency variants in exome or genome data that are associated with complex traits often will require use of very large sample sizes to achieve adequate statistical power. For a fixed sample size, sequencing of individuals sampled from the tails of a phenotype distribution (i.e., extreme phenotypes design) maximizes power, and this approach was recently validated empirically with the discovery of variants in DCTN4 that influence the natural history of P. aeruginosa airway infection in persons with cystic fibrosis (CF; MIM 219700). The increasing availability of large exome/genome sequence datasets that serve as proxies for population-based controls affords the opportunity to test an alternative, potentially more powerful and generalizable strategy, in which the frequency of rare variants in a single extreme phenotypic group is compared to a control group (i.e., extreme phenotype vs. control population design). As proof-of-principle, we applied this approach to search for variants associated with risk for age-of-onset of chronic P. aeruginosa airway infection among individuals with CF and identified variants in CAV2 and TMC6 that were significantly associated with group status. These results were validated using a large, prospective, longitudinal CF cohort and confirmed a significant association of a variant in CAV2 with increased age-of-onset of P. aeruginosa airway infection (hazard ratio = 0.48, 95% CI = [0.32, 0.88]) and variants in TMC6 with diminished age-of-onset of P. aeruginosa airway infection (HR = 5.4, 95% CI = [2.2, 13.5]). A strong interaction between CAV2 and TMC6 variants was observed (HR = 12.1, 95% CI = [3.8, 39]) for children with the deleterious TMC6 variant and without the CAV2 protective variant. Neither gene showed a significant association using an extreme phenotypes design, and conditions for which the power of an extreme phenotype vs. control population design was greater than that for the extreme phenotypes design were explored.

Whole exome and whole genome sequencing provide the opportunity to test for associations between expressed traits and genetic variants that cannot be tested with chip technology, particularly variants that are too rare to be included on chips designed for genome-wide association analysis. We used exome sequencing to identify variants in CAV2 and TMC6 that modify the age-of-onset of chronic Pseudomonas aeruginosa infection among children with cystic fibrosis, and validated our findings in a large cohort of children with cystic fibrosis. For a fixed number of study participants, it is known that the extreme phenotypes design provides greater statistical power than a random sampling design. In the extreme phenotypes design, one compares the frequency of a given set of genetic variants in one extreme of age-of-onset (early onset) to that in the other extreme (late onset). Here, we employed an alternative design that compares genetic frequencies in exomes sampled from one extreme to that among exomes from a large set of controls. We show that this design confers substantially greater statistical power for discovery of CAV2 and TMC6 and provide general conditions under which this single extreme versus control design is more powerful than the extreme phenotypes design.
|
24
|
Li W, Dobbins S, Tomlinson I, Houlston R, Pal DK, Strug LJ. Prioritizing rare variants with conditional likelihood ratios. Hum Hered 2015; 79:5-13. [PMID: 25659987 PMCID: PMC4759929 DOI: 10.1159/000371579] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/26/2014] [Accepted: 12/15/2014] [Indexed: 11/19/2022] Open
Abstract
Background: Prioritizing individual rare variants within associated genes or regions often consists of an ad hoc combination of statistical and biological considerations. From the statistical perspective, rare variants are often ranked using Fisher's exact p values, which can lead to different rankings of the same set of variants depending on whether 1- or 2-sided p values are used. Results: We propose a likelihood ratio-based measure, maxLRc, for the statistical component of ranking rare variants under a case-control study design that avoids the hypothesis-testing paradigm. We prove analytically that the maxLRc is always well-defined, even when the data have zero cell counts in the 2×2 disease-variant table. Via simulation, we show that the maxLRc outperforms Fisher's exact p values in most practical scenarios considered. Using next-generation sequence data from 27 rolandic epilepsy cases and 200 controls in a region previously shown to be linked to and associated with rolandic epilepsy, we demonstrate that rankings assigned by the maxLRc and exact p values can differ substantially. Conclusion: The maxLRc provides reliable statistical prioritization of rare variants using only the observed data, avoiding the need to specify parameters associated with hypothesis testing that can result in ranking discrepancies across p value procedures; and it is applicable to common variant prioritization.
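The ranking discrepancy between 1- and 2-sided Fisher's exact p values can be reproduced with a small sketch. The code below does not implement maxLRc. It only computes the two kinds of exact p value from the hypergeometric distribution for a 2×2 carrier table. The sample sizes (27 cases, 200 controls) match the abstract, but the carrier counts and the function names are hypothetical and chosen purely for illustration.

```python
from math import comb


def carrier_pmf(x, n_cases, n_controls, n_carriers):
    """Hypergeometric probability that x of the n_carriers carriers are cases."""
    total = n_cases + n_controls
    return comb(n_cases, x) * comb(n_controls, n_carriers - x) / comb(total, n_carriers)


def fisher_p_values(case_carriers, control_carriers, n_cases, n_controls):
    """Return (one-sided, two-sided) Fisher exact p values for one variant.

    One-sided: probability of at least as many case carriers as observed.
    Two-sided: total probability of tables no more likely than the observed one.
    """
    n_carriers = case_carriers + control_carriers
    lo = max(0, n_carriers - n_controls)
    hi = min(n_cases, n_carriers)
    probs = {x: carrier_pmf(x, n_cases, n_controls, n_carriers) for x in range(lo, hi + 1)}
    observed = probs[case_carriers]
    one_sided = sum(p for x, p in probs.items() if x >= case_carriers)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= observed * (1 + 1e-7))
    return min(1.0, one_sided), min(1.0, two_sided)


if __name__ == "__main__":
    n_cases, n_controls = 27, 200
    # Hypothetical carrier counts: (case carriers, control carriers).
    variants = {"enriched_in_cases": (3, 5), "depleted_in_cases": (0, 60)}
    results = {
        name: fisher_p_values(a, c, n_cases, n_controls)
        for name, (a, c) in variants.items()
    }
    print("rank by 1-sided p:", sorted(results, key=lambda v: results[v][0]))
    print("rank by 2-sided p:", sorted(results, key=lambda v: results[v][1]))
```

In this hypothetical pair, the variant enriched in cases ranks first under the 1-sided p value, while the variant depleted in cases ranks first under the 2-sided p value. This is the kind of procedure-dependent ordering the abstract describes.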
Affiliation(s)
- Weili Li
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ont., Canada
|