1
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole-exome and whole-genome sequencing association studies: it can handle a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
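To make the term "aggregation test" concrete, here is a minimal Python sketch of one of the simplest members of that family, a burden test with a permutation p-value. It is an illustration only and is not Excalibur or any of its 36 component tests; the function names, the MAF threshold, and the choice of a mean-difference statistic are all assumptions made for this example.

```python
import random

def burden_scores(genotypes, maf_threshold=0.01):
    """Collapse rare variants in a region into one burden score per sample.

    genotypes: list of per-sample lists of minor-allele counts (0, 1 or 2).
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor allele frequency per variant: allele count / (2 * number of samples).
    mafs = [sum(g[j] for g in genotypes) / (2 * n_samples) for j in range(n_variants)]
    rare = [j for j, f in enumerate(mafs) if 0 < f <= maf_threshold]
    return [sum(g[j] for j in rare) for g in genotypes]

def burden_permutation_test(genotypes, is_case, n_perm=2000, maf_threshold=0.01, seed=0):
    """Permutation p-value for a case/control difference in mean burden."""
    scores = burden_scores(genotypes, maf_threshold)

    def stat(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        ctrls = [s for s, c in zip(scores, labels) if not c]
        return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))

    observed = stat(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A burden test of this kind assumes that the rare alleles in the region all act in the same direction. It loses power when effects are mixed, which is one example of the test-specific drawbacks the abstract refers to.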
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
2
Xu Z, Yan S, Wu C, Duan Q, Chen S, Li Y. Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework. Mathematics (Basel) 2023; 11:2560. [PMID: 38721066 PMCID: PMC11078158 DOI: 10.3390/math11112560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Association testing has been widely used to study the relationship between genetic variants and phenotypes. Most association testing methods are genotype-based, i.e. they first estimate the genotype and then regress the phenotype on the estimated genotype and other variables. Methods that test directly on next-generation sequencing (NGS) data, without genotype calling, have been proposed and have shown an advantage over genotype-based methods in scenarios where genotype calling is not accurate. NGS data-based single-variant tests have been proposed, including our previously proposed single-variant testing method, the UNC combo method [1]. We have also proposed NGS data-based group testing methods for continuous phenotypes, using a linear model framework that can handle continuous responses [2]. In this paper, we extend our linear model-based framework to a generalized linear model-based framework so that the methods can handle other types of responses, especially binary responses, which are common in association studies. We have conducted extensive simulation studies to evaluate the performance of different estimators and to compare our estimators with their corresponding genotype-based methods. We found that all methods have Type I errors controlled, and that our NGS data-based testing methods perform better than their corresponding genotype-based methods in the literature for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), especially when sequencing depth is low. In conclusion, we have extended our previous linear model (LM) framework to a generalized linear model (GLM) framework and derived NGS data-based testing methods for a group of genetic variants. Compared with our previously proposed LM-based methods [2], the new GLM-based methods can handle more complex responses (for example, binary responses and count responses) in addition to continuous responses. Our methods have filled the literature gap and shown an advantage over their corresponding genotype-based methods in the literature.
Affiliation(s)
- Zheng Xu
  - Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, 45324, USA
- Song Yan
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Cong Wu
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68508, USA
- Qing Duan
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Sixia Chen
  - Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
3
Shivakumar M, Miller JE, Dasari VR, Zhang Y, Lee MTM, Carey DJ, Gogoi R, Kim D. Genetic Analysis of Functional Rare Germline Variants across Nine Cancer Types from an Electronic Health Record Linked Biobank. Cancer Epidemiol Biomarkers Prev 2021; 30:1681-1688. [PMID: 34244158 DOI: 10.1158/1055-9965.epi-21-0082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2021] [Revised: 02/15/2021] [Accepted: 06/17/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Rare variants play an essential role in the etiology of cancer. In this study, we aim to characterize rare germline variants that impact the risk of cancer. METHODS We performed a genome-wide rare variant analysis using germline whole exome sequencing (WES) data derived from the Geisinger MyCode initiative to discover cancer predisposition variants. The case-control association analysis was conducted by binning variants in 5,538 patients with cancer and 7,286 matched controls in a discovery set and 1,991 patients with cancer and 2,504 matched controls in a validation set across nine cancer types. Further, The Cancer Genome Atlas (TCGA) germline data were used to replicate the findings. RESULTS We identified 133 significant pathway-cancer pairs (85 replicated) and 90 significant gene-cancer pairs (12 replicated). In addition, we identified 18 genes and 3 pathways that were associated with survival outcome across cancers (Bonferroni P < 0.05). CONCLUSIONS In this study, we identified potential predisposition genes and pathways based on rare variants in nine cancers. IMPACT This work adds to the knowledge base and progress being made in precision medicine.
Affiliation(s)
- Manu Shivakumar
  - Biomedical & Translational Informatics Institute, Geisinger, Danville, Pennsylvania
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason E Miller
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Yanfei Zhang
  - Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- David J Carey
  - Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania
- Radhika Gogoi
  - Weis Center for Research, Geisinger Clinic, Danville, Pennsylvania
4
Blumhagen RZ, Schwartz DA, Langefeld CD, Fingerlin TE. Identification of Influential Variants in Significant Aggregate Rare Variant Tests. Hum Hered 2021; 85:1-13. [PMID: 33567433 PMCID: PMC8353006 DOI: 10.1159/000513290] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2020] [Accepted: 11/19/2020] [Indexed: 12/17/2022] Open
Abstract
INTRODUCTION Studies that examine the role of rare variants in both simple and complex disease are increasingly common. Though the usual approach of testing rare variants in aggregate sets is more powerful than testing individual variants, it is of interest to identify the variants that are plausible drivers of the association. We present a novel method for prioritization of rare variants after a significant aggregate test by quantifying the influence of the variant on the aggregate test of association. METHODS In addition to providing a measure used to rank variants, we use outlier detection methods to present the computationally efficient Rare Variant Influential Filtering Tool (RIFT) to identify a subset of variants that influence the disease association. We evaluated several outlier detection methods that vary based on the underlying variance measure: interquartile range (Tukey fences), median absolute deviation, and SD. We performed 1,000 simulations for 50 regions of size 3 kb and compared the true and false positive rates. We compared RIFT using the Inner Tukey to 2 existing methods: adaptive combination of p values (ADA) and a Bayesian hierarchical model (BeviMed). Finally, we applied this method to data from our targeted resequencing study in idiopathic pulmonary fibrosis (IPF). RESULTS All outlier detection methods showed higher sensitivity for detecting uncommon variants (0.001 < minor allele frequency, MAF < 0.03) than for very rare variants (MAF < 0.001). For uncommon variants, RIFT had a lower median false positive rate than ADA. ADA and RIFT had significantly higher true positive rates than BeviMed. When applied to 2 regions previously found to be associated with IPF and together including 100 rare variants, we identified 6 polymorphisms with the greatest evidence for influencing the association with IPF. DISCUSSION In summary, RIFT has a high true positive rate while maintaining a low false positive rate for identifying polymorphisms influencing rare variant association tests. This work provides an approach to obtain greater resolution of the rare variant signals within significant aggregate sets; this information can provide an objective measure to prioritize variants for follow-up experimental studies and insight into the biological pathways involved.
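The outlier-detection step named in the abstract (Tukey inner fences) can be sketched in Python as follows. This is a minimal illustration, not the RIFT implementation: the per-variant influence scores are taken as given, since the abstract does not say how they are computed, and the function names and the default multiplier of 1.5 (the conventional inner-fence value) are assumptions.

```python
import statistics

def tukey_inner_fences(values, k=1.5):
    """Return (lower, upper) inner fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_influential(scores, k=1.5):
    """Flag variants whose influence score falls outside the inner fences.

    scores: dict mapping a variant identifier to a hypothetical influence score.
    """
    lower, upper = tukey_inner_fences(list(scores.values()), k)
    return sorted(name for name, s in scores.items() if s < lower or s > upper)
```

The other two rules mentioned in the abstract swap the spread measure: the median absolute deviation or the standard deviation replaces the interquartile range, with the fences centered on the median or mean respectively.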
Affiliation(s)
- Rachel Z Blumhagen
  - Center for Genes, Environment and Health, National Jewish Health, Denver, Colorado, USA
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, USA
- David A Schwartz
  - School of Medicine, University of Colorado, Aurora, Colorado, USA
- Carl D Langefeld
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
  - Comprehensive Cancer Center, Wake Forest Baptist Medical Center, Winston-Salem, North Carolina, USA
  - Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Tasha E Fingerlin
  - Center for Genes, Environment and Health, National Jewish Health, Denver, Colorado, USA
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, Colorado, USA
  - School of Medicine, University of Colorado, Aurora, Colorado, USA
5
Swietlik EM, Gräf S, Morrell NW. The role of genomics and genetics in pulmonary arterial hypertension. Glob Cardiol Sci Pract 2020; 2020:e202013. [PMID: 33150157 PMCID: PMC7590931 DOI: 10.21542/gcsp.2020.13] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Affiliation(s)
- Emilia M Swietlik
  - Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Addenbrooke's Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Royal Papworth Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Stefan Gräf
  - Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Nicholas W Morrell
  - Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Addenbrooke's Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Royal Papworth Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, United Kingdom
6
Leongamornlert DA, Saunders EJ, Wakerell S, Whitmore I, Dadaev T, Cieza-Borrella C, Benafif S, Brook MN, Donovan JL, Hamdy FC, Neal DE, Muir K, Govindasami K, Conti DV, Kote-Jarai Z, Eeles RA. Germline DNA Repair Gene Mutations in Young-onset Prostate Cancer Cases in the UK: Evidence for a More Extensive Genetic Panel. Eur Urol 2019; 76:329-337. [PMID: 30777372 PMCID: PMC6695475 DOI: 10.1016/j.eururo.2019.01.050] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 08/03/2018] [Accepted: 01/31/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND Rare germline mutations in DNA repair genes are associated with prostate cancer (PCa) predisposition and prognosis. OBJECTIVE To quantify the frequency of germline DNA repair gene mutations in UK PCa cases and controls, in order to more comprehensively evaluate the contribution of individual genes to overall PCa risk and likelihood of aggressive disease. DESIGN, SETTING, AND PARTICIPANTS We sequenced 167 DNA repair and eight PCa candidate genes in a UK-based cohort of 1281 young-onset PCa cases (diagnosed at ≤60 yr) and 1160 selected controls. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Gene-level SKAT-O and gene-set adaptive combination of p values (ADA) analyses were performed separately for cases versus controls, and aggressive (Gleason score ≥8, n=201) versus nonaggressive (Gleason score ≤7, n=1048) cases. RESULTS AND LIMITATIONS We identified 233 unique protein truncating variants (PTVs) with minor allele frequency <0.5% in controls in 97 genes. The total proportion of PTV carriers was higher in cases than in controls (15% vs 12%, odds ratio [OR]=1.29, 95% confidence interval [CI] 1.01-1.64, p=0.036). Gene-level analyses selected NBN (p_SKAT-O=2.4×10⁻⁴) for overall risk and XPC (p_SKAT-O=1.6×10⁻⁴) for aggressive disease, both at candidate-level significance (p<3.1×10⁻⁴ and p<3.4×10⁻⁴, respectively). Gene-set analysis identified a subset of 20 genes associated with increased PCa risk (OR=3.2, 95% CI 2.1-4.8, p_ADA=4.1×10⁻³) and four genes that increased risk of aggressive disease (OR=11.2, 95% CI 4.6-27.7, p_ADA=5.6×10⁻³), three of which overlap the predisposition gene set. CONCLUSIONS The union of the gene-level and gene-set-level analyses identified 23 unique DNA repair genes associated with PCa predisposition or risk of aggressive disease. These findings will help facilitate the development of a PCa-specific sequencing panel with both predictive and prognostic potential. PATIENT SUMMARY This large sequencing study assessed the rate of inherited DNA repair gene mutations between prostate cancer patients and disease-free men. A panel of 23 genes was identified, which may improve risk prediction or treatment pathways in future clinical practice.
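For readers unfamiliar with how a carrier odds ratio and its confidence interval are typically obtained from a 2×2 table, the Python sketch below uses the standard Wald interval on the log odds ratio. The abstract does not state which method or exact carrier counts the authors used, so both the method choice and the counts in the usage note are assumptions: the counts are hypothetical values back-calculated from the rounded percentages (15% of 1281 cases, 12% of 1160 controls).

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers among cases,    b: non-carriers among cases,
    c: carriers among controls, d: non-carriers among controls.
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts reconstructed from rounded percentages, not from the paper.
estimate, lower, upper = odds_ratio_wald_ci(192, 1089, 139, 1021)
```

With these illustrative counts the estimate and interval land close to the reported values, but an exact match should not be expected, because the true counts and any adjustment used in the study are not given in the abstract.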
Affiliation(s)
- Daniel A Leongamornlert
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Edward J Saunders
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Sarah Wakerell
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Ian Whitmore
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Tokhir Dadaev
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Clara Cieza-Borrella
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Sarah Benafif
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Mark N Brook
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Jenny L Donovan
  - School of Social and Community Medicine, University of Bristol, Bristol, UK
- Freddie C Hamdy
  - Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
  - Faculty of Medical Science, John Radcliffe Hospital, University of Oxford, Oxford, UK
- David E Neal
  - Department of Oncology, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
  - Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK
- Kenneth Muir
  - Division of Population Health, University of Manchester, Manchester, UK
- Koveela Govindasami
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- David V Conti
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, USA
- Zsofia Kote-Jarai
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Rosalind A Eeles
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
  - The Royal Marsden NHS Foundation Trust, London, UK
7
Khlebus E, Kutsenko V, Meshkov A, Ershova A, Kiseleva A, Shevtsov A, Shcherbakova N, Zharikova A, Lankin V, Tikhaze A, Chazova I, Yarovaya E, Drapkina O, Boytsov S. Multiple rare and common variants in APOB gene locus associated with oxidatively modified low-density lipoprotein levels. PLoS One 2019; 14:e0217620. [PMID: 31150472 PMCID: PMC6544350 DOI: 10.1371/journal.pone.0217620] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/10/2018] [Accepted: 05/15/2019] [Indexed: 01/17/2023] Open
Abstract
Oxidatively modified low-density lipoproteins (oxLDL) play an important role in the occurrence and progression of atherosclerosis. To identify the genetic factors influencing oxLDL levels, we genotyped 776 DNA samples of Russian individuals for 196,725 single-nucleotide polymorphisms (SNPs) using the Cardio-MetaboChip (Illumina, USA) and conducted a genome-wide association study (GWAS). Fourteen common variants in the locus that includes the APOB gene were significantly associated with oxLDL levels (P < 2.18 × 10⁻⁷). These variants explained only 6% of the variation in oxLDL levels. We then assessed the contribution of rare coding variants of the APOB gene to oxLDL levels. Individuals with extreme oxLDL levels (48 with the lowest and 48 with the highest values) were selected for targeted sequencing of the region that includes the APOB gene. To evaluate the contribution of the SNPs to oxLDL levels, we used several statistical methods for the association analysis of rare variants: WST, SKAT, and SKAT-O. We found that both synonymous and nonsynonymous SNPs affected oxLDL levels. For the joint analysis of rare and common variants, we conducted SKAT-C testing and found a group of 15 SNPs significantly associated with oxLDL levels (P = 2.14 × 10⁻⁹). Our results indicate that oxLDL levels depend on both common and rare variants of the APOB gene.
Affiliation(s)
- Eleonora Khlebus
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
  - Moscow Institute of Physics and Technology (State University), Moscow, Russia
- Vladimir Kutsenko
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
  - Lomonosov Moscow State University, Moscow, Russia
- Alexey Meshkov
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Alexandra Ershova
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Anna Kiseleva
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Natalia Shcherbakova
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Anastasiia Zharikova
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Vadim Lankin
  - Federal State Budget Organization National Medical Research Center of Cardiology of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Alla Tikhaze
  - Federal State Budget Organization National Medical Research Center of Cardiology of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Irina Chazova
  - Federal State Budget Organization National Medical Research Center of Cardiology of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Oksana Drapkina
  - Federal State Institution National Medical Research Center for Preventive Medicine of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
- Sergey Boytsov
  - Federal State Budget Organization National Medical Research Center of Cardiology of the Ministry of Healthcare of the Russian Federation, Moscow, Russia
8
Marceau West R, Lu W, Rotroff DM, Kuenemann MA, Chang SM, Wu MC, Wagner MJ, Buse JB, Motsinger-Reif AA, Fourches D, Tzeng JY. Identifying individual risk rare variants using protein structure guided local tests (POINT). PLoS Comput Biol 2019; 15:e1006722. [PMID: 30779729 PMCID: PMC6396946 DOI: 10.1371/journal.pcbi.1006722] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/28/2018] [Revised: 03/01/2019] [Accepted: 12/17/2018] [Indexed: 01/08/2023] Open
Abstract
Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.
Affiliation(s)
- Rachel Marceau West
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Daniel M. Rotroff
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Melaine A. Kuenemann
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Sheng-Mao Chang
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Michael C. Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Michael J. Wagner
  - Center for Pharmacogenomics and Individualized Therapy, University of North Carolina, Chapel Hill, North Carolina, United States of America
- John B. Buse
  - Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States of America
- Alison A. Motsinger-Reif
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Denis Fourches
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina, United States of America
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
9
Wang Z, Sha Q, Fang S, Zhang K, Zhang S. Testing an optimally weighted combination of common and/or rare variants with multiple traits. PLoS One 2018; 13:e0201186. [PMID: 30048520 PMCID: PMC6062080 DOI: 10.1371/journal.pone.0201186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/25/2018] [Accepted: 07/10/2018] [Indexed: 12/25/2022] Open
Abstract
Recently, joint analysis of multiple traits has become popular because it can increase statistical power to identify genetic variants associated with complex diseases. In addition, there is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single genetic variant. However, because they analyze one variant at a time, these methods may not be ideal for rare variant association studies, given the allelic heterogeneity and the extreme rarity of rare variants. In this article, we developed a statistical method that tests an optimally weighted combination of variants with multiple traits (TOWmuT), i.e. the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.
Affiliation(s)
- Zhenchuan Wang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shurong Fang
  - Department of Mathematics and Computer Science, John Carroll University, University Heights, Ohio, United States of America
- Kui Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
10
Sun J, Oualkacha K, Forgetta V, Zheng HF, Richards JB, Evans DS, Orwoll E, Greenwood CMT. Exome-wide rare variant analyses of two bone mineral density phenotypes: the challenges of analyzing rare genetic variation. Sci Rep 2018; 8:220. [PMID: 29317680 PMCID: PMC5760616 DOI: 10.1038/s41598-017-18385-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/02/2017] [Accepted: 12/11/2017] [Indexed: 11/08/2022] Open
Abstract
Performance of a recently developed test for association between multivariate phenotypes and sets of genetic variants (MURAT) is demonstrated using measures of bone mineral density (BMD). By combining individual-level whole genome sequenced data from the UK10K study, and imputed genome-wide genetic data on individuals from the Study of Osteoporotic Fractures (SOF) and the Osteoporotic Fractures in Men Study (MrOS), a data set of 8810 individuals was assembled; tests of association were performed between autosomal gene-sets of genetic variants and BMD measured at lumbar spine and femoral neck. Distributions of p-values obtained from analyses of a single BMD phenotype are compared to those from the multivariate tests, across several region definitions and variant weightings. There is evidence of increased power with the multivariate test, although no new loci for BMD were identified. Among 17 genes highlighted either because there were significant p-values in region-based association tests or because they were in well-known BMD genes, 4 windows in 2 genes as well as 6 single SNPs in one of these genes showed association at genome-wide significant thresholds with the multivariate phenotype test but not with the single-phenotype test, Sequence Kernel Association Test (SKAT).
Affiliation(s)
- Jianping Sun
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Karim Oualkacha
- Département de mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
- Vincenzo Forgetta
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Hou-Feng Zheng
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Westlake University, Hangzhou, Zhejiang, China
- Institute of Aging Research and the Affiliated Hospital, School of Medicine, Hangzhou Normal University, Hangzhou, Zhejiang, China
- J Brent Richards
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Daniel S Evans
- California Pacific Medical Center Research Institute, San Francisco, CA, USA
- Eric Orwoll
- Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland, OR, USA
- Celia M T Greenwood
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Department of Oncology, McGill University, Montreal, QC, Canada
11
Adaptive combination of Bayes factors as a powerful method for the joint analysis of rare and common variants. Sci Rep 2017; 7:13858. [PMID: 29066733 PMCID: PMC5654754 DOI: 10.1038/s41598-017-13177-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/03/2017] [Accepted: 09/21/2017] [Indexed: 11/30/2022] Open
Abstract
Multi-marker association tests can be more powerful than single-locus analyses because they aggregate the variant information within a gene/region. However, combining the association signals of multiple markers within a gene/region may cause noise due to the inclusion of neutral variants, which usually compromises the power of a test. To reduce noise, the “adaptive combination of P-values” (ADA) method removes variants with larger P-values. However, when both rare and common variants are considered, it is not optimal to truncate variants according to their P-values. An alternative summary measure, the Bayes factor (BF), is defined as the ratio of the probability of the data under the alternative hypothesis to that under the null hypothesis. The BF quantifies the “relative” evidence supporting the alternative hypothesis. Here, we propose an “adaptive combination of Bayes factors” (ADABF) method that can be directly applied to variants with a wide spectrum of minor allele frequencies. The simulations show that ADABF is more powerful than single-nucleotide polymorphism (SNP)-set kernel association tests and burden tests. We also analyzed 1,109 case-parent trios from the Schizophrenia Trio Genomic Research in Taiwan. Three genes on chromosome 19p13.2 were found to be associated with schizophrenia at the suggestive significance level of 5 × 10⁻⁵.
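In symbols, the Bayes factor described in this abstract can be written as follows. The notation (D for the observed data, H_0 and H_1 for the null and alternative hypotheses) is assumed here for illustration and is not taken from the paper.

```latex
\mathrm{BF} = \frac{P(D \mid H_1)}{P(D \mid H_0)}
```

Under this definition, a value above 1 means the data are more probable under the alternative hypothesis than under the null, and a value below 1 means the reverse.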
12
Greene D, Richardson S, Turro E. A Fast Association Test for Identifying Pathogenic Variants Involved in Rare Diseases. Am J Hum Genet 2017; 101:104-114. [PMID: 28669401 PMCID: PMC5501869 DOI: 10.1016/j.ajhg.2017.05.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 02/22/2017] [Accepted: 05/22/2017] [Indexed: 11/26/2022] Open
Abstract
We present a rapid and powerful inference procedure for identifying loci associated with rare hereditary disorders using Bayesian model comparison. Under a baseline model, disease risk is fixed across all individuals in a study. Under an association model, disease risk depends on a latent bipartition of rare variants into pathogenic and non-pathogenic variants, the number of pathogenic alleles that each individual carries, and the mode of inheritance. A parameter indicating presence of an association and the parameters representing the pathogenicity of each variant and the mode of inheritance can be inferred in a Bayesian framework. Variant-specific prior information derived from allele frequency databases, consequence prediction algorithms, or genomic datasets can be integrated into the inference. Association models can be fitted to different subsets of variants in a locus and compared using a model selection procedure. This procedure can improve inference if only a particular class of variants confers disease risk and can suggest particular disease etiologies related to that class. We show that our method, called BeviMed, is more powerful and informative than existing rare variant association methods in the context of dominant and recessive disorders. The high computational efficiency of our algorithm makes it feasible to test for associations in the large non-coding fraction of the genome. We have applied BeviMed to whole-genome sequencing data from 6,586 individuals with diverse rare diseases. We show that it can identify multiple loci involved in rare diseases, while correctly inferring the modes of inheritance, the likely pathogenic variants, and the variant classes responsible.
Affiliation(s)
- Ernest Turro
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0XY, UK; NHS Blood and Transplant, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK; Medical Research Council Biostatistics Unit, Cambridge Biomedical Campus, Cambridge CB2 0SR, UK
13
Lin WY, Liang YC. Conditioning adaptive combination of P-values method to analyze case-parent trios with or without population controls. Sci Rep 2016; 6:28389. [PMID: 27341039 PMCID: PMC4920030 DOI: 10.1038/srep28389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2016] [Accepted: 06/02/2016] [Indexed: 11/24/2022] Open
Abstract
Detection of rare causal variants can help uncover the etiology of complex diseases. Recruiting case-parent trios is a popular study design in family-based studies. If researchers can obtain data from population controls, utilizing them in trio analyses can improve the power of methods. The transmission disequilibrium test (TDT) is a well-known method to analyze case-parent trio data. It has been extended to rare-variant association testing (abbreviated as "rvTDT"), with the flexibility to incorporate population controls. The rvTDT method is robust to population stratification. However, power loss may occur in the conditioning process. Here we propose a "conditioning adaptive combination of P-values method" (abbreviated as "conADA"), to analyze trios with/without unrelated controls. By first truncating the variants with larger P-values, we decrease the vulnerability of conADA to the inclusion of neutral variants. Moreover, because the test statistic is developed by conditioning on parental genotypes, conADA generates valid statistical inference in the presence of population stratification. With regard to statistical methods for next-generation sequencing data analyses, validity may be hampered by population stratification, whereas power may be affected by the inclusion of neutral variants. We recommend conADA for its robustness to these two factors (population stratification and the inclusion of neutral variants).
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yun-Chieh Liang
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan