1
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024]
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
2
Zhu L, Zhang S, Sha Q. Meta-analysis of set-based multiple phenotype association test based on GWAS summary statistics from different cohorts. Front Genet 2024; 15:1359591. [PMID: 39301532 PMCID: PMC11410627 DOI: 10.3389/fgene.2024.1359591] [Received: 12/21/2023] [Accepted: 08/23/2024] [Indexed: 09/22/2024]
Abstract
Genome-wide association studies (GWAS) have emerged as popular tools for identifying genetic variants that are associated with complex diseases. Standard analysis of a GWAS involves assessing the association between each variant and a disease. However, this approach suffers from limited reproducibility and difficulties in detecting multi-variant and pleiotropic effects. Although joint analysis of multiple phenotypes for GWAS can identify and interpret pleiotropic loci which are essential to understand pleiotropy in diseases and complex traits, most of the multiple phenotype association tests are designed for a single variant, resulting in much lower power, especially when their effect sizes are small and only their cumulative effect is associated with multiple phenotypes. To overcome these limitations, set-based multiple phenotype association tests have been developed to enhance statistical power and facilitate the identification and interpretation of pleiotropic regions. In this research, we propose a new method, named Meta-TOW-S, which conducts joint association tests between multiple phenotypes and a set of variants (such as variants in a gene) utilizing GWAS summary statistics from different cohorts. Our approach applies the set-based method that Tests for the effect of an Optimal Weighted combination of variants in a gene (TOW) and accounts for sample size differences across GWAS cohorts by employing the Cauchy combination method. Meta-TOW-S combines the advantages of set-based tests and multi-phenotype association tests, exhibiting computational efficiency and enabling analysis across multiple phenotypes while accommodating overlapping samples from different GWAS cohorts. To assess the performance of Meta-TOW-S, we develop a phenotype simulator package that encompasses a comprehensive simulation scheme capable of modeling multiple phenotypes and multiple variants, including noise structures and diverse correlation patterns among phenotypes. Simulation studies validate that Meta-TOW-S maintains a desirable Type I error rate. Further simulation under different scenarios shows that Meta-TOW-S can improve power compared with other existing meta-analysis methods. When applied to four psychiatric disorders summary data, Meta-TOW-S detects a greater number of significant genes.
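The Cauchy combination step named in this abstract can be sketched as follows. This is a minimal illustration of the standard Cauchy combination formula, not the Meta-TOW-S code. It assumes each cohort has already produced a p-value for the same gene and phenotype set. The function name `cauchy_combination`, the `weights` argument (equal weights by default) and the example p-values are all hypothetical. The abstract does not say how cohort sample sizes enter the weights, so the sketch leaves that choice to the caller.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(pvalues: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Combine p-values with the Cauchy combination formula (illustrative sketch).

    Each p-value p_i is mapped to a standard Cauchy variate tan((0.5 - p_i) * pi).
    The weighted average of these variates is approximately standard Cauchy
    under the null, so the combined p-value is 0.5 - arctan(T / sum(w)) / pi.
    """
    if weights is None:
        # Hypothetical default: equal weight for every cohort.
        weights = [1.0] * len(pvalues)
    if not pvalues or len(weights) != len(pvalues):
        raise ValueError("need at least one p-value and one weight per p-value")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    total_weight = sum(weights)
    # Weighted sum of Cauchy-transformed p-values.
    t_stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Upper-tail probability of a standard Cauchy at the weighted average.
    return 0.5 - math.atan(t_stat / total_weight) / math.pi
```

With a single input the formula returns that p-value unchanged, so `cauchy_combination([0.03])` gives 0.03 up to rounding. Combining `[1e-8, 0.5, 0.9]` gives roughly 3e-8, because one very small p-value dominates the heavy-tailed sum. The Cauchy transform is used because the combined statistic stays approximately calibrated at small significance levels even when the input p-values are correlated. That property is relevant when cohorts share samples.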
Affiliation(s)
- Lirong Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
3
Ray D, Loomis SJ, Venkataraghavan S, Zhang J, Tin A, Yu B, Chatterjee N, Selvin E, Duggal P. Characterizing Common and Rare Variations in Nontraditional Glycemic Biomarkers Using Multivariate Approaches on Multiancestry ARIC Study. Diabetes 2024; 73:1537-1550. [PMID: 38869630 PMCID: PMC11333373 DOI: 10.2337/db23-0318] [Received: 06/14/2023] [Accepted: 06/05/2024] [Indexed: 06/14/2024]
Abstract
Genetic studies of nontraditional glycemic biomarkers, glycated albumin and fructosamine, can shed light on unknown aspects of type 2 diabetes genetics and biology. We performed a multiphenotype genome-wide association study of glycated albumin and fructosamine from 7,395 White and 2,016 Black participants in the Atherosclerosis Risk in Communities (ARIC) study on common variants from genotyped/imputed data. We discovered two genome-wide significant loci, one mapping to a known type 2 diabetes gene (ARAP1/STARD10) and another mapping to a novel region (UGT1A complex of genes), using multiomics gene-mapping strategies in diabetes-relevant tissues. We identified additional loci that were ancestry- and sex-specific (e.g., PRKCA in African ancestry, FCGRT in European ancestry, TEX29 in males). Further, we implemented multiphenotype gene-burden tests on whole-exome sequence data from 6,590 White and 2,309 Black ARIC participants. Ten variant sets annotated to genes across different variant aggregation strategies were exome-wide significant only in multiancestry analysis, of which CD1D, EGFL7/AGPAT2, and MIR126 had notable enrichment of rare predicted loss of function variants in African ancestry despite smaller sample sizes. Overall, 8 of 14 discovered loci and genes were implicated to influence these biomarkers via glycemic pathways, and most of them were not previously implicated in studies of type 2 diabetes. This study illustrates improved locus discovery and potential effector gene discovery by leveraging joint patterns of related biomarkers across the entire allele frequency spectrum in multiancestry analysis. Future investigation of the loci and genes potentially acting through glycemic pathways may help us better understand the risk of developing type 2 diabetes.
Affiliation(s)
- Debashree Ray
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Sowmya Venkataraghavan
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Jiachen Zhang
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Adrienne Tin
- School of Medicine, University of Mississippi Medical Center, Jackson, MS
- Bing Yu
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD
- Elizabeth Selvin
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Welch Center for Prevention, Epidemiology, & Clinical Research, Johns Hopkins University, Baltimore, MD
- Priya Duggal
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
4
Mbatchou J, McPeek MS. JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. Am J Hum Genet 2024; 111:1750-1769. [PMID: 39025064 PMCID: PMC11339629 DOI: 10.1016/j.ajhg.2024.06.010] [Received: 12/15/2023] [Revised: 06/19/2024] [Accepted: 06/20/2024] [Indexed: 07/20/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
- Regeneron Genetics Center, Tarrytown, NY 10591, USA
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
5
Choi J, Xu Z, Sun R. Variance-components tests for genetic association with multiple interval-censored outcomes. Stat Med 2024; 43:2560-2574. [PMID: 38636557 PMCID: PMC11116038 DOI: 10.1002/sim.10081] [Received: 10/19/2022] [Revised: 02/18/2024] [Accepted: 04/02/2024] [Indexed: 04/20/2024]
Abstract
Massive genetic compendiums such as the UK Biobank have become an invaluable resource for identifying genetic variants that are associated with complex diseases. Due to the difficulties of massive data collection, a common practice of these compendiums is to collect interval-censored data. One challenge in analyzing such data is the lack of methodology available for genetic association studies with interval-censored data. Genetic effects are difficult to detect because of their rare and weak nature, and often the time-to-event outcomes are transformed to binary phenotypes for access to more powerful signal detection approaches. However, transforming the data to binary outcomes can result in loss of valuable information. To alleviate such challenges, this work develops methodology to associate genetic variant sets with multiple interval-censored outcomes. Testing sets of variants such as genes or pathways is a common approach in genetic association settings to lower the multiple testing burden, aggregate small effects, and improve interpretations of results. Instead of performing inference with only a single outcome, utilizing multiple outcomes can increase statistical power by aggregating information across multiple correlated phenotypes. Simulations show that the proposed strategy can offer significant power gains over a single outcome approach. We apply the proposed test to the investigation that motivated this study, a search for the genes that perturb risks of bone fractures and falls in the UK Biobank.
Affiliation(s)
- Jaihee Choi
- Department of Statistics, Rice University, Texas, USA
- Zhichao Xu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Texas, USA
- Ryan Sun
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Texas, USA
6
Bass AJ, Bian S, Wingo AP, Wingo TS, Cutler DJ, Epstein MP. Identifying latent genetic interactions in genome-wide association studies using multiple traits. Genome Med 2024; 16:62. [PMID: 38664839 PMCID: PMC11044415 DOI: 10.1186/s13073-024-01329-0] [Received: 11/30/2023] [Accepted: 04/02/2024] [Indexed: 04/28/2024]
Abstract
The "missing" heritability of complex traits may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. We propose a new kernel-based method called Latent Interaction Testing (LIT) to screen for genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Using simulated data, we demonstrate that LIT increases power to detect latent genetic interactions compared to univariate methods. We then apply LIT to obesity-related traits in the UK Biobank and detect variants with interactive effects near known obesity-related genes (URL: https://CRAN.R-project.org/package=lit).
Affiliation(s)
- Andrew J Bass
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA.
- Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Aliza P Wingo
- Department of Psychiatry, Emory University, Atlanta, GA, 30322, USA
- Thomas S Wingo
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA
- Department of Neurology, Emory University, Atlanta, GA, 30322, USA
- David J Cutler
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA
- Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, GA, 30322, USA.
7
Tissink EP, Shadrin AA, van der Meer D, Parker N, Hindley G, Roelfs D, Frei O, Fan CC, Nagel M, Nærland T, Budisteanu M, Djurovic S, Westlye LT, van den Heuvel MP, Posthuma D, Kaufmann T, Dale AM, Andreassen OA. Abundant pleiotropy across neuroimaging modalities identified through a multivariate genome-wide association study. Nat Commun 2024; 15:2655. [PMID: 38531894 DOI: 10.1038/s41467-024-46817-4] [Received: 02/06/2023] [Accepted: 03/12/2024] [Indexed: 03/28/2024]
Abstract
Genetic pleiotropy is abundant across spatially distributed brain characteristics derived from one neuroimaging modality (e.g. structural, functional or diffusion magnetic resonance imaging [MRI]). A better understanding of pleiotropy across modalities could inform us on the integration of brain function, micro- and macrostructure. Here we show extensive genetic overlap across neuroimaging modalities at a locus and gene level in the UK Biobank (N = 34,029) and ABCD Study (N = 8607). When jointly analysing phenotypes derived from structural, functional and diffusion MRI in a genome-wide association study (GWAS) with the Multivariate Omnibus Statistical Test (MOSTest), we boost the discovery of loci and genes beyond previously identified effects for each modality individually. Cross-modality genes are involved in fundamental biological processes and predominantly expressed during prenatal brain development. We additionally boost prediction of psychiatric disorders by conditioning independent GWAS on our multimodal multivariate GWAS. These findings shed light on the shared genetic mechanisms underlying variation in brain morphology, functional connectivity, and tissue composition.
Affiliation(s)
- E P Tissink
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands.
- Department of Sleep and Cognition, Netherlands Institute for Neuroscience, an institute of the Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands.
- A A Shadrin
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- D van der Meer
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- N Parker
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- G Hindley
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- Psychosis Studies, Institute of Psychiatry, Psychology and Neurosciences, King's College London, 16 De Crespigny Park, London, SE5 8AB, United Kingdom
- D Roelfs
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- O Frei
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- C C Fan
- Laureate Institute for Brain Research, Tulsa, OK, USA
- Department of Radiology, University of California San Diego, La Jolla, CA, 92037, USA
- M Nagel
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- T Nærland
- K.G. Jebsen Centre for Neurodevelopmental disorders, Division of Paediatric Medicine, Institute of Clinical Medicine, University of Oslo, Building 31, Oslo, Norway
- M Budisteanu
- Prof. Dr. Alex Obregia Clinical Hospital of Psychiatry, Bucharest, Romania
- "Victor Babes" National Institute of Pathology, Bucharest, Romania
- S Djurovic
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- K.G. Jebsen Centre for Neurodevelopmental disorders, Division of Paediatric Medicine, Institute of Clinical Medicine, University of Oslo, Building 31, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- L T Westlye
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- K.G. Jebsen Centre for Neurodevelopmental disorders, Division of Paediatric Medicine, Institute of Clinical Medicine, University of Oslo, Building 31, Oslo, Norway
- Department of Psychology, University of Oslo, Oslo, Norway
- M P van den Heuvel
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychology and Psychiatry, section Complex Trait Genetics, Amsterdam Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
- D Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychology and Psychiatry, section Complex Trait Genetics, Amsterdam Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
- T Kaufmann
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway
- Department of Psychiatry and Psychotherapy, Tübingen Center for Mental Health, University of Tübingen, Tübingen, Germany
- A M Dale
- Department of Radiology, University of California San Diego, La Jolla, CA, 92037, USA
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA, 92037, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, 92037, USA
- O A Andreassen
- NORMENT Centre, Division of Mental Health and Addiction, Oslo University Hospital and Institute of Clinical Medicine, University of Oslo, Building 48, Oslo, Norway.
- K.G. Jebsen Centre for Neurodevelopmental disorders, Division of Paediatric Medicine, Institute of Clinical Medicine, University of Oslo, Building 31, Oslo, Norway.
8
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. bioRxiv 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
- Regeneron Genetics Center, Tarrytown, NY 10591, USA
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
9
Li X, Chen H, Selvaraj MS, Van Buren E, Zhou H, Wang Y, Sun R, McCaw ZR, Yu Z, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Carson AP, Carlson JC, Chami N, Chen YDI, Curran JE, de Vries PS, Fornage M, Franceschini N, Freedman BI, Gu C, Heard-Costa NL, He J, Hou L, Hung YJ, Irvin MR, Kaplan RC, Kardia SL, Kelly T, Konigsberg I, Kooperberg C, Kral BG, Li C, Loos RJ, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Rich SS, Sitlani CM, Smith JA, Taylor KD, Tiwari H, Vasan RS, Wang Z, Yanek LR, Yu B, Rice KM, Rotter JI, Peloso GM, Natarajan P, Li Z, Liu Z, Lin X. A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. bioRxiv 2023:2023.10.30.564764. [PMID: 37961350 PMCID: PMC10634938 DOI: 10.1101/2023.10.30.564764] [Indexed: 11/15/2023]
Abstract
Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.
Affiliation(s)
- Xihao Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Margaret Sunitha Selvaraj
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Eric Van Buren
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yuxuan Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zachary R. McCaw
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Zhi Yu
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Donna K. Arnett
- Provost Office, University of South Carolina, Columbia, SC, USA
- Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W. Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E. Cade
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- April P. Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Jenna C. Carlson
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, the University of Texas Health Science Center at Houston, Houston, TX, USA
- Nora Franceschini
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Barry I. Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Charles Gu
- Division of Biology & Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Nancy L. Heard-Costa
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
- Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Yi-Jen Hung
- Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Marguerite R. Irvin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Sharon L.R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Tanika Kelly
- Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
- Iain Konigsberg
- Department of Biomedical Informatics, University of Colorado, Aurora, CO, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Brian G. Kral
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
- Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Michael C. Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Lisa W. Martin
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
- Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ryan L. Minster
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Braxton D. Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- May E. Montasser
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Nicholette D. Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Alexander P. Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Colleen M. Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Hemant Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Ramachandran S. Vasan
- Framingham Heart Study, Framingham, MA, USA
- Department of Quantitative and Qualitative Health Sciences, UT Health San Antonio School of Public Health, San Antonio, TX, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lisa R. Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kenneth M. Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Gina M. Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Pradeep Natarajan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zhonghua Liu
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Xihong Lin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
10
He X, Li SM. Gene-environment interaction in myopia. Ophthalmic Physiol Opt 2023; 43:1438-1448. [PMID: 37486033 DOI: 10.1111/opo.13206] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 07/25/2023]
Abstract
Myopia is a health issue that has attracted global attention due to its high prevalence and vision-threatening complications. It is well known that the onset and progression of myopia are related to both genetic and environmental factors: more than 450 common genetic loci have been found to be associated with myopia, while near work and outdoor time are the main environmental risk factors. As for many complex traits, gene-environment interactions are implicated in myopia development. To date, several genetic loci have been found to interact with near work or educational level. Gene-environment interaction research on myopia could yield models that provide more accurate risk predictions, thus improving targeted treatments and preventive strategies. Additionally, such investigations might have the potential to reveal novel genetic information. In this review, we summarised the findings in this field and proposed some topics for future investigations.
Affiliation(s)
- Xi He
- Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Beijing Ophthalmology & Visual Sciences Key Laboratory, Beijing, China
- Shi-Ming Li
- Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Beijing Ophthalmology & Visual Sciences Key Laboratory, Beijing, China
11
St-Pierre J, Oualkacha K. A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes. Int J Biostat 2023; 19:369-387. [PMID: 36279152 PMCID: PMC10644254 DOI: 10.1515/ijb-2022-0010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 05/26/2022] [Accepted: 08/23/2022] [Indexed: 11/15/2022]
Abstract
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or with a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease traits or other non-normally distributed traits. There is therefore a need for unified and flexible methods to study the association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models for handling non-normal errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering associations between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations than other existing methods for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect associations between two genes located on chromosome 11 and several lipid levels measured in 1477 subjects from the ALSPAC study.
Affiliation(s)
- Julien St-Pierre
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Karim Oualkacha
- Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
12
Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Rare variant association studies with multiple traits or diseases have drawn considerable attention, since association signals of rare variants can be boosted if more than one phenotype is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which pre-specified genetic regions are tested one at a time. However, these methods are not designed to locate susceptible rare variants within a genetic region. In this article, we propose new statistical methods to prioritize rare variants within a genetic region when a group test for that region identifies a statistical association with multiple phenotypes. The method computes the weighted selection probability (WSP) of individual rare variants and ranks them from largest to smallest WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, identifying some potentially susceptible rare variants on chromosome 19.
Affiliation(s)
- Xianglong Liang
- Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
13
Bass AJ, Bian S, Wingo AP, Wingo TS, Cutler DJ, Epstein MP. Identifying latent genetic interactions in genome-wide association studies using multiple traits. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.11.557155. [PMID: 37745553 PMCID: PMC10515795 DOI: 10.1101/2023.09.11.557155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Genome-wide association studies of complex traits frequently find that SNP-based estimates of heritability are considerably smaller than estimates from classic family-based studies. This 'missing' heritability may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. To circumvent these challenges, we propose a new method to detect genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Our approach, Latent Interaction Testing (LIT), uses the observation that correlated traits with shared latent genetic interactions have trait variance and covariance patterns that differ by genotype. LIT examines the relationship between trait variance/covariance patterns and genotype using a flexible kernel-based framework that is computationally scalable for biobank-sized datasets with a large number of traits. We first use simulated data to demonstrate that LIT substantially increases power to detect latent genetic interactions compared to a trait-by-trait univariate method. We then apply LIT to four obesity-related traits in the UK Biobank and detect genetic variants with interactive effects near known obesity-related genes. Overall, we show that LIT, implemented in the R package lit, uses shared information across traits to improve detection of latent genetic interactions compared to standard approaches.
Affiliation(s)
- Andrew J. Bass
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Aliza P. Wingo
- Department of Psychiatry, Emory University, Atlanta, GA 30322, USA
- Thomas S. Wingo
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
- David J. Cutler
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
14
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e., statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, no single test fits all needs, and each suffers from specific drawbacks. Selecting the right aggregation test when the underlying genetic model of the disease is unknown remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and show that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole-exome/whole-genome sequencing association studies: it can handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
15
Ray D, Loomis SJ, Venkataraghavan S, Tin A, Yu B, Chatterjee N, Selvin E, Duggal P. Characterizing common and rare variations in non-traditional glycemic biomarkers using multivariate approaches on multi-ancestry ARIC study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.06.13.23289200. [PMID: 37398180 PMCID: PMC10312851 DOI: 10.1101/2023.06.13.23289200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Glycated hemoglobin, fasting glucose, glycated albumin, and fructosamine are biomarkers that reflect different aspects of the glycemic process. Genetic studies of these glycemic biomarkers can shed light on unknown aspects of type 2 diabetes genetics and biology. While several GWAS of glycated hemoglobin and fasting glucose exist, very few GWAS have focused on glycated albumin or fructosamine. We performed a multi-phenotype GWAS of glycated albumin and fructosamine in 7,395 White and 2,016 Black participants in the Atherosclerosis Risk in Communities (ARIC) study, using common variants from genotyped/imputed data. We found 2 genome-wide significant loci, one mapping to a known type 2 diabetes gene (ARAP1/STARD10, p = 2.8 × 10⁻⁸) and another mapping to a novel gene (UGT1A, p = 1.4 × 10⁻⁸) using multi-omics gene mapping strategies in diabetes-relevant tissues. We identified additional loci that were ancestry-specific (e.g., PRKCA in African ancestry individuals, p = 1.7 × 10⁻⁸) and sex-specific (TEX29 locus in males only, p = 3.0 × 10⁻⁸). Further, we implemented multi-phenotype gene-burden tests on whole-exome sequence data from 6,590 White and 2,309 Black ARIC participants. Eleven genes across different rare variant aggregation strategies were exome-wide significant only in multi-ancestry analysis. Four of the 11 genes had notable enrichment of rare predicted loss-of-function variants in African ancestry participants despite the smaller sample size. Overall, 8 of the 15 loci/genes were implicated as influencing these biomarkers via glycemic pathways. This study illustrates improved locus discovery and potential effector gene discovery by leveraging joint patterns of related biomarkers across the entire allele frequency spectrum in multi-ancestry analyses. Most of the loci/genes we identified have not been previously implicated in studies of type 2 diabetes, and future investigation of the loci/genes potentially acting through glycemic pathways may help us better understand the risk of developing type 2 diabetes.
Affiliation(s)
- Debashree Ray
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Sowmya Venkataraghavan
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Adrienne Tin
- School of Medicine, University of Mississippi Medical Center, Jackson, MS
- Bing Yu
- Department of Epidemiology, UTHealth School of Public Health, Houston, TX
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD
- Elizabeth Selvin
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Welch Center for Prevention, Epidemiology, & Clinical Research, Johns Hopkins University, Baltimore, MD
- Priya Duggal
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
16
Liu H, Ling W, Hua X, Moon JY, Williams-Nguyen JS, Zhan X, Plantinga AM, Zhao N, Zhang A, Knight R, Qi Q, Burk RD, Kaplan RC, Wu MC. Kernel-based genetic association analysis for microbiome phenotypes identifies host genetic drivers of beta-diversity. MICROBIOME 2023; 11:80. [PMID: 37081571 PMCID: PMC10116795 DOI: 10.1186/s40168-023-01530-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/09/2022] [Accepted: 03/21/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND: Understanding human genetic influences on the gut microbiota helps elucidate the mechanisms by which genetics may influence health outcomes. Typical microbiome genome-wide association studies (GWAS) marginally assess the association between individual genetic variants and individual microbial taxa. We propose a novel approach, the covariate-adjusted kernel RV (KRV) framework, to map genetic variants associated with microbiome beta-diversity, which focuses on overall shifts in the microbiota. The KRV framework evaluates the association between genetics and microbes by comparing similarity in genetic profiles, based on groups of variants at the gene level, with similarity in microbiome profiles, based on the overall microbiome composition, across all pairs of individuals. By reducing the multiple-testing burden and capturing intrinsic structure within the genetic and microbiome data, the KRV framework has the potential to improve statistical power in microbiome GWAS. RESULTS: We apply the covariate-adjusted KRV to the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) in a two-stage (first gene-level, then variant-level) genome-wide association analysis for gut microbiome beta-diversity. We identified an immunity-related gene, IL23R, reported in a previous microbiome genetic association study, and discovered 3 other novel genes, 2 of which are involved in immune functions or autoimmune disorders. In addition, simulation studies show that the covariate-adjusted KRV has greater power than other microbiome GWAS methods that rely on univariate microbiome phenotypes across a range of scenarios. CONCLUSIONS: Our findings highlight the value of the covariate-adjusted KRV as a powerful microbiome GWAS approach and support an important role of immunity-related genes in shaping gut microbiome composition.
Affiliation(s)
- Hongjiao Liu
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Wodan Ling
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, 10065, USA
- Xing Hua
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jee-Young Moon
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Jessica S Williams-Nguyen
- Institute for Research and Education to Advance Community Health, Washington State University, Seattle, WA, 98101, USA
- Xiang Zhan
- Department of Biostatistics and Beijing International Center for Mathematical Research, Peking University, Beijing, 100191, China
- Anna M Plantinga
- Department of Mathematics and Statistics, Williams College, Williamstown, MA, 01267, USA
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, 21205, USA
- Angela Zhang
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Rob Knight
- Departments of Pediatrics, Computer Science & Engineering, and Bioengineering; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA
- Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Robert D Burk
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Departments of Pediatrics; Microbiology & Immunology; and, Obstetrics, Gynecology & Women's Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Robert C Kaplan
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Michael C Wu
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
17
Zigarelli AM, Venera HM, Receveur BA, Wolf JM, Westra J, Tintle NL. Multimarker omnibus tests by leveraging individual marker summary statistics from large biobanks. Ann Hum Genet 2023; 87:125-136. [PMID: 36683423 DOI: 10.1111/ahg.12495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 12/24/2022] [Accepted: 01/04/2023] [Indexed: 01/24/2023]
Abstract
As biobanks become increasingly popular, access to genotypic and phenotypic data in the form of precomputed summary statistics (PCSS) continues to increase. Widespread accessibility of PCSS alleviates many issues related to biobank data, including data privacy and confidentiality as well as high computational costs. However, questions remain about how to maximally leverage PCSS for downstream statistical analyses. Here we present a novel method for testing the association of an arbitrary number of single nucleotide variants (SNVs) with a linear combination of phenotypes, after adjusting for covariates, for common multimarker tests (e.g., SKAT, SKAT-O) without access to individual patient-level data (IPD). We validate exact formulas for each method and demonstrate their accuracy through simulation studies and an application to fatty acid phenotypic data from the Framingham Heart Study.
Affiliation(s)
- Angela M Zigarelli
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Massachusetts, USA
- Hanna M Venera
- Division of Biostatistics, University of Michigan, Michigan, USA
- Brody A Receveur
- Department of Statistics, George Mason University, Virginia, USA
- Jack M Wolf
- Division of Biostatistics, University of Minnesota, Minnesota, USA
- Jason Westra
- Department of Math, Computer Science, and Statistics, Dordt University, Iowa, USA
- Nathan L Tintle
- Department of Population Health Nursing Sciences, University of Illinois Chicago, Chicago, Illinois, USA
18
Woodward AA, Urbanowicz RJ, Naj AC, Moore JH. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 2022; 46:555-571. [PMID: 35924480 PMCID: PMC9669229 DOI: 10.1002/gepi.22497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/06/2022] [Accepted: 07/19/2022] [Indexed: 01/07/2023]
Abstract
Genetic heterogeneity describes the occurrence of the same or similar phenotypes through different genetic mechanisms in different individuals. Robustly characterizing and accounting for genetic heterogeneity is crucial for pursuing the goals of precision medicine, discovering novel disease biomarkers, and identifying targets for treatments. Failure to account for genetic heterogeneity may lead to missed associations and incorrect inferences. Thus, it is critical to review the impact of genetic heterogeneity on the design and analysis of population-level genetic studies, aspects that are often overlooked in the literature. In this review, we first contextualize our approach to genetic heterogeneity by proposing a high-level categorization of heterogeneity into "feature," "outcome," and "associative" heterogeneity, drawing on perspectives from epidemiology and machine learning to illustrate distinctions between them. We highlight the unique nature of genetic heterogeneity as a heterogeneous pattern of association that warrants specific methodological considerations. We then focus on the challenges that preclude effective detection and characterization of genetic heterogeneity across a variety of epidemiological contexts. Finally, we discuss systems heterogeneity as an integrated approach to using genetic and other high-dimensional multi-omic data in complex disease research.
Affiliation(s)
- Alexa A. Woodward
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ryan J. Urbanowicz
- Department of Computational Biomedicine, Cedars‐Sinai Medical Center, Los Angeles, California, USA
- Adam C. Naj
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H. Moore
- Department of Computational Biomedicine, Cedars‐Sinai Medical Center, Los Angeles, California, USA
19
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022]
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, current state-of-the-art methods that use variant annotations or external controls to improve statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodological investigation.
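A basic concept that reviews of this kind cover is the burden (collapsing) test, which aggregates the rare variants in a region into one score per individual before testing. The Python sketch below computes such a weighted burden score. The MAF cutoff of 0.01 and the Beta(1, 25) density weights (a common default in SKAT-family software) are assumptions chosen for illustration; the function names are hypothetical and nothing here is taken from the review itself.

```python
from typing import List, Sequence


def beta_1_25_weight(maf: float) -> float:
    """Beta(1, 25) density evaluated at the MAF: 25 * (1 - maf) ** 24.

    Rarer variants receive larger weights (assumed weighting scheme).
    """
    return 25.0 * (1.0 - maf) ** 24


def burden_scores(
    genotypes: Sequence[Sequence[int]],
    mafs: Sequence[float],
    maf_threshold: float = 0.01,
) -> List[float]:
    """Collapse rare variants into one weighted score per individual.

    genotypes: one row per individual, holding minor-allele counts (0, 1, 2)
               for each variant in the region.
    mafs:      minor-allele frequency of each variant.
    Variants with MAF at or above `maf_threshold` (an assumed cutoff)
    get weight 0 and therefore do not contribute.
    """
    weights = [beta_1_25_weight(m) if m < maf_threshold else 0.0 for m in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

In a burden test, this score would then enter a regression of the phenotype on the score plus covariates, so a single coefficient summarizes the whole region.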
Affiliation(s)
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
- Brandon J. Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Nicholas B. Larson
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
20
Bielak LF, Peyser PA, Smith JA, Zhao W, Ruiz‐Narvaez EA, Kardia SLR, Harlow SD. Multivariate, region-based genetic analyses of facets of reproductive aging in White and Black women. Mol Genet Genomic Med 2022; 10:e1896. [PMID: 35179313 PMCID: PMC9000932 DOI: 10.1002/mgg3.1896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2021] [Revised: 01/14/2022] [Accepted: 01/31/2022] [Indexed: 01/28/2023]
Abstract
BACKGROUND Age at final menstrual period (FMP) and the accompanying hormone trajectories across the menopause transition do not occur in isolation but likely share molecular pathways. Understanding the genetics underlying the endocrinology of the menopause transition may be enhanced by jointly analyzing multiple interrelated traits. METHODS In a sample of 347 White and 164 Black women from the Study of Women's Health Across the Nation (SWAN), we investigated pleiotropic effects of 54 candidate genetic regions of interest (ROI) on 5 menopausal traits (age at FMP and premenopausal and postmenopausal levels of follicle-stimulating hormone and estradiol) using multivariate kernel regression (Multi-SKAT). A backward elimination procedure was used to identify which subset of traits was most strongly associated with a specific ROI. RESULTS In White women, the 20 kb ROI around rs10734411 was significantly associated with the multivariate distribution of age at FMP, premenopausal estradiol, and postmenopausal estradiol (omnibus p-value = .00004). This association did not replicate in the smaller sample of Black women. CONCLUSION This study using a region-based, multiple-trait approach suggests a shared genetic basis among multiple facets of reproductive aging.
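The backward elimination step described above can be illustrated generically. In the Python sketch below, the greedy drop-one rule, the stopping criterion (stop when no single removal lowers the p-value) and the `pvalue_fn` callback standing in for a Multi-SKAT region test are all assumptions; this is not the study's exact procedure, and `backward_eliminate` is a hypothetical name.

```python
from typing import Callable, FrozenSet, Sequence, Tuple


def backward_eliminate(
    traits: Sequence[str],
    pvalue_fn: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination over a set of traits.

    pvalue_fn is a hypothetical stand-in for a multivariate region test
    (for example a Multi-SKAT omnibus p-value) evaluated on a trait subset.
    At each step, drop the single trait whose removal gives the smallest
    p-value; stop when no removal improves on the current subset.
    """
    current = frozenset(traits)
    best_p = pvalue_fn(current)
    while len(current) > 1:
        # Evaluate every subset obtained by removing exactly one trait;
        # ties in p-value are broken by trait name via tuple ordering.
        candidates = [(pvalue_fn(current - {t}), t) for t in sorted(current)]
        p_new, dropped = min(candidates)
        if p_new >= best_p:
            break  # no single removal strengthens the association
        current, best_p = current - {dropped}, p_new
    return current, best_p
```

Because each step re-runs the region test on several subsets, the p-value returned by this sketch is not adjusted for the search over subsets.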
Affiliation(s)
- Lawrence F. Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA; Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA
- Wei Zhao
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Edward A. Ruiz‐Narvaez
- Department of Nutritional Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Sharon L. R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Sioban D. Harlow
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
21
Kernel-based gene-environment interaction tests for rare variants with multiple quantitative phenotypes. PLoS One 2022; 17:e0275929. [PMID: 36223383 PMCID: PMC9555665 DOI: 10.1371/journal.pone.0275929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 09/26/2022] [Indexed: 11/19/2022]
Abstract
Previous studies have suggested that gene-environment interactions (GEIs) between a common variant and an environmental factor can influence multiple correlated phenotypes simultaneously (GEI pleiotropy), and that analyzing multiple phenotypes jointly is more powerful than analyzing them separately with single-phenotype GEI tests. Methods to test GEIs for rare variants with multiple phenotypes are, however, lacking. In our work, we model the correlation among the GEI effects of a variant on multiple quantitative phenotypes through four kernels and propose four multiphenotype GEI tests for rare variants: a test with a homogeneous kernel (Hom-GEI), a test with a heterogeneous kernel (Het-GEI), a test with a projection phenotype kernel (PPK-GEI) and a test with a linear phenotype kernel (LPK-GEI). Through numerical simulations, we show that correlation among phenotypes can enhance statistical power for all tests except LPK-GEI, which simply combines statistics from single-phenotype GEI tests and ignores the phenotypic correlations. In almost all scenarios considered, Het-GEI and PPK-GEI are more powerful than Hom-GEI and LPK-GEI. We apply Het-GEI and PPK-GEI in the genome-wide GEI analysis of systolic blood pressure (SBP) and diastolic blood pressure (DBP) in the UK Biobank. We analyze 18,101 genes and find that LEUTX is associated with SBP and DBP (p = 2.20 × 10⁻⁶) through its interaction with hemoglobin. The single-phenotype GEI test and our multiphenotype GEI tests Het-GEI and PPK-GEI are also used to evaluate the gene-hemoglobin interactions for 22 genes that were previously reported to be associated with SBP or DBP in a meta-analysis of genetic main effects. MYO1C shows nominal significance (p < 0.05) with the Het-GEI test. With the single-phenotype GEI test, NOS3 shows nominal significance for DBP, and MYO1C for both SBP and DBP.
22
Kim J, Shen J, Wang A, Mehrotra DV, Ko S, Zhou JJ, Zhou H. VCSEL: Prioritizing SNP-set by penalized variance component selection. Ann Appl Stat 2021; 15:1652-1672. [DOI: 10.1214/21-aoas1491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Juhyun Kim
- Department of Biostatistics, University of California, Los Angeles
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc
- Anran Wang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc
- Seyoon Ko
- Department of Biostatistics, University of California, Los Angeles
- Jin J. Zhou
- Department of Medicine, University of California, Los Angeles
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles
23
Shi J, Boehnke M, Lee S. Trans-ethnic meta-analysis of rare variants in sequencing association studies. Biostatistics 2021; 22:706-722. [PMID: 31883325 DOI: 10.1093/biostatistics/kxz061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2018] [Revised: 11/06/2019] [Accepted: 12/02/2019] [Indexed: 11/15/2022]
Abstract
Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among populations, existing gene- or region-based rare-variant meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among populations. In response, we propose a score test under a modified random-effects model for gene- or region-based rare-variant associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into the modeling of the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate the asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare-variant associations with several traits.
Affiliation(s)
- Jingchunzi Shi
- Thomas Francis, Jr. School of Public Health II, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Michael Boehnke
- Thomas Francis, Jr. School of Public Health II, 1420 Washington Heights, Ann Arbor, MI 48109, USA
- Seunggeun Lee
- Thomas Francis, Jr. School of Public Health II, 1420 Washington Heights, Ann Arbor, MI 48109, USA
24
Wolf JM, Westra J, Tintle N. Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates. Front Genet 2021; 12:745901. [PMID: 34712269 PMCID: PMC8546319 DOI: 10.3389/fgene.2021.745901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Accepted: 09/23/2021] [Indexed: 12/03/2022]
Abstract
While the promise of electronic medical record and biobank data is large, major questions remain about patient privacy, computational hurdles, and data access. One promising area of recent development is pre-computing non-individually identifiable summary statistics to be made publicly available for exploration and downstream analysis. In this manuscript we demonstrate how to utilize pre-computed linear association statistics between individual genetic variants and phenotypes to infer genetic relationships between products of phenotypes (e.g., ratios; logical combinations of binary phenotypes using "and" and "or") with customized covariate choices. We propose a method to approximate covariate adjusted linear models for products and logical combinations of phenotypes using only pre-computed summary statistics. We evaluate our method's accuracy through several simulation studies and an application modeling ratios of fatty acids using data from the Framingham Heart Study. These studies show consistent ability to recapitulate analysis results performed on individual level data including maintenance of the Type I error rate, power, and effect size estimates. An implementation of this proposed method is available in the publicly available R package pcsstools.
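The logical combinations mentioned above reduce to products: for binary phenotypes, "y1 and y2" equals y1·y2, and "y1 or y2" equals y1 + y2 - y1·y2. Because ordinary least-squares coefficients are linear in the outcome, the coefficient for "or" follows exactly from the coefficients for y1, y2 and their product. The Python sketch below checks this on toy data (hypothetical values) with a single-predictor regression. It does not reproduce the pcsstools approximation of the product term itself, which is the hard part the paper addresses; `ols_slope` is a hypothetical helper.

```python
from statistics import fmean


def ols_slope(x, y):
    """Slope of a simple linear regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy data (hypothetical): genotype dosage x and two binary phenotypes.
x = [0, 1, 2, 0, 1, 2, 1, 0]
y1 = [0, 1, 1, 0, 0, 1, 1, 0]
y2 = [0, 0, 1, 1, 0, 1, 1, 0]

y_and = [a * b for a, b in zip(y1, y2)]          # "y1 and y2" is the product
y_or = [a + b - a * b for a, b in zip(y1, y2)]   # "y1 or y2" via inclusion-exclusion

b1, b2, b_and = ols_slope(x, y1), ols_slope(x, y2), ols_slope(x, y_and)

# OLS is linear in the outcome, so beta_or = beta_1 + beta_2 - beta_and exactly.
b_or_from_parts = b1 + b2 - b_and
```

The same identity holds with additional covariates, since the full OLS coefficient vector is also a linear function of the outcome.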
Affiliation(s)
- Jack M. Wolf
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- Jason Westra
- Department of Mathematics, Computer Science, and Statistics, Dordt University, Sioux Center, IA, United States
- Nathan Tintle
- Department of Mathematics, Computer Science, and Statistics, Dordt University, Sioux Center, IA, United States
- Department of Population Health Nursing Science, College of Nursing, University of Illinois Chicago, Chicago, IL, United States
25
Shao Z, Wang T, Zhang M, Jiang Z, Huang S, Zeng P. IUSMMT: Survival mediation analysis of gene expression with multiple DNA methylation exposures and its application to cancers of TCGA. PLoS Comput Biol 2021; 17:e1009250. [PMID: 34464378 PMCID: PMC8437300 DOI: 10.1371/journal.pcbi.1009250] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/13/2021] [Revised: 09/13/2021] [Accepted: 07/06/2021] [Indexed: 02/07/2023]
Abstract
Effective and powerful survival mediation models are currently lacking. To partly fill this gap, we focus on mediation analysis with multiple DNA methylation sites acting as exposures, one gene expression as the mediator and one survival time as the outcome. We proposed IUSMMT (intersection-union survival mixture-adjusted mediation test), which examines the existence of a mediation effect by fitting an empirical three-component mixture null distribution. Through extensive simulation studies, we demonstrated the advantage of IUSMMT over existing methods. We applied IUSMMT to ten TCGA cancers and identified multiple genes that exhibited mediating effects. We further found that most of the identified regions in which genes acted as active mediators were cancer type-specific and exhibited full mediation from DNA methylation CpG sites to the survival risk of various types of cancers. Overall, IUSMMT represents an effective and powerful alternative for survival mediation analysis; our results also provide new insights into the functional role of DNA methylation and gene expression in cancer progression/prognosis and point to potential therapeutic targets for future clinical practice.
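The intersection-union idea behind IUSMMT can be stated compactly: mediation requires both the exposure-to-mediator path (alpha) and the mediator-to-outcome path (beta) to be non-zero, so the classical intersection-union test rejects only when both component tests reject, giving p = max(p_alpha, p_beta). This max-p test is conservative when both paths are null, which is the situation IUSMMT's empirical three-component mixture null is designed to recalibrate. The Python sketch below shows only the unadjusted baseline plus a Monte Carlo check of that conservativeness; it is not IUSMMT, and the function names are hypothetical.

```python
import random


def iut_mediation_pvalue(p_alpha: float, p_beta: float) -> float:
    """Classical intersection-union (joint significance) test for mediation.

    H0: alpha == 0 or beta == 0. Reject only if both paths are significant,
    so the IUT p-value is the larger of the two component p-values.
    This is the unadjusted baseline, NOT IUSMMT's mixture-adjusted null.
    """
    for p in (p_alpha, p_beta):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_alpha, p_beta)


def type1_rate_under_double_null(alpha_level: float = 0.05, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo: when alpha = beta = 0, both p-values are ~Uniform(0, 1),
    so P(max <= a) = a ** 2, far below the nominal level a."""
    rng = random.Random(seed)
    hits = sum(
        iut_mediation_pvalue(rng.random(), rng.random()) <= alpha_level
        for _ in range(n)
    )
    return hits / n
```

At a nominal level of 0.05, the simulated rejection rate should sit near 0.05² = 0.0025, which is why a mixture-adjusted null can recover power.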
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Meng Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
26
Mbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki JA, Ziyatdinov A, Benner C, O'Dushlaine C, Barber M, Boutkov B, Habegger L, Ferreira M, Baras A, Reid J, Abecasis G, Maxwell E, Marchini J. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53:1097-1103. [PMID: 34017140 DOI: 10.1038/s41588-021-00870-7] [Citation(s) in RCA: 431] [Impact Index Per Article: 143.7] [Received: 06/21/2020] [Accepted: 04/13/2021] [Indexed: 11/08/2022]
Abstract
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
Affiliation(s)
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
27
Bi W, Lee S. Scalable and Robust Regression Methods for Phenome-Wide Association Analysis on Large-Scale Biobank Data. Front Genet 2021; 12:682638. [PMID: 34211504 PMCID: PMC8239389 DOI: 10.3389/fgene.2021.682638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023]
Abstract
With advances in genotyping technologies and electronic health records (EHRs), large biobanks have become great resources for identifying novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. To date, several phenome-wide association studies (PheWAS) have been performed on biobank data, providing comprehensive insights into many aspects of human genetics and biology. Although inspiring, PheWAS on large-scale biobank data face new challenges, including computational burden, unbalanced phenotypic distributions, and genetic relatedness. In this paper, we first discuss these new challenges and their potential impact on data analysis. Then, we summarize approaches that are scalable and robust in GWAS and PheWAS. This review can serve as a practical guide for geneticists, epidemiologists, and other medical researchers to identify genetic variations associated with health-related phenotypes in large-scale biobank data analysis. Meanwhile, it can also help statisticians gain a comprehensive and up-to-date understanding of current technical tool development.
Affiliation(s)
- Wenjian Bi
- Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
- Seunggeun Lee
- Graduate School of Data Science, Seoul National University, Seoul, South Korea
28
Borda V, da Silva Francisco Junior R, Carvalho JB, Morais GL, Duque Rossi Á, Pezzuto P, Azevedo GS, Schamber-Reis BL, Portari EA, Melo A, Moreira MEL, Guida LC, Cunha DP, Gomes L, Vasconcelos ZFM, Faucz FR, Tanuri A, Stratakis CA, Aguiar RS, Cardoso CC, de Vasconcelos ATR. Whole-exome sequencing reveals insights into genetic susceptibility to Congenital Zika Syndrome. PLoS Negl Trop Dis 2021; 15:e0009507. [PMID: 34125832 PMCID: PMC8224898 DOI: 10.1371/journal.pntd.0009507] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2020] [Revised: 06/24/2021] [Accepted: 05/26/2021] [Indexed: 12/30/2022]
Abstract
Congenital Zika Syndrome (CZS) is a critical illness with a wide range of severity caused by Zika virus (ZIKV) infection during pregnancy. Life-threatening neurodevelopmental dysfunctions are among the most common phenotypes observed in affected newborns. Risk factors that contribute to susceptibility and response to ZIKV infection may be related to the virus itself, the environment, and maternal genetic background. Nevertheless, the newborn’s genetic contribution to the critical illness has not yet been elucidated. Here, we aimed to identify possible genetic variants as well as relevant biological pathways that might be associated with CZS phenotypes. For this purpose, we performed whole-exome sequencing in 40 children born to women with confirmed exposure to ZIKV during pregnancy. We investigated the occurrence of rare harmful single-nucleotide variants (SNVs) possibly associated with inborn errors in genes ontologically related to CZS phenotypes. We also performed an exome-wide association analysis of both common and rare variants using a case-control design (29 CZS cases and 11 controls). Five of the 29 CZS patients harbored known pathogenic variants likely to contribute to the mild to severe manifestations observed. Approximately 30% of affected individuals carried at least one pathogenic or likely pathogenic SNV in candidate genes that may play a role in CZS. Our common-variant association analysis detected a suggestive protective effect of rs2076469 in the DISP3 gene (p = 1.39 × 10⁻⁵). The IL12RB2 gene (p = 2.18 × 10⁻¹¹) also showed an unusual distribution of nonsynonymous rare SNVs in control samples. Finally, genes harboring harmful variants are involved in processes related to CZS phenotypes, such as neurological development and immunity. Therefore, both rare and common variants may contribute to the genetic basis of CZS susceptibility.
The variants and pathways identified in this study may also have implications for the development of future therapeutic strategies. Five years after the start of the Zika virus outbreak in Brazil, the genetic factors that explain why only a small number of babies are born with Congenital Zika Syndrome (CZS) are still not understood. Here, we focused on host genetic susceptibility by studying the whole exomes of CZS-affected (n = 29) and healthy (n = 11) neonates, all born to ZIKV-infected women from Brazil. We applied two strategies: 1) determining whether case individuals carry pathogenic or harmful variants that explain the CZS outcomes (i.e., microcephaly) independently of ZIKV infection, and 2) exploring associations of common and rare variants with CZS. We found that common and rare variants in genes such as DISP3 and IL12RB2 could explain some of the susceptibility to CZS. Moreover, by considering these and other candidate genes, we observed an over-representation of Gene Ontology terms related to the nervous system, metabolism, and microtubule-cytoskeleton organization.
Affiliation(s)
- Victor Borda
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica LNCC/MCTIC, Petrópolis, Brazil
- Joseane B. Carvalho
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica LNCC/MCTIC, Petrópolis, Brazil
- Guilherme L. Morais
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica LNCC/MCTIC, Petrópolis, Brazil
- Átila Duque Rossi
- Laboratório de Virologia Molecular, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Paula Pezzuto
- Laboratório de Virologia Molecular, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Bruno L. Schamber-Reis
- Faculdade de Ciências Médicas de Campina Grande, Núcleo de Genética Médica, Centro Universitário UniFacisa, Campina Grande, Brazil
- Adriana Melo
- Instituto de Pesquisa Professor Amorim Neto, Campina Grande, Brazil
- Faculdade de Ciências Médicas de Campina Grande, Núcleo de Genética Médica, Centro Universitário UniFacisa, Campina Grande, Brazil
- Leonardo Gomes
- Instituto Fernandes Figueira, Fiocruz, Rio de Janeiro, Brazil
- Fabio R. Faucz
- Section on Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Amilcar Tanuri
- Laboratório de Virologia Molecular, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Constantine A. Stratakis
- Section on Endocrinology and Genetics, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
- Renato S. Aguiar
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- *Correspondence: RSA; CCC; ATRV
- Cynthia Chester Cardoso
- Laboratório de Virologia Molecular, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Ana Tereza Ribeiro de Vasconcelos
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica LNCC/MCTIC, Petrópolis, Brazil
29
Dutta D, VandeHaar P, Fritsche LG, Zöllner S, Boehnke M, Scott LJ, Lee S. A powerful subset-based method identifies gene set associations and improves interpretation in UK Biobank. Am J Hum Genet 2021; 108:669-681. [PMID: 33730541 DOI: 10.1016/j.ajhg.2021.02.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/23/2020] [Accepted: 02/19/2021] [Indexed: 02/06/2023]
Abstract
Tests of association between a phenotype and a set of genes in a biological pathway can provide insights into the genetic architecture of complex phenotypes beyond those obtained from single-variant or single-gene association analysis. However, most existing gene set tests have limited power to detect gene set-phenotype association when a small fraction of the genes are associated with the phenotype and cannot identify the potentially "active" genes that might drive a gene set-based association. To address these issues, we have developed Gene set analysis Association Using Sparse Signals (GAUSS), a method for gene set association analysis that requires only GWAS summary statistics. For each significantly associated gene set, GAUSS identifies the subset of genes that have the maximal evidence of association and can best account for the gene set association. Using pre-computed correlation structure among test statistics from a reference panel, our p value calculation is substantially faster than other permutation- or simulation-based approaches. In simulations with varying proportions of causal genes, we find that GAUSS effectively controls type 1 error rate and has greater power than several existing methods, particularly when a small proportion of genes account for the gene set signal. Using GAUSS, we analyzed UK Biobank GWAS summary statistics for 10,679 gene sets and 1,403 binary phenotypes. We found that GAUSS is scalable and identified 13,466 phenotype and gene set association pairs. Within these gene sets, we identify an average of 17.2 (max = 405) genes that underlie these gene set associations.
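To illustrate what "the subset of genes that have the maximal evidence of association" means, the Python sketch below scans nested top-k subsets of genes ranked by |z| and keeps the subset maximizing a standardized sum S_k = (sum of the k largest |z|) / sqrt(k). This scoring rule is a hypothetical simplification chosen for clarity, not the GAUSS statistic. It also omits the step that makes GAUSS valid: calibrating the maximum over subsets against the correlation structure estimated from a reference panel, without which the maximized score has no simple null distribution. `best_subset` is a hypothetical name.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def best_subset(z_scores: Sequence[float]) -> Tuple[List[int], float]:
    """Return indices of the top-k genes (ranked by |z|) that maximize
    S_k = (sum of the k largest |z|) / sqrt(k), together with max S_k.

    Hypothetical illustration only: treats gene-level z-scores as given,
    ignores their correlation and does not compute a p-value.
    """
    if not z_scores:
        raise ValueError("need at least one gene")
    # Rank genes by the strength of their individual signal.
    order = sorted(range(len(z_scores)), key=lambda i: abs(z_scores[i]), reverse=True)
    best_k, best_s, running = 1, float("-inf"), 0.0
    for k, idx in enumerate(order, start=1):
        running += abs(z_scores[idx])
        s_k = running / sqrt(k)
        if s_k > best_s:
            best_k, best_s = k, s_k
    return sorted(order[:best_k]), best_s
```

With a sparse signal (a few strong genes among many null ones), the selected subset stays small, which mirrors the interpretive goal described in the abstract.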
Affiliation(s)
- Diptavo Dutta
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
- Peter VandeHaar
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Lars G Fritsche
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Sebastian Zöllner
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Michael Boehnke
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Laura J Scott
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Seunggeun Lee
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Graduate School of Data Science, Seoul National University, Seoul 08826, Republic of Korea
30
Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies. Comput Math Methods Med 2021; 2021:8812282. [PMID: 33628328 PMCID: PMC7889379 DOI: 10.1155/2021/8812282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 01/02/2021] [Accepted: 01/08/2021] [Indexed: 11/18/2022]
Abstract
In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase the statistical power to identify causal genes in pedigree- or population-based studies. However, most existing methods focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous or binary phenotypes and can further adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to empirical type I error rates and power in comparison with existing methods.
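The difference between the burden and kernel tests named above can be seen in their score statistics. With per-variant scores S_j = sum_i g_ij * (y_i - mean(y)) and variant weights w_j, a burden test uses Q_B = (sum_j w_j * S_j)^2, while a linear-kernel (SKAT-type) test uses Q_K = sum_j (w_j * S_j)^2. The Python sketch below computes both for a single continuous phenotype with an intercept-only null model. Multivariate traits, pedigree structure, covariates and the null distributions needed for p-values (Q_K follows a mixture of chi-squares) are all omitted, so this is not the paper's framework; the function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def score_statistics(genotypes: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Per-variant scores S_j = sum_i g_ij * r_i, where r_i = y_i - mean(y)
    are residuals from an intercept-only null model (an assumption)."""
    ybar = fmean(y)
    resid = [yi - ybar for yi in y]
    n_variants = len(genotypes[0])
    return [sum(row[j] * r for row, r in zip(genotypes, resid)) for j in range(n_variants)]


def burden_and_kernel_q(
    genotypes: Sequence[Sequence[float]],
    y: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, float]:
    """Return (Q_burden, Q_kernel) for one region.

    Q_burden squares the weighted SUM of scores, so opposite-signed effects
    cancel; Q_kernel sums the squared weighted scores, so they do not.
    """
    s = score_statistics(genotypes, y)
    q_burden = sum(w * sj for w, sj in zip(weights, s)) ** 2
    q_kernel = sum((w * sj) ** 2 for w, sj in zip(weights, s))
    return q_burden, q_kernel
```

When one variant raises and another lowers the phenotype, Q_B can collapse to zero while Q_K stays large; when effects share a direction, Q_B is the larger of the two. Omnibus tests exist precisely to combine the strengths of both.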
31
Chen L, Zhou Y. A fast and powerful aggregated Cauchy association test for joint analysis of multiple phenotypes. Genes Genomics 2021; 43:69-77. [PMID: 33432394 DOI: 10.1007/s13258-020-01034-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2020] [Accepted: 12/23/2020] [Indexed: 11/27/2022]
Abstract
BACKGROUND Pleiotropy is a widespread phenomenon in complex human diseases. Jointly analyzing multiple phenotypes can improve the power to detect genetic variants and help uncover the underlying genetic mechanisms. OBJECTIVE This study aims to detect associations between genetic variants in a genomic region and multiple phenotypes. METHODS We develop an aggregated Cauchy association test, abbreviated "Multi-ACAT", to detect associations between rare variants in a genomic region and multiple phenotypes. Multi-ACAT first tests the association between each rare variant and the multiple phenotypes using reverse regression to obtain variant-level p-values, and then takes a linear combination of the transformed p-values as the test statistic, which approximately follows a Cauchy distribution under the null hypothesis. RESULTS Extensive simulation studies show that when the proportion of causal variants in a genomic region is extremely small, Multi-ACAT is more powerful than several other methods and is robust to bi-directional effects of causal variants. Finally, we illustrate the proposed method by analyzing two phenotypes [systolic blood pressure (SBP) and diastolic blood pressure (DBP)] from Genetic Analysis Workshop 19 (GAW19). CONCLUSION Multi-ACAT is extremely fast to compute, does not require modeling the complex joint distribution of multiple correlated phenotypes, and can be applied when noise phenotypes are present.
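The combination step described under METHODS follows the general ACAT recipe: each p-value is mapped to a standard Cauchy variate, the variates are averaged, and the Cauchy tail converts the result back to a p-value. A minimal sketch, assuming the variant-level p-values are already available (the reverse-regression stage of Multi-ACAT is not shown) and using equal weights by default; the function name and the 1e-15 / 1e15 cut-offs for the tail approximations are illustrative choices.

```python
import math

# Sketch of the Cauchy combination (ACAT) step: p-values in (0, 1] are
# mapped to standard Cauchy variates, averaged with weights, and the
# combined statistic is converted back to a p-value using the Cauchy tail.

def cauchy_combination(pvalues, weights=None):
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights
    total_w = sum(weights)
    t = 0.0
    for w, p in zip(weights, pvalues):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        if p < 1e-15:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids precision loss
            t += w / (p * math.pi)
        else:
            t += w * math.tan((0.5 - p) * math.pi)
    t /= total_w
    if t > 1e15:
        # Cauchy upper tail: P(C > t) ~ 1 / (t * pi) for large t
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi
```

Because the Cauchy tail is heavy, a single very small p-value dominates the combined statistic, which is consistent with the abstract's claim of good power when only a small proportion of variants is causal.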
Affiliation(s)
- Lili Chen
- School of Mathematical Sciences, Heilongjiang University, No. 74 Xuefu Road, Nangang District, Harbin, 150080, People's Republic of China
- Yajing Zhou
- School of Mathematical Sciences, Heilongjiang University, No. 74 Xuefu Road, Nangang District, Harbin, 150080, People's Republic of China
32
Liu D, Alhazmi N, Matthews H, Lee MK, Li J, Hecht JT, Wehby GL, Moreno LM, Heike CL, Roosenboom J, Feingold E, Marazita ML, Claes P, Liao EC, Weinberg SM, Shaffer JR. Impact of low-frequency coding variants on human facial shape. Sci Rep 2021; 11:748. [PMID: 33436952 PMCID: PMC7804299 DOI: 10.1038/s41598-020-80661-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/24/2020] [Accepted: 12/18/2020] [Indexed: 01/29/2023]
Abstract
The contribution of low-frequency variants to the genetic architecture of normal-range facial traits is unknown. We studied the influence of low-frequency coding variants (MAF < 1%) in 8091 genes on multi-dimensional facial shape phenotypes in a European cohort of 2329 healthy individuals. Using three-dimensional images, we partitioned the full face into 31 hierarchically arranged segments to model facial morphology at multiple levels, and generated multi-dimensional phenotypes representing the shape variation within each segment. We used MultiSKAT, a multivariate kernel regression approach, to scan the exome for face-associated low-frequency variants in a gene-based manner. After accounting for multiple tests, seven genes (AR, CARS2, FTSJ1, HFE, LTB4R, TELO2, NECTIN1) were significantly associated with shape variation of the cheek, chin, nose and mouth areas. These genes displayed a wide range of phenotypic effects, with some impacting the full face and others affecting localized regions. The missense variant rs142863092 in NECTIN1 had a significant effect on chin morphology and was predicted bioinformatically to have a deleterious effect on protein function. Notably, NECTIN1 is an established craniofacial gene that underlies a human syndrome that includes a mandibular phenotype. We further showed that nectin1a mutations can affect zebrafish craniofacial development, with the size and shape of the mandibular cartilage altered in mutant animals. Findings from this study expand our understanding of the genetic basis of normal-range facial shape by highlighting the role of low-frequency coding variants in several novel genes.
Affiliation(s)
- Dongjing Liu
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Nora Alhazmi
- Department of Oral Biology, Harvard School of Dental Medicine, Boston, MA, USA
- King Saud Bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Harold Matthews
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, UZ Gasthuisberg, Leuven, Belgium
- Myoung Keun Lee
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Jiarui Li
- Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
- Jacqueline T Hecht
- Department of Pediatrics, University of Texas McGovern Medical Center, Houston, TX, USA
- George L Wehby
- Department of Health Management and Policy, University of Iowa, Iowa City, IA, USA
- Lina M Moreno
- Department of Orthodontics, University of Iowa, Iowa City, IA, USA
- Carrie L Heike
- Department of Pediatrics, Seattle Children's Craniofacial Center, University of Washington, Seattle, WA, USA
- Jasmien Roosenboom
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Eleanor Feingold
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Mary L Marazita
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Peter Claes
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Department of Electrical Engineering, ESAT/PSI, KU Leuven, Leuven, Belgium
- Eric C Liao
- Department of Surgery, Center for Regenerative Medicine, Massachusetts General Hospital, Shriners Hospital, Boston, MA, USA
- Seth M Weinberg
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- John R Shaffer
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
33
Wen Y, Lu Q. An optimal kernel-based multivariate U-statistic to test for associations with multiple phenotypes. Biostatistics 2020; 23:705-720. [PMID: 33108446 DOI: 10.1093/biostatistics/kxaa049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/10/2020] [Revised: 09/24/2020] [Accepted: 10/03/2020] [Indexed: 11/13/2022]
Abstract
Set-based analyses that jointly consider multiple predictors in a group have been widely used for association testing. However, their power can be sensitive to the distribution of the phenotypes and to the underlying relationships between predictors and outcomes. Moreover, most set-based methods are designed for single-trait analysis, which makes it hard to explore pleiotropic effects and borrow information when multiple phenotypes are available. Here, we propose a kernel-based multivariate U-statistic (KMU) that is robust and powerful in testing the association between a set of predictors and multiple outcomes. We employ a rank-based kernel function for the outcomes, which makes the method robust to various outcome distributions. Rather than relying on a single kernel, our test statistic is built from multiple kernels selected in a data-driven manner, and thus can capture various complex relationships between predictors and outcomes. We derive the asymptotic properties of the test statistic. Through simulations, we demonstrate that KMU controls the type I error and has higher power than its counterparts. We further show its practical utility by analyzing whole-genome sequencing data from the Alzheimer's Disease Neuroimaging Initiative study, in which novel genes associated with imaging phenotypes were detected.
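The central idea in this abstract, pairing a genotype-similarity kernel with a rank-based outcome kernel inside a U-statistic, can be illustrated with a single-kernel version. This is a sketch under stated assumptions (linear kernels on centered genotypes and on centered outcome ranks, no variance estimate or p-value); it is not the KMU statistic, which combines multiple kernels selected from the data, and the function names are hypothetical.

```python
# Sketch: a single-kernel U-statistic with a rank-based outcome kernel.
# Assumptions: linear kernels on centered genotypes and centered outcome
# ranks; KMU itself combines several data-driven kernels (not shown here).

def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def centered(column):
    mean = sum(column) / len(column)
    return [x - mean for x in column]


def kernel_u_statistic(genotypes, outcomes):
    """U = sum_{i != j} K_G(i, j) * K_Y(i, j) / (n (n - 1))."""
    n = len(genotypes)
    g_cols = [centered(list(col)) for col in zip(*genotypes)]
    y_cols = [centered(average_ranks(list(col))) for col in zip(*outcomes)]

    def linear_kernel(cols, i, j):
        return sum(c[i] * c[j] for c in cols)

    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # U-statistic: exclude the diagonal pairs
                total += linear_kernel(g_cols, i, j) * linear_kernel(y_cols, i, j)
    return total / (n * (n - 1))
```

Because outcomes enter only through their ranks, any monotone transformation of an outcome (for example, cubing a positive trait) leaves the statistic unchanged, which is the robustness property the abstract attributes to the rank-based kernel.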
Affiliation(s)
- Y Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Qing Lu
- Department of Biostatistics, College of Public Health, University of Florida, Gainesville, FL, USA
34
Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants. Proc Natl Acad Sci U S A 2020; 117:18924-18933. [PMID: 32753378 DOI: 10.1073/pnas.2005634117] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Indexed: 12/17/2022]
Abstract
The reconciliation between Mendelian inheritance of discrete traits and the genetically based correlation between relatives for quantitative traits was Fisher's infinitesimal model of a large number of genetic variants, each with very small effects, whose causal effects could not be individually identified. The development of genome-wide association studies (GWAS) raised the hope that it would be possible to identify single polymorphic variants with identifiable functional effects on complex traits. It soon became clear that, with ever larger GWAS on ever more complex traits, most of the significant associations had such small effects that identifying their individual functional effects was essentially hopeless. Polygenic risk scores, which provide an overall estimate of an individual's genetic propensity to a trait, have been developed using GWAS data. These scores can usefully identify groups of individuals at substantially increased risk, which can lead to recommendations of medical treatments or behavioral modifications to reduce risk. However, each such claim will require extensive investigation to justify its practical application. The challenge now is to use limited genetic association studies to find individually identifiable variants of significant functional effect that can help to explain the molecular basis of complex diseases and traits, and so lead to improved disease prevention and treatment. This can best be achieved by 1) the study of rare variants, often chosen by careful candidate assessment, and 2) the careful choice of phenotypes, often extremes of a quantitative variable, or traits with relatively high heritability.
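The abstract describes polygenic risk scores as an individual-level summary of genetic propensity. In their simplest additive form they are a weighted sum of allele dosages. The sketch below assumes effect sizes taken directly from GWAS summary statistics, with no clumping, shrinkage, or ancestry adjustment; the `top_fraction` helper is a hypothetical way of flagging a high-risk group, and all data are toy values.

```python
# Sketch: additive polygenic risk score and selection of the highest-scoring group.
# Assumptions: dosages coded 0/1/2 for the effect allele, effect sizes taken
# from GWAS summary statistics, and a hypothetical top-fraction cut-off.

def polygenic_risk_score(dosages, effect_sizes):
    """PRS = sum_j beta_j * dosage_j."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def top_fraction(scores, fraction):
    """Return the IDs in the top `fraction` of scores (at least one ID)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    betas = [0.2, -0.1, 0.05]  # toy per-SNP effect sizes
    people = {"A": [2, 0, 1], "B": [0, 2, 0], "C": [1, 1, 2]}
    prs = {pid: polygenic_risk_score(d, betas) for pid, d in people.items()}
    print(prs, top_fraction(prs, 0.34))
```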
35
Rotroff DM. A Bioinformatics Crash Course for Interpreting Genomics Data. Chest 2020; 158:S113-S123. [PMID: 32658646 PMCID: PMC8176646 DOI: 10.1016/j.chest.2020.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2019] [Revised: 11/11/2019] [Accepted: 03/09/2020] [Indexed: 10/23/2022]
Abstract
Reductions in genotyping costs and improvements in computational power have made genome-wide association studies (GWAS) standard practice for many complex diseases. A GWAS assesses genetic variants across the genomes of many individuals to determine which, if any, are associated with a specific trait. As with any analysis, there are evolving best practices that should be followed to ensure scientific rigor and reliable conclusions. This article presents a brief summary of many of the key bioinformatics considerations when either planning or evaluating a GWAS. This review is meant to serve as a guide for those without deep expertise in bioinformatics and GWAS, and to give them tools to critically evaluate this popular approach to investigating complex diseases. In addition, a checklist is provided that investigators can use to evaluate whether a GWAS has appropriately accounted for the many potential sources of bias and generally followed current best practices.
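One consideration that GWAS best practices routinely address is multiple testing across hundreds of thousands to millions of variants. A minimal sketch of a Bonferroni correction, assuming about one million effectively independent tests (the usual rationale for the conventional 5 × 10⁻⁸ genome-wide threshold); the variant IDs and p-values are made up, and the function names are illustrative.

```python
# Sketch: Bonferroni-style significance threshold for many association tests.
# Assumption: roughly one million effectively independent common-variant tests,
# the convention behind the usual 5e-8 genome-wide threshold.

def bonferroni_threshold(alpha, n_tests):
    return alpha / n_tests


def significant_hits(pvalues, alpha=0.05, n_tests=1_000_000):
    """Return variant IDs whose p-value falls below alpha / n_tests."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(vid for vid, p in pvalues.items() if p < threshold)


if __name__ == "__main__":
    pvals = {"rs1": 3e-9, "rs2": 1e-6, "rs3": 4.9e-8}  # made-up p-values
    print(bonferroni_threshold(0.05, 1_000_000), significant_hits(pvals))
```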
Affiliation(s)
- Daniel M Rotroff
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH.
36
Igolkina AA, Meshcheryakov G, Gretsova MV, Nuzhdin SV, Samsonova MG. Multi-trait multi-locus SEM model discriminates SNPs of different effects. BMC Genomics 2020; 21:490. [PMID: 32723302 PMCID: PMC7385891 DOI: 10.1186/s12864-020-06833-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/16/2019] [Accepted: 06/16/2020] [Indexed: 11/21/2022]
Abstract
Background There is a plethora of methods for genome-wide association studies. However, only a few of them can be classified as multi-trait and multi-locus, i.e., methods that consider the influence of multiple genetic variants on several correlated phenotypes. Results We propose a multi-trait multi-locus model that employs structural equation modeling (SEM) to describe complex associations between SNPs and traits: multi-trait multi-locus SEM (mtmlSEM). The structure of our model makes it possible to discriminate between pleiotropic and single-trait SNPs with direct and indirect effects. We also propose an automatic procedure to construct the model using factor analysis and the maximum likelihood method. To estimate the large number of parameters in the model, we performed Bayesian inference and implemented Gibbs sampling. An important feature of the model is that it correctly handles non-normally distributed variables, such as some traits and variants. Conclusions We applied the model to Vavilov’s collection of 404 chickpea (Cicer arietinum L.) accessions with 20-fold cross-validation. We analyzed 16 phenotypic traits, which we organized into five groups, and found around 230 SNPs associated with traits, 60 of which had pleiotropic effects. The model demonstrated high accuracy in predicting trait values.
37
Luo L, Shen J, Zhang H, Chhibber A, Mehrotra DV, Tang ZZ. Multi-trait analysis of rare-variant association summary statistics using MTAR. Nat Commun 2020; 11:2850. [PMID: 32503972 PMCID: PMC7275056 DOI: 10.1038/s41467-020-16591-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/01/2019] [Accepted: 05/09/2020] [Indexed: 12/13/2022]
Abstract
Integrating association evidence across multiple traits can improve the power of gene discovery and reveal pleiotropy. Most multi-trait analysis methods focus on individual common variants in genome-wide association studies. Here, we introduce multi-trait analysis of rare-variant associations (MTAR), a framework for joint analysis of association summary statistics between multiple rare variants and different traits. MTAR achieves substantial power gains by leveraging the genome-wide genetic correlation to inform the degree of gene-level effect heterogeneity across traits. We apply MTAR to rare-variant summary statistics for three lipid traits in the Global Lipids Genetics Consortium. Single-trait-based tests identified 99 genome-wide significant genes, and MTAR increased this number to 139. Of the 11 novel lipid-associated genes discovered by MTAR, 7 were replicated in an independent UK Biobank GWAS analysis. Our study demonstrates that MTAR is substantially more powerful than single-trait-based tests and highlights the value of MTAR for novel gene discovery.
Affiliation(s)
- Lan Luo
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Hong Zhang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Aparna Chhibber
- Genetics and Pharmacogenomics, Merck & Co., Inc., West Point, Pennsylvania, 19446, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, 19454, USA
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, 53715, USA
- Wisconsin Institute for Discovery, Madison, Wisconsin, 53715, USA
38
Abstract
Radiogenomics, defined as the integrated analysis of radiologic imaging and genetic data, is a well-established tool shown to augment neuroimaging in the clinical diagnosis, prognostication, and scientific study of late-onset Alzheimer disease (LOAD). Early work using candidate single nucleotide polymorphisms (SNPs) identified genetic variation in APOE, BIN1, CLU, and CR1 as key modifiers of brain structure and function using magnetic resonance imaging (MRI). More recently, polygenic risk scores used in conjunction with MRI and positron emission tomography have shown great promise as a risk-stratification tool for clinical trials and care-management decisions. In addition, recent work using multimodal MRI and positron emission tomography as proxies of LOAD progression has identified novel risk variants that are enhancing our understanding of LOAD pathophysiology and progression. Herein, we highlight key studies and trends in the radiogenomics of LOAD over the past two decades and their implications for clinical practice and scientific research.
39
Konigorski S, Yilmaz YE, Janke J, Bergmann MM, Boeing H, Pischon T. Powerful rare variant association testing in a copula-based joint analysis of multiple phenotypes. Genet Epidemiol 2019; 44:26-40. [PMID: 31732979 DOI: 10.1002/gepi.22265] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/23/2019] [Revised: 08/13/2019] [Accepted: 09/16/2019] [Indexed: 12/16/2022]
Abstract
In genetic association studies of rare variants, the low power of association tests is one of the main challenges. In this study, we propose a new single-marker association test called C-JAMP (Copula-based Joint Analysis of Multiple Phenotypes), which is based on a joint model of multiple phenotypes given genetic markers and other covariates. We evaluated its performance and compared its empirical type I error and power with existing univariate and multivariate single-marker and multi-marker rare-variant tests in extensive simulation studies. C-JAMP yielded unbiased genetic effect estimates and valid type I errors with an adjusted test statistic. When strongly dependent traits were jointly analyzed, C-JAMP had the highest power in all scenarios except when a high percentage of variants were causal with moderate/small effect sizes. When traits with weak or moderate dependence were analyzed, whether C-JAMP or competing approaches had higher power depended on the effect size. When C-JAMP was applied with a misspecified copula function, it still achieved high power in some of the scenarios considered. In a real-data application, we analyzed sequencing data using C-JAMP and performed the first genome-wide association studies of high-molecular-weight and medium-molecular-weight adiponectin plasma concentrations. C-JAMP identified 20 rare variants with p-values smaller than 10⁻⁵, while all other tests resulted in the identification of fewer variants with higher p-values. In summary, the results indicate that C-JAMP is a powerful, flexible, and robust method for association studies, and we identified novel candidate markers for adiponectin. C-JAMP is implemented as an R package and freely available from https://cran.r-project.org/package=CJAMP.
Affiliation(s)
- Stefan Konigorski
- Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Digital Health and Machine Learning Research Group, Hasso Plattner Institute for Digital Engineering, Potsdam, Germany
- Yildiz E Yilmaz
- Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, NL, Canada
- Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Discipline of Medicine, Faculty of Medicine, Memorial University of Newfoundland, St. John's, NL, Canada
- Jürgen Janke
- Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Manuela M Bergmann
- Department of Epidemiology, German Institute of Human Nutrition Potsdam-Rehbrücke (DIfE), Nuthetal, Germany
- Heiner Boeing
- Department of Epidemiology, German Institute of Human Nutrition Potsdam-Rehbrücke (DIfE), Nuthetal, Germany
- Tobias Pischon
- Molecular Epidemiology Research Group, Max Delbrück Center (MDC) for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- DZHK (German Center for Cardiovascular Research), partner site Berlin, Berlin, Germany
40
Dutta D, Gagliano Taliun SA, Weinstock JS, Zawistowski M, Sidore C, Fritsche LG, Cucca F, Schlessinger D, Abecasis GR, Brummett CM, Lee S. Meta-MultiSKAT: Multiple phenotype meta-analysis for region-based association test. Genet Epidemiol 2019; 43:800-814. [PMID: 31433078 DOI: 10.1002/gepi.22248] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/27/2019] [Accepted: 06/13/2019] [Indexed: 12/17/2022]
Abstract
The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare variants and for the combined effects of both common and rare variants. To achieve robust power, within Meta-MultiSKAT we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes, and heterogeneity across studies. In addition, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type I error rate at the exome-wide level of 2.5 × 10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve the power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.
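The core of a MultiSKAT-type test is a quadratic form in the per-variant, per-phenotype score statistics, with one kernel describing how effects are shared across phenotypes and another describing how they are shared across variants. A minimal single-study sketch, assuming this Kronecker-product form; Meta-MultiSKAT's modeling of heterogeneity between studies, its omnibus tests, and its p-value calculation are not reproduced, and the function name is hypothetical.

```python
# Sketch: quadratic kernel statistic for several phenotypes from score statistics.
# scores[j][k] is the score for variant j and phenotype k. The statistic is
#   Q = vec(S)^T (Sigma_P kron W) vec(S)
#     = sum_{k,l} sum_{j,j'} S[j][k] * Sigma_P[k][l] * W[j][j'] * S[j'][l]
# Assumptions: a single study, user-chosen phenotype kernel Sigma_P and
# variant kernel W; no between-study heterogeneity term and no p-value.

def multi_phenotype_kernel_stat(scores, pheno_kernel, variant_kernel):
    m = len(scores)         # number of variants
    K = len(scores[0])      # number of phenotypes
    q = 0.0
    for k in range(K):
        for l in range(K):
            for j in range(m):
                for jp in range(m):
                    q += (scores[j][k] * pheno_kernel[k][l]
                          * variant_kernel[j][jp] * scores[jp][l])
    return q
```

With an identity phenotype kernel the statistic reduces to the sum of per-phenotype kernel statistics, while an all-ones phenotype kernel corresponds to effects shared across phenotypes, so scores that agree in sign across phenotypes are rewarded.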
Affiliation(s)
- Diptavo Dutta
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Sarah A Gagliano Taliun
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Joshua S Weinstock
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Matthew Zawistowski
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Carlo Sidore
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Cagliari, Italy
- Lars G Fritsche
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Cagliari, Italy
- Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
- David Schlessinger
- Laboratory of Genetics, National Institute on Aging, US National Institutes of Health, Baltimore, Maryland
- Gonçalo R Abecasis
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Chad M Brummett
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan
- Seunggeun Lee
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
41
Weir BS. The Summer Institute in Statistical Genetics. Genetics 2019; 212:955-957. [PMID: 31405996 PMCID: PMC6707471 DOI: 10.1534/genetics.119.302506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2019] [Accepted: 07/11/2019] [Indexed: 11/18/2022]
Abstract
The Elizabeth W. Jones Award for Excellence in Education recognizes an individual or group that has had significant, sustained impact on genetics education at any level, from K-12 through graduate school and beyond. Bruce Weir (University of Washington) is the 2019 recipient in recognition of his work training thousands of researchers in the rigorous use of statistical analysis methods for genetic and genomic data. His contributions fall into three categories: the acclaimed Summer Institute in Statistical Genetics, which has been held continuously for 23 years and has trained > 10,000 researchers worldwide; the popular graduate-level textbook Genetic Data Analysis; and the training of a growing number of forensic geneticists during the rise of DNA evidence in courts around the world.
Affiliation(s)
- Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, Washington 98195
42
Yu Y, Xia L, Lee S, Zhou X, Stringham HM, Boehnke M, Mukherjee B. Subset-Based Analysis Using Gene-Environment Interactions for Discovery of Genetic Associations across Multiple Studies or Phenotypes. Hum Hered 2019; 83:283-314. [PMID: 31132756 DOI: 10.1159/000496867] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/31/2018] [Accepted: 01/04/2019] [Indexed: 01/11/2023]
Abstract
OBJECTIVES Classical methods for combining summary data from genome-wide association studies use only marginal genetic effects, and power can be compromised in the presence of heterogeneity. We aim to enhance the discovery of novel associated loci in the presence of heterogeneity of genetic effects across subgroups defined by an environmental factor. METHODS We present a p-value-assisted subset testing for associations (pASTA) framework that generalizes the previously proposed association analysis based on subsets (ASSET) method by incorporating gene-environment (G-E) interactions into the testing procedure. We conduct simulation studies and provide two data examples. RESULTS Simulation studies show that our proposal is more powerful than methods based on marginal associations in the presence of G-E interactions and maintains comparable power even in their absence. Both data examples demonstrate that our method can increase power to detect overall genetic associations and identify novel studies/phenotypes that contribute to the association. CONCLUSIONS Our proposed method can be a useful screening tool to identify candidate single nucleotide polymorphisms that are potentially associated with the trait(s) of interest for further validation. In addition to enhancing power, it also allows researchers to determine the most probable subset of traits that exhibit genetic associations.
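pASTA generalizes ASSET, whose core step is a search over subsets of studies or phenotypes for the subset with the strongest combined signal. A minimal sketch of that subset search, assuming independent studies each summarized by a z-score and a sample size, with sample-size-weighted fixed-effect combination within each subset; it shows neither the G-E interaction terms that pASTA adds nor the multiplicity adjustment that ASSET applies to the maximum, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Sketch: ASSET-style search over non-empty subsets of studies/phenotypes.
# Each subset gets a sample-size-weighted fixed-effect Z; the subset with the
# largest |Z| is reported. Assumptions: independent studies, one z-score and
# sample size per study; the multiplicity-adjusted p-value and pASTA's G-E
# interaction terms are not implemented.

def subset_z(z, n, subset):
    num = sum(math.sqrt(n[k]) * z[k] for k in subset)
    return num / math.sqrt(sum(n[k] for k in subset))


def best_subset(z, n):
    best_set, best_z = None, 0.0
    for size in range(1, len(z) + 1):
        for subset in combinations(range(len(z)), size):
            zs = subset_z(z, n, subset)
            if best_set is None or abs(zs) > abs(best_z):
                best_set, best_z = subset, zs
    return best_set, best_z
```

The exhaustive search visits 2^K - 1 subsets, which is practical only for a modest number of studies or phenotypes; in return it reports which subset drives the association, matching the abstract's goal of identifying the contributing studies/phenotypes.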
Affiliation(s)
- Youfei Yu
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Lu Xia
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Heather M Stringham
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Michael Boehnke
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA