1
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024]
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
2
Schipper M, Ulirsch J, Posthuma D, Ripke S, Heilbron K. Simplifying causal gene identification in GWAS loci. medRxiv 2024:2024.07.26.24311057. [PMID: 39132490 PMCID: PMC11312651 DOI: 10.1101/2024.07.26.24311057] [Indexed: 08/13/2024]
Abstract
Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful, but often use complex black box models trained on datasets containing unaddressed biases. Here we present CALDERA, a gene prioritization tool that achieves similar or better performance than state-of-the-art methods, but uses just 12 features and a simple logistic regression model with L1 regularization. We use a data-driven approach to construct a truth set of causal genes in 406 GWAS loci and correct for potential confounders. We demonstrate that CALDERA is well-calibrated in external datasets and prioritizes genes with expected properties, such as being mutation-intolerant (OR = 1.751 for pLI > 90%, P = 8.45 × 10⁻³). CALDERA facilitates the prioritization of potentially causal genes in GWAS loci and may help identify novel genetics-driven drug targets.
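As a rough illustration of the model class named in this abstract (not the authors' exact specification, which the abstract does not give), an L1-regularized logistic regression over 12 features can be written in the general form below. The symbols x_i, y_i, β_0, β, and λ are notation assumed here for illustration only.

```latex
(\hat{\beta}_0, \hat{\beta}) =
  \arg\min_{\beta_0 \in \mathbb{R},\; \beta \in \mathbb{R}^{12}}
  \; -\sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]
  + \lambda \lVert \beta \rVert_1,
\qquad
p_i = \frac{1}{1 + \exp\!\big( -(\beta_0 + x_i^{\top} \beta) \big)}
```

In this sketch, y_i in {0, 1} would mark whether candidate gene i is labeled causal, x_i would hold its 12 feature values, and a larger penalty weight λ drives more coefficients in β to exactly zero.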
Affiliation(s)
- Marijn Schipper
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jacob Ulirsch
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Danielle Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychiatry and Pediatric Psychology, Section Complex Trait Genetics, Amsterdam Neuroscience, Vrije Universiteit Medical Center, Amsterdam, The Netherlands
- Stephan Ripke
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Psychiatry and Psychotherapy, Charité – Universitätsmedizin Berlin, Berlin, Germany
- German Center for Mental Health (DZPG), partner site Berlin/Potsdam, Berlin, Germany
- Karl Heilbron
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Psychiatry and Psychotherapy, Charité – Universitätsmedizin Berlin, Berlin, Germany
- German Center for Mental Health (DZPG), partner site Berlin/Potsdam, Berlin, Germany
3
Kojima S. Investigating mobile element variations by statistical genetics. Hum Genome Var 2024; 11:23. [PMID: 38816353 PMCID: PMC11140006 DOI: 10.1038/s41439-024-00280-1] [Received: 01/31/2024] [Revised: 04/17/2024] [Accepted: 04/24/2024] [Indexed: 06/01/2024]
Abstract
The integration of structural variations (SVs) in statistical genetics provides an opportunity to understand the genetic factors influencing complex human traits and disease. Recent advances in long-read technology and variant calling methods for short reads have improved the accurate discovery and genotyping of SVs, enabling their use in expression quantitative trait loci (eQTL) analysis and genome-wide association studies (GWAS). Mobile elements are DNA sequences that insert themselves into various genome locations. Insertional polymorphisms of mobile elements between humans, called mobile element variations (MEVs), contribute to approximately 25% of human SVs. We recently developed a variant caller that can accurately identify and genotype MEVs from biobank-scale short-read whole-genome sequencing (WGS) datasets and integrate them into statistical genetics. The use of MEVs in eQTL analysis and GWAS has a minimal impact on the discovery of genome loci associated with gene expression and disease; most disease-associated haplotypes can be identified by single nucleotide variations (SNVs). On the other hand, it helps make hypotheses about causal variants or effector variants. Focusing on MEVs, we identified multiple MEVs that contribute to differential gene expression and one of them is a potential cause of skin disease, emphasizing the importance of the integration of MEVs in medical genetics. Here, I will provide an overview of MEVs, MEV calling from WGS, and the integration of MEVs in statistical genetics. Finally, I will discuss the unanswered questions about MEVs, such as rare variants.
Affiliation(s)
- Shohei Kojima
- Genome Immunobiology RIKEN Hakubi Research Team, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan.
4
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. bioRxiv 2024:2024.05.06.592745. [PMID: 38766136 PMCID: PMC11100663 DOI: 10.1101/2024.05.06.592745] [Indexed: 05/22/2024]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, Morgante et al. introduced mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data has smaller sample size.
Affiliation(s)
- Deborah Kunkel
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, United States of America
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America
5
Zheng Z, Liu S, Sidorenko J, Wang Y, Lin T, Yengo L, Turley P, Ani A, Wang R, Nolte IM, Snieder H, Yang J, Wray NR, Goddard ME, Visscher PM, Zeng J. Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat Genet 2024; 56:767-777. [PMID: 38689000 PMCID: PMC11096109 DOI: 10.1038/s41588-024-01704-y] [Received: 10/01/2022] [Accepted: 03/05/2024] [Indexed: 05/02/2024]
Abstract
We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.
Affiliation(s)
- Zhili Zheng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Shouye Liu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Julia Sidorenko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Ying Wang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Tian Lin
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Loic Yengo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Patrick Turley
- Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Department of Economics, University of Southern California, Los Angeles, CA, USA
- Alireza Ani
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Department of Bioinformatics, Isfahan University of Medical Sciences, Isfahan, Iran
- Rujia Wang
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Ilja M Nolte
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Harold Snieder
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Naomi R Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Department of Psychiatry, University of Oxford, Oxford, UK
- Michael E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, Victoria, Australia
- Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Victoria, Australia
- Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.
6
Conery M, Pippin JA, Wagley Y, Trang K, Pahl MC, Villani DA, Favazzo LJ, Ackert-Bicknell CL, Zuscik MJ, Katsevich E, Wells AD, Zemel BS, Voight BF, Hankenson KD, Chesi A, Grant SF. GWAS-informed data integration and non-coding CRISPRi screen illuminate genetic etiology of bone mineral density. bioRxiv 2024:2024.03.19.585778. [PMID: 38562830 PMCID: PMC10983984 DOI: 10.1101/2024.03.19.585778] [Indexed: 04/04/2024]
Abstract
Over 1,100 independent signals have been identified with genome-wide association studies (GWAS) for bone mineral density (BMD), a key risk factor for mortality-increasing fragility fractures; however, the effector gene(s) for most remain unknown. Informed by a variant-to-gene mapping strategy implicating 89 non-coding elements predicted to regulate osteoblast gene expression at BMD GWAS loci, we executed a single-cell CRISPRi screen in human fetal osteoblast 1.19 cells (hFOBs). The BMD relevance of hFOBs was supported by heritability enrichment from cross-cell type stratified LD-score regression involving 98 cell types grouped into 15 tissues. 24 genes showed perturbation in the screen, with four (ARID5B, CC2D1B, EIF4G2, and NCOA3) exhibiting consistent effects upon siRNA knockdown on three measures of osteoblast maturation and mineralization. Lastly, additional heritability enrichments, genetic correlations, and multi-trait fine-mapping revealed that many BMD GWAS signals are pleiotropic and likely mediate their effects via non-bone tissues that warrant attention in future screens.
Affiliation(s)
- Mitchell Conery
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- James A. Pippin
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yadav Wagley
- Department of Orthopaedic Surgery, University of Michigan Medical School, Ann Arbor, MI 48109
- Khanh Trang
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Matthew C. Pahl
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- David A. Villani
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Cell Biology, Stem Cells and Development Ph.D. Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Lacey J. Favazzo
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Department of Orthopedics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- University of Colorado Interdisciplinary Joint Biology Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Cheryl L. Ackert-Bicknell
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Department of Orthopedics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- University of Colorado Interdisciplinary Joint Biology Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Michael J. Zuscik
- Colorado Program for Musculoskeletal Research, University of Colorado Anschutz Medical Campus, Aurora, CO
- Department of Orthopedics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- University of Colorado Interdisciplinary Joint Biology Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Eugene Katsevich
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew D. Wells
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Babette S. Zemel
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Gastroenterology, Hepatology and Nutrition, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Benjamin F. Voight
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute of Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kurt D. Hankenson
- Department of Orthopaedic Surgery, University of Michigan Medical School, Ann Arbor, MI 48109
- Alessandra Chesi
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Struan F.A. Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute of Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
7
Meng X, Navoly G, Giannakopoulou O, Levey DF, Koller D, Pathak GA, Koen N, Lin K, Adams MJ, Rentería ME, Feng Y, Gaziano JM, Stein DJ, Zar HJ, Campbell ML, van Heel DA, Trivedi B, Finer S, McQuillin A, Bass N, Chundru VK, Martin HC, Huang QQ, Valkovskaya M, Chu CY, Kanjira S, Kuo PH, Chen HC, Tsai SJ, Liu YL, Kendler KS, Peterson RE, Cai N, Fang Y, Sen S, Scott LJ, Burmeister M, Loos RJF, Preuss MH, Actkins KV, Davis LK, Uddin M, Wani AH, Wildman DE, Aiello AE, Ursano RJ, Kessler RC, Kanai M, Okada Y, Sakaue S, Rabinowitz JA, Maher BS, Uhl G, Eaton W, Cruz-Fuentes CS, Martinez-Levy GA, Campos AI, Millwood IY, Chen Z, Li L, Wassertheil-Smoller S, Jiang Y, Tian C, Martin NG, Mitchell BL, Byrne EM, Awasthi S, Coleman JRI, Ripke S, Sofer T, Walters RG, McIntosh AM, Polimanti R, Dunn EC, Stein MB, Gelernter J, Lewis CM, Kuchenbaecker K. Multi-ancestry genome-wide association study of major depression aids locus discovery, fine mapping, gene prioritization and causal inference. Nat Genet 2024; 56:222-233. [PMID: 38177345 PMCID: PMC10864182 DOI: 10.1038/s41588-023-01596-4] [Received: 07/21/2022] [Accepted: 10/26/2023] [Indexed: 01/06/2024]
Abstract
Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 cohorts with 88,316 MD cases and 902,757 controls to previously reported data. This analysis used a range of measures to define MD and included samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latin American participants (32%). The multi-ancestry GWAS identified 53 significantly associated novel loci. For loci from GWAS in European ancestry samples, fewer than expected were transferable to other ancestry groups. Fine mapping benefited from additional sample diversity. A transcriptome-wide association study identified 205 significantly associated novel genes. These findings suggest that, for MD, increasing ancestral and global diversity in genetic studies may be particularly important to ensure discovery of core genes and inform about transferability of findings.
Affiliation(s)
- Daniel F Levey
- Department of Psychiatry, VA CT Healthcare Center, West Haven, CT, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Dora Koller
- Department of Psychiatry, VA CT Healthcare Center, West Haven, CT, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Department of Genetics, Microbiology and Statistics, University of Barcelona, Barcelona, Spain
- Gita A Pathak
- Department of Psychiatry, VA CT Healthcare Center, West Haven, CT, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Nastassja Koen
- SAMRC Unit on Risk and Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Kuang Lin
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Mark J Adams
- Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Miguel E Rentería
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- J Michael Gaziano
- Department of Medicine, VA Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Dan J Stein
- SAMRC Unit on Risk and Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Heather J Zar
- SAMRC Unit on Child and Adolescent Health, Department of Paediatrics and Child Health, University of Cape Town, Cape Town, South Africa
- Megan L Campbell
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Bhavi Trivedi
- Blizard Institute, Queen Mary University of London, London, UK
- Sarah Finer
- Wolfson Institute of Population Health, Queen Mary University of London, London, UK
- Nick Bass
- Division of Psychiatry, UCL, London, UK
- Susan Kanjira
- Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Po-Hsiu Kuo
- Department of Public Health and Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan
- Hsi-Chung Chen
- Department of Psychiatry, National Taiwan University Hospital, Taipei, Taiwan
- Center of Sleep Disorders, National Taiwan University Hospital, Taipei, Taiwan
- Shih-Jen Tsai
- Institute of Brain Science and Division of Psychiatry, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan
- Yu-Li Liu
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli County, Taiwan
- Roseann E Peterson
- Department of Psychiatry, VCU, Richmond, VA, USA
- Department of Psychiatry, SUNY Downstate Health Sciences University, Brooklyn, NY, USA
- Na Cai
- Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany
- Computational Health Centre, Helmholtz Munich, Neuherberg, Germany
- Department of Medicine, Technical University of Munich, Munich, Germany
- Yu Fang
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Srijan Sen
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Laura J Scott
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Margit Burmeister
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Ruth J F Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Michael H Preuss
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ky'Era V Actkins
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Lea K Davis
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Monica Uddin
- College of Public Health, University of South Florida, Tampa, FL, USA
- Agaz H Wani
- College of Public Health, University of South Florida, Tampa, FL, USA
- Derek E Wildman
- Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Allison E Aiello
- Robert N. Butler Columbia Aging Center, Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, NY, USA
- Robert J Ursano
- Department of Psychiatry, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Ronald C Kessler
- Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Masahiro Kanai
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
- Department of Genome Informatics, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Saori Sakaue
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Osaka, Japan
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Jill A Rabinowitz
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Brion S Maher
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- George Uhl
- Neurology and Pharmacology, University of Maryland, Maryland VA Healthcare System, Baltimore, MD, USA
- William Eaton
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Carlos S Cruz-Fuentes
- Departamento de Genética, Instituto Nacional de Psiquiatría 'Ramón de la Fuente Muñíz', Mexico City, Mexico
- Gabriela A Martinez-Levy
- Departamento de Genética, Instituto Nacional de Psiquiatría 'Ramón de la Fuente Muñíz', Mexico City, Mexico
- Adrian I Campos
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
- Iona Y Millwood
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, University of Oxford, Oxford, UK
| | - Zhengming Chen
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, University of Oxford, Oxford, UK
| | - Liming Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Peking University Center for Public Health and Epidemic Preparedness and Response, Peking University, Beijing, China
- Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing, China
| | - Yunxuan Jiang
- Department of Biostatistics, Emory University, Atlanta, GA, USA
- 23andMe, Inc., Mountain View, CA, USA
| | - Chao Tian
- 23andMe, Inc., Mountain View, CA, USA
| | - Nicholas G Martin
- Mental Health and Neuroscience Research Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Brittany L Mitchell
- Mental Health and Neuroscience Research Program, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Enda M Byrne
- Child Health Research Centre, The University of Queensland, Brisbane, Queensland, Australia
| | - Swapnil Awasthi
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin, Berlin, Germany
| | - Jonathan R I Coleman
- Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Stephan Ripke
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin, Berlin, Germany
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Cambridge, MA, USA
| | - Tamar Sofer
- Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Robin G Walters
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, University of Oxford, Oxford, UK
| | - Andrew M McIntosh
- Division of Psychiatry, University of Edinburgh, Edinburgh, UK
- Institute for Genomics and Cancer, University of Edinburgh, Edinburgh, UK
| | - Renato Polimanti
- Department of Psychiatry, VA CT Healthcare Center, West Haven, CT, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- VA Connecticut Healthcare Center, West Haven, CT, USA
| | - Erin C Dunn
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit (PNGU), Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
| | - Murray B Stein
- Department of Psychiatry, UC San Diego School of Medicine, La Jolla, CA, USA
- Herbert Wertheim School of Public Health and Human Longevity, University of California San Diego, La Jolla, CA, USA
- Psychiatry Service, VA San Diego Healthcare System, San Diego, CA, USA
| | - Joel Gelernter
- Department of Psychiatry, VA CT Healthcare Center, West Haven, CT, USA
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
| | - Cathryn M Lewis
- Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Department of Medical and Molecular Genetics, King's College London, London, UK
|
8
|
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. Biometrics 2024; 80:ujad039. [PMID: 38470257 PMCID: PMC10928990 DOI: 10.1093/biomtc/ujad039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 11/24/2023] [Accepted: 01/04/2024] [Indexed: 03/13/2024]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available.
Affiliation(s)
| | - Chunlin Li
- Department of Statistics, Iowa State University, Ames, IA 50011, United States
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, United States
| | - Wei Pan
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
| | - Tianzhong Yang
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
|
9
|
Sarsani V, Brotman SM, Xianyong Y, Fernandes Silva L, Laakso M, Spracklen CN. A cross-ancestry genome-wide meta-analysis, fine-mapping, and gene prioritization approach to characterize the genetic architecture of adiponectin. HGG Adv 2024; 5:100252. [PMID: 37859345 PMCID: PMC10652123 DOI: 10.1016/j.xhgg.2023.100252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/16/2023] [Accepted: 10/16/2023] [Indexed: 10/21/2023] Open
Abstract
Previous genome-wide association studies (GWASs) for adiponectin, a complex trait linked to type 2 diabetes and obesity, identified >20 associated loci. However, most loci were identified in populations of European ancestry, and many of the target genes underlying the associations remain unknown. We conducted a cross-ancestry adiponectin GWAS meta-analysis in ≤46,434 individuals from the Metabolic Syndrome in Men (METSIM) cohort and the ADIPOGen and AGEN consortiums. We combined study-specific association summary statistics using a fixed-effects, inverse variance-weighted approach. We identified 22 loci associated with adiponectin (p < 5 × 10⁻⁸), including 15 known and seven previously unreported loci. Among individuals of European ancestry, Genome-wide Complex Traits Analysis joint conditional analysis (GCTA-COJO) identified 14 additional distinct signals at the ADIPOQ, CDH13, HCAR1, and ZNF664 loci. Leveraging the cross-ancestry data, FINEMAP + SuSiE identified 45 causal variants (PP > 0.9), which also exhibited potential pleiotropy for cardiometabolic traits. To prioritize target genes at associated loci, we propose a combinatorial likelihood scoring formalism (Gene Priority Score [GPScore]) based on measures derived from 11 gene prioritization strategies and the physical distance to the transcription start site. With GPScore, we prioritize the 30 most probable target genes underlying the adiponectin-associated variants in the cross-ancestry analysis, including well-known causal genes (e.g., ADIPOQ, CDH13) and additional genes (e.g., CSF1, RGS17). Functional association networks revealed complex interactions of prioritized genes, their functionally connected genes, and their underlying pathways centered around insulin and adiponectin signaling, indicating an essential role in regulating energy balance in the body, inflammation, coagulation, fibrinolysis, insulin resistance, and diabetes.
Overall, our analyses identify and characterize adiponectin association signals and inform experimental interrogation of target genes for adiponectin.
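The fixed-effects, inverse variance-weighted combination named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `ivw_fixed_effects` and the example numbers are hypothetical. It assumes each study reports an effect estimate and its standard error for the same variant and effect allele.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis for one variant.

    betas: per-study effect estimates (same effect allele in every study)
    ses:   per-study standard errors, all strictly positive
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical example: two studies with equal precision.
beta, se, z, p = ivw_fixed_effects([0.10, 0.20], [0.05, 0.05])
```

With equal standard errors the pooled estimate is the plain average of the two effects; a more precise study would pull the estimate toward its own value.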
Affiliation(s)
- Vishal Sarsani
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA, USA
| | - Sarah M Brotman
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Yin Xianyong
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Lillian Fernandes Silva
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Markku Laakso
- Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
| | - Cassandra N Spracklen
- Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, Amherst, MA, USA.
|
10
|
Cao R, Olawsky E, McFowland E, Marcotte E, Spector L, Yang T. Subset scanning for multi-trait analysis using GWAS summary statistics. Bioinformatics 2024; 40:btad777. [PMID: 38191683 PMCID: PMC11087659 DOI: 10.1093/bioinformatics/btad777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/23/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024] Open
Abstract
Motivation: Multi-trait analysis has been shown to have greater statistical power than single-trait analysis. Most of the existing multi-trait analysis methods only work with a limited number of traits and usually prioritize high statistical power over identifying relevant traits, which heavily rely on domain knowledge. Results: To handle diseases and traits with obscure etiology, we developed TraitScan, a powerful and fast algorithm that identifies potential pleiotropic traits from a moderate or large number of traits (e.g. dozens to thousands) and tests the association between one genetic variant and the selected traits. TraitScan can handle either individual-level or summary-level GWAS data. We evaluated TraitScan using extensive simulations and found that it outperformed existing methods in terms of both testing power and trait selection when sparsity was low or modest. We then applied it to search for traits associated with Ewing Sarcoma, a rare bone tumor with peak onset in adolescence, among 754 traits in UK Biobank. Our analysis revealed a few promising traits worthy of further investigation, highlighting the use of TraitScan for more effective multi-trait analysis as biobanks emerge. We also extended TraitScan to search and test association with a polygenic risk score and genetically imputed gene expression. Availability and implementation: Our algorithm is implemented in an R package "TraitScan" available at https://github.com/RuiCao34/TraitScan.
Affiliation(s)
- Rui Cao
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
| | - Evan Olawsky
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
| | - Edward McFowland
- Technology and Operations Management, Harvard Business School, Harvard University, Boston, MA 02163, United States
| | - Erin Marcotte
- Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
| | - Logan Spector
- Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
| | - Tianzhong Yang
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
|
11
|
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. bioRxiv 2023:2023.02.10.528092. [PMID: 38045347 PMCID: PMC10690198 DOI: 10.1101/2023.02.10.528092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show how our method only requires access to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available at https://github.com/chunlinli/sumdag.
Affiliation(s)
| | - Chunlin Li
- Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
| | - Wei Pan
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
| | - Tianzhong Yang
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
|
12
|
Cai M, Wang Z, Xiao J, Hu X, Chen G, Yang C. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nat Commun 2023; 14:6870. [PMID: 37898663 PMCID: PMC10613261 DOI: 10.1038/s41467-023-42614-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/04/2023] [Accepted: 10/17/2023] [Indexed: 10/30/2023] Open
Abstract
Fine-mapping prioritizes risk variants identified by genome-wide association studies (GWASs), serving as a critical step to uncover biological mechanisms underlying complex traits. However, several major challenges still remain for existing fine-mapping methods. First, the strong linkage disequilibrium among variants can limit the statistical power and resolution of fine-mapping. Second, it is computationally expensive to simultaneously search for multiple causal variants. Third, the confounding bias hidden in GWAS summary statistics can produce spurious signals. To address these challenges, we develop a statistical method for cross-population fine-mapping (XMAP) by leveraging genetic diversity and accounting for confounding bias. By using cross-population GWAS summary statistics from global biobanks and genomic consortia, we show that XMAP can achieve greater statistical power, better control of false positive rate, and substantially higher computational efficiency for identifying multiple causal signals, compared to existing methods. Importantly, we show that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.
Affiliation(s)
- Mingxuan Cai
- Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China.
| | - Zhiwei Wang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Jiashun Xiao
- Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
| | - Xianghong Hu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- WeGene, Shenzhen Zaozhidao Technology Co., Ltd, Shenzhen, 518040, China
- Graduate Affairs, Faculty of Medicine, Chulalongkorn University, 10330, Bangkok, Thailand
| | - Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China.
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China.
|
13
|
Chepelev I, Harley IT, Harley JB. Modeling of horizontal pleiotropy identifies possible causal gene expression in systemic lupus erythematosus. Front Lupus 2023; 1:1234578. [PMID: 37799268 PMCID: PMC10554754 DOI: 10.3389/flupu.2023.1234578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2023]
Abstract
Background: Systemic lupus erythematosus (SLE) is a chronic autoimmune condition with complex causes involving genetic and environmental factors. While genome-wide association studies (GWASs) have identified genetic loci associated with SLE, the functional genomic elements responsible for disease development remain largely unknown. Mendelian Randomization (MR) is an instrumental variable approach to causal inference based on data from observational studies, where genetic variants are employed as instrumental variables (IVs). Methods: This study utilized a two-step strategy to identify causal genes for SLE. In the first step, the classical MR method was employed, assuming the absence of horizontal pleiotropy, to estimate the causal effect of gene expression on SLE. In the second step, advanced probabilistic MR methods (PMR-Egger, MRAID, and MR-MtRobin) were applied to the genes identified in the first step, considering horizontal pleiotropy, to filter out false positives. PMR-Egger and MRAID analyses utilized whole blood expression quantitative trait loci (eQTL) and SLE GWAS summary data, while MR-MtRobin analysis used an independent eQTL dataset from multiple immune cell types along with the same SLE GWAS data. Results: The initial MR analysis identified 142 genes, including 43 outside of chromosome 6. Subsequently, applying the advanced MR methods reduced the number of genes with significant causal effects on SLE to 66. PMR-Egger, MRAID, and MR-MtRobin, respectively, identified 13, 7, and 16 non-chromosome 6 genes with significant causal effects. All methods identified expression of PHRF1 gene as causal for SLE. A comprehensive literature review was conducted to enhance understanding of the functional roles and mechanisms of the identified genes in SLE development. Conclusions: The findings from the three MR methods exhibited overlapping genes with causal effects on SLE, demonstrating consistent results.
However, each method also uncovered unique genes due to different modelling assumptions and technical factors, highlighting the complementary nature of the approaches. Importantly, MRAID demonstrated a reduced percentage of causal genes from the Major Histocompatibility complex (MHC) region on chromosome 6, indicating its potential in minimizing false positive findings. This study contributes to unraveling the mechanisms underlying SLE by employing advanced probabilistic MR methods to identify causal genes, thereby enhancing our understanding of SLE pathogenesis.
Affiliation(s)
- Iouri Chepelev
- Research Service, US Department of Veterans Affairs Medical Center, Cincinnati, OH, United States
- Cincinnati Education and Research for Veterans Foundation, Cincinnati, OH, United States
| | - Isaac T.W. Harley
- US Department of Veterans Affairs Medical Center, Aurora, CO, United States
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, United States
- Division of Rheumatology, Department of Medicine, University of Colorado School of Medicine, Aurora, CO, United States
| | - John B. Harley
- Research Service, US Department of Veterans Affairs Medical Center, Cincinnati, OH, United States
- Cincinnati Education and Research for Veterans Foundation, Cincinnati, OH, United States
|
14
|
Anwar MY, Graff M, Highland HM, Smit R, Wang Z, Buchanan VL, Young KL, Kenny EE, Fernandez-Rhodes L, Liu S, Assimes T, Garcia DO, Daeeun K, Gignoux CR, Justice AE, Haiman CA, Buyske S, Peters U, Loos RJF, Kooperberg C, North KE. Assessing efficiency of fine-mapping obesity-associated variants through leveraging ancestry architecture and functional annotation using PAGE and UKBB cohorts. Hum Genet 2023; 142:1477-1489. [PMID: 37658231 DOI: 10.1007/s00439-023-02593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 08/10/2023] [Indexed: 09/03/2023]
Abstract
Inadequate representation of non-European ancestry populations in genome-wide association studies (GWAS) has limited opportunities to isolate functional variants. Fine-mapping in multi-ancestry populations should improve the efficiency of prioritizing variants for functional interrogation. To evaluate this hypothesis, we leveraged ancestry architecture to perform comparative GWAS and fine-mapping of obesity-related phenotypes in European ancestry populations from the UK Biobank (UKBB) and multi-ancestry samples from the Population Architecture for Genetic Epidemiology (PAGE) consortium with comparable sample sizes. In the investigated regions with genome-wide significant associations for obesity-related traits, fine-mapping in our ancestrally diverse sample led to 95% and 99% credible sets (CS) with fewer variants than in the European ancestry sample. Lead fine-mapped variants in PAGE regions had higher average coding scores, and higher average posterior probabilities for causality compared to UKBB. Importantly, 99% CS in PAGE loci contained strong expression quantitative trait loci (eQTLs) in adipose tissues or harbored more variants in tighter linkage disequilibrium (LD) with eQTLs. Leveraging ancestrally diverse populations with heterogeneous ancestry architectures, coupled with functional annotation, increased fine-mapping efficiency and performance, and reduced the set of candidate variants for consideration for future functional studies. Significant overlap in genetic causal variants across populations suggests generalizability of genetic mechanisms underpinning obesity-related traits across populations.
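The 95% and 99% credible sets mentioned in this abstract can be illustrated with a minimal sketch. It is not the study's own fine-mapping code: it assumes a single causal variant per region and takes per-variant posterior probabilities as given. The function name `credible_set` and the variant IDs are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities reach `coverage`.

    posteriors: dict mapping variant ID -> posterior probability of causality
                within one region (rescaled here so that they sum to 1)
    coverage:   target cumulative probability, e.g. 0.95 or 0.99
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    # Rank variants from most to least probable, then accumulate.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical region with four variants.
region = {"rsA": 0.70, "rsB": 0.20, "rsC": 0.06, "rsD": 0.04}
cs95 = credible_set(region, coverage=0.95)
cs99 = credible_set(region, coverage=0.99)
```

A region where one variant carries most of the posterior mass yields a small set. Smaller credible sets are what the abstract reports for the ancestrally diverse sample.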
Affiliation(s)
- Mohammad Yaser Anwar
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA.
| | - Mariaelisa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Heather M Highland
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Roelof Smit
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Victoria L Buchanan
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Kristin L Young
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Eimear E Kenny
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lindsay Fernandez-Rhodes
- Department of Biobehavioral Health, College of Health and Human Development, Pennsylvania State University, University Park, PA, 16802, USA
| | - Simin Liu
- Department of Epidemiology and Center for Global Cardiometabolic Health, School of Public Health, Brown University, Providence, RI, 02903, USA
| | - Themistocles Assimes
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - David O Garcia
- Department of Health Promotion Sciences, Mel & Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ, 85724, USA
| | - Kim Daeeun
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
| | - Anne E Justice
- Department of Population Health Sciences, Geisinger Health, Danville, PA, 17822, USA
| | - Christopher A Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Steve Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, 08854, USA
| | - Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
| | - Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
|
15
|
Salehi Nowbandegani P, Wohns AW, Ballard JL, Lander ES, Bloemendal A, Neale BM, O'Connor LJ. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet 2023; 55:1494-1502. [PMID: 37640881 DOI: 10.1038/s41588-023-01487-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/29/2022] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
Linkage disequilibrium (LD) is the correlation among nearby genetic variants. In genetic association studies, LD is often modeled using large correlation matrices, but this approach is inefficient, especially in ancestrally diverse studies. In the present study, we introduce LD graphical models (LDGMs), which are an extremely sparse and efficient representation of LD. LDGMs are derived from genome-wide genealogies; statistical relationships among alleles in the LDGM correspond to genealogical relationships among haplotypes. We published LDGMs and ancestry-specific LDGM precision matrices for 18 million common variants (minor allele frequency >1%) in five ancestry groups, validated their accuracy and demonstrated order-of-magnitude improvements in runtime for commonly used LD matrix computations. We implemented an extremely fast multiancestry polygenic prediction method, BLUPx-ldgm, which performs better than a similar method based on the reference LD correlation matrix. LDGMs will enable sophisticated methods that scale to ancestrally diverse genetic association data across millions of variants and individuals.
Affiliation(s)
- Pouria Salehi Nowbandegani
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Anthony Wilder Wohns
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Stanford University School of Medicine, Stanford, CA, USA.
| | - Jenna L Ballard
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Eric S Lander
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Alex Bloemendal
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Luke J O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
16
|
Wu Y, Qi T, Wray NR, Visscher PM, Zeng J, Yang J. Joint analysis of GWAS and multi-omics QTL summary statistics reveals a large fraction of GWAS signals shared with molecular phenotypes. Cell Genom 2023; 3:100344. [PMID: 37601976 PMCID: PMC10435383 DOI: 10.1016/j.xgen.2023.100344] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/03/2022] [Revised: 04/04/2023] [Accepted: 05/23/2023] [Indexed: 08/22/2023]
Abstract
Molecular quantitative trait loci (xQTLs) are often harnessed to prioritize genes or functional elements underpinning variant-trait associations identified from genome-wide association studies (GWASs). Here, we introduce OPERA, a method that jointly analyzes GWAS and multi-omics xQTL summary statistics to enhance the identification of molecular phenotypes associated with complex traits through shared causal variants. Applying OPERA to summary-level GWAS data for 50 complex traits (n = 20,833-766,345) and xQTL data from seven omics layers (n = 100-31,684) reveals that 50% of the GWAS signals are shared with at least one molecular phenotype. GWAS signals shared with multiple molecular phenotypes, such as those at the MSMB locus for prostate cancer, are particularly informative for understanding the genetic regulatory mechanisms underlying complex traits. Future studies with more molecular phenotypes, measured considering spatiotemporal effects in larger samples, are required to obtain a more saturated map linking molecular intermediates to GWAS signals.
Affiliation(s)
- Yang Wu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Ting Qi
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
| | - Naomi R. Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Peter M. Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Jian Zeng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Jian Yang
- School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
| |
Collapse
|
17
|
Jiang X, Boutin T, Vitart V. Colocalization of corneal resistance factor GWAS loci with GTEx e/sQTLs highlights plausible candidate causal genes for keratoconus postnatal corneal stroma weakening. Front Genet 2023; 14:1171217. [PMID: 37621707 PMCID: PMC10445647 DOI: 10.3389/fgene.2023.1171217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 07/17/2023] [Indexed: 08/26/2023] Open
Abstract
Background: Genome-wide association studies (GWAS) for corneal resistance factor (CRF) have identified hundreds of loci and proved useful to uncover genetic determinants for keratoconus, a corneal ectasia of early-adulthood onset and a common indication for corneal transplantation. In the current absence of studies to probe the impact of candidate causal variants in the cornea, we aimed to fill some of this knowledge gap by leveraging tissue-shared genetic effects. Methods: 181 CRF signals were examined for evidence of colocalization with genetic signals affecting steady-state gene transcription and splicing in adult non-eye tissues of the Genotype-Tissue Expression (GTEx) project. Expression of candidate causal genes thus nominated was evaluated in single cell transcriptomes from adult cornea, limbus and conjunctiva. Fine-mapping and colocalization of CRF and keratoconus GWAS signals were also deployed to support their sharing of causal variants. Results and discussion: 26.5% of CRF causal signals colocalized with GTEx v8 signals and nominated genes enriched in genes with high and specific expression in corneal stromal cells amongst tissues examined. Enrichment analyses carried out with nearest genes to all 181 CRF GWAS signals indicated that stromal cells of the limbus could be susceptible to signals that did not colocalize with GTEx's. These cells might not be well represented in GTEx and/or the genetic associations might have context specific effects. The causal signals shared with GTEx provide new insights into mediation of CRF genetic effects, including modulation of splicing events. Functionally relevant roles for several implicated genes' products in providing tensile strength, mechano-sensing and signaling make the corresponding genes and regulatory variants prime candidates to be validated and their roles and effects across tissues elucidated. Colocalization of CRF and keratoconus GWAS signals strengthened support for shared causal variants but also highlighted many ways in which likely true shared signals could be missed when using readily available GWAS summary statistics.
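The abstract above does not say which colocalization method was used. As a hedged illustration of the general idea, the sketch below implements one widely used formulation: per-variant Wakefield approximate Bayes factors combined into posterior probabilities for five hypotheses (H0 no signal, H1 trait 1 only, H2 trait 2 only, H3 two distinct causal variants, H4 one shared causal variant). It assumes at most one causal variant per trait in the region. The prior variance `w` and the priors `p1`, `p2`, `p12` are illustrative assumptions, and all function names are hypothetical.

```python
import math

def log_abf(z, v, w=0.15 ** 2):
    """Wakefield log approximate Bayes factor for one variant.
    z: association z-score; v: variance of the effect estimate (se**2);
    w: prior variance of the true effect (an assumed value)."""
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.
    labf1, labf2: per-variant log ABFs for the two traits, same variant order."""
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = logsumexp(labf1), logsumexp(labf2), logsumexp(both)
    log_h = [
        0.0,                                                     # H0
        math.log(p1) + s1,                                       # H1
        math.log(p2) + s2,                                       # H2
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3
        math.log(p12) + s12,                                     # H4
    ]
    norm = logsumexp(log_h)
    return [math.exp(x - norm) for x in log_h]

# Toy region of three variants; both traits peak at the second variant.
trait1 = [log_abf(z, 0.01) for z in (1.0, 6.0, 1.0)]
trait2 = [log_abf(z, 0.01) for z in (0.5, 7.0, 1.0)]
posteriors = coloc_posteriors(trait1, trait2)
```

With the peaks at the same variant, most posterior mass should fall on H4; moving one peak to a different variant shifts it toward H3.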
Affiliation(s)
- Xinyi Jiang
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
  - Centre for Genetics and Molecular Medicine, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Thibaud Boutin
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Veronique Vitart
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
18
Klarin D, Devineni P, Sendamarai AK, Angueira AR, Graham SE, Shen YH, Levin MG, Pirruccello JP, Surakka I, Karnam PR, Roychowdhury T, Li Y, Wang M, Aragam KG, Paruchuri K, Zuber V, Shakt GE, Tsao NL, Judy RL, Vy HMT, Verma SS, Rader DJ, Do R, Bavaria JE, Nadkarni GN, Ritchie MD, Burgess S, Guo DC, Ellinor PT, LeMaire SA, Milewicz DM, Willer CJ, Natarajan P, Tsao PS, Pyarajan S, Damrauer SM. Genome-wide association study of thoracic aortic aneurysm and dissection in the Million Veteran Program. Nat Genet 2023; 55:1106-1115. [PMID: 37308786 PMCID: PMC10335930 DOI: 10.1038/s41588-023-01420-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 04/04/2022] [Accepted: 05/05/2023] [Indexed: 06/14/2023]
Abstract
The current understanding of the genetic determinants of thoracic aortic aneurysms and dissections (TAAD) has largely been informed through studies of rare, Mendelian forms of disease. Here, we conducted a genome-wide association study (GWAS) of TAAD, testing ~25 million DNA sequence variants in 8,626 participants with and 453,043 participants without TAAD in the Million Veteran Program, with replication in an independent sample of 4,459 individuals with and 512,463 without TAAD from six cohorts. We identified 21 TAAD risk loci, 17 of which have not been previously reported. We leverage multiple downstream analytic methods to identify causal TAAD risk genes and cell types and provide human genetic evidence that TAAD is a non-atherosclerotic aortic disorder distinct from other forms of vascular disease. Our results demonstrate that the genetic architecture of TAAD mirrors that of other complex traits and that it is not solely inherited through protein-altering variants of large effect size.
Affiliation(s)
- Derek Klarin
  - Veterans Affairs (VA) Palo Alto Healthcare System, Palo Alto, CA, USA
  - Department of Surgery, Stanford University School of Medicine, Palo Alto, CA, USA
- Poornima Devineni
  - Center for Data and Computational Sciences, VA Boston Healthcare System, Boston, MA, USA
- Anoop K Sendamarai
  - Center for Data and Computational Sciences, VA Boston Healthcare System, Boston, MA, USA
  - Carbone Cancer Center, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Anthony R Angueira
  - Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Sarah E Graham
  - Department of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor, MI, USA
- Ying H Shen
  - Division of Cardiothoracic Surgery, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
  - Department of Cardiovascular Surgery, Texas Heart Institute, Houston, TX, USA
- Michael G Levin
  - Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Medicine, Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
- James P Pirruccello
  - Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Ida Surakka
  - Department of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor, MI, USA
- Purushotham R Karnam
  - Center for Data and Computational Sciences, VA Boston Healthcare System, Boston, MA, USA
- Tanmoy Roychowdhury
  - Department of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor, MI, USA
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Yanming Li
  - Division of Cardiothoracic Surgery, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
- Minxian Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Krishna G Aragam
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kaavya Paruchuri
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Verena Zuber
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
  - MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
  - UK Dementia Research Institute at Imperial College, Imperial College London, London, UK
- Gabrielle E Shakt
  - Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Noah L Tsao
  - Division of Cardiovascular Medicine, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Renae L Judy
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Ha My T Vy
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Shefali S Verma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Daniel J Rader
  - Division of Translational Medicine and Human Genetics, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
  - Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Ron Do
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Joseph E Bavaria
  - Division of Cardiovascular Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Marylyn D Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Stephen Burgess
  - Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Dong-Chuan Guo
  - Division of Medical Genetics, Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Patrick T Ellinor
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Scott A LeMaire
  - Division of Cardiothoracic Surgery, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
  - Department of Cardiovascular Surgery, Texas Heart Institute, Houston, TX, USA
  - Cardiovascular Research Institute, Baylor College of Medicine, Houston, TX, USA
- Dianna M Milewicz
  - Division of Medical Genetics, Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Cristen J Willer
  - Department of Internal Medicine, Division of Cardiology, University of Michigan, Ann Arbor, MI, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Pradeep Natarajan
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Philip S Tsao
  - Veterans Affairs (VA) Palo Alto Healthcare System, Palo Alto, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cardiovascular Institute, Stanford, CA, USA
- Saiju Pyarajan
  - Center for Data and Computational Sciences, VA Boston Healthcare System, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Scott M Damrauer
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
19
Yang Z, Wang C, Liu L, Khan A, Lee A, Vardarajan B, Mayeux R, Kiryluk K, Ionita-Laza I. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat Genet 2023; 55:1057-1065. [PMID: 37169873 DOI: 10.1038/s41588-023-01392-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 04/11/2022] [Accepted: 04/11/2023] [Indexed: 05/13/2023]
Abstract
Fine-mapping is commonly used to identify putative causal variants at genome-wide significant loci. Here we propose a Bayesian model for fine-mapping that has several advantages over existing methods, including flexible specification of the prior distribution of effect sizes, joint modeling of summary statistics and functional annotations and accounting for discrepancies between summary statistics and external linkage disequilibrium in meta-analyses. Using simulations, we compare performance with commonly used fine-mapping methods and show that the proposed model has higher power and lower false discovery rate (FDR) when including functional annotations, and higher power, lower FDR and higher coverage for credible sets in meta-analyses. We further illustrate our approach by applying it to a meta-analysis of Alzheimer's disease genome-wide association studies where we prioritize putatively causal variants and genes.
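To make the notion of a credible set concrete, here is a minimal sketch of the simplest fine-mapping setting, in which exactly one causal variant is assumed at the locus and every variant has the same prior. This is not the CARMA model, which according to the abstract also handles flexible effect-size priors, functional annotations and LD discrepancies. The function names and the 95% coverage level are illustrative assumptions.

```python
import math

def single_signal_pips(log_bfs):
    """Posterior inclusion probabilities under a single-causal-variant
    assumption with equal priors: each variant's Bayes factor divided by the sum.
    log_bfs: per-variant log Bayes factors."""
    m = max(log_bfs)
    weights = [math.exp(x - m) for x in log_bfs]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]

def credible_set(pips, coverage=0.95):
    """Smallest set of variant indices whose PIPs sum to at least `coverage`,
    built by adding variants in order of decreasing PIP."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen

example_pips = [0.6, 0.3, 0.06, 0.04]
example_set = credible_set(example_pips)
```

In the example, the first two variants cover only 0.9 of the posterior mass, so the third is added to reach the requested coverage.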
Affiliation(s)
- Zikun Yang
  - Department of Biostatistics, Columbia University, New York City, NY, USA
- Chen Wang
  - Department of Biostatistics, Columbia University, New York City, NY, USA
  - Division of Nephrology Department of Medicine College of Physicians and Surgeons, Columbia University, New York City, NY, USA
- Linxi Liu
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Atlas Khan
  - Division of Nephrology Department of Medicine College of Physicians and Surgeons, Columbia University, New York City, NY, USA
- Annie Lee
  - Department of Neurology College of Physicians and Surgeons, Columbia University, New York City, NY, USA
- Badri Vardarajan
  - Department of Neurology College of Physicians and Surgeons, Columbia University, New York City, NY, USA
- Richard Mayeux
  - Department of Neurology College of Physicians and Surgeons, Columbia University, New York City, NY, USA
- Krzysztof Kiryluk
  - Division of Nephrology Department of Medicine College of Physicians and Surgeons, Columbia University, New York City, NY, USA
20
Zabad S, Gravel S, Li Y. Fast and accurate Bayesian polygenic risk modeling with variational inference. Am J Hum Genet 2023; 110:741-761. [PMID: 37030289 PMCID: PMC10183379 DOI: 10.1016/j.ajhg.2023.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 03/13/2023] [Indexed: 04/10/2023] Open
Abstract
The advent of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction with single-nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods use a multiple linear regression framework to infer joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, most existing Bayesian approaches employ Markov chain Monte Carlo (MCMC) algorithms, which are computationally inefficient and do not scale favorably to higher dimensions, for posterior inference. Here, we introduce variational inference of polygenic risk scores (VIPRS), a Bayesian summary statistics-based PRS method that utilizes variational inference techniques to approximate the posterior distribution for the effect sizes. Our experiments with 36 simulation configurations and 12 real phenotypes from the UK Biobank dataset demonstrated that VIPRS is consistently competitive with the state-of-the-art in prediction accuracy while being more than twice as fast as popular MCMC-based approaches. This performance advantage is robust across a variety of genetic architectures, SNP heritabilities, and independent GWAS cohorts. In addition to its competitive accuracy on the "White British" samples, VIPRS showed improved transferability when applied to other ethnic groups, with up to 1.7-fold increase in R² among individuals of Nigerian ancestry for low-density lipoprotein (LDL) cholesterol. To illustrate its scalability, we applied VIPRS to a dataset of 9.6 million genetic markers, which conferred further improvements in prediction accuracy for highly polygenic traits, such as height.
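The abstract evaluates scores by prediction R². As a hedged sketch of that final step only (not of VIPRS or its variational inference), the code below applies an already inferred vector of per-variant effect sizes to allele dosages and measures accuracy as the squared Pearson correlation with the phenotype. The function names and toy data are illustrative assumptions.

```python
def polygenic_scores(dosages, weights):
    """One score per individual: the weighted sum of allele dosages.
    dosages: list of per-individual lists (0, 1 or 2 copies per variant);
    weights: inferred joint effect size for each variant."""
    return [sum(g * w for g, w in zip(row, weights)) for row in dosages]

def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    return s_po * s_po / (s_pp * s_oo)

# Toy data: three individuals, three variants.
toy_dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
toy_weights = [0.1, -0.2, 0.3]
toy_scores = polygenic_scores(toy_dosages, toy_weights)
```

A fold change in R² between two methods, such as the "up to 1.7-fold" figure quoted above, is then simply the ratio of two such values computed on the same held-out individuals.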
Affiliation(s)
- Shadi Zabad
  - School of Computer Science, McGill University, Montreal, QC, Canada
- Simon Gravel
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
- Yue Li
  - School of Computer Science, McGill University, Montreal, QC, Canada
21
Awasthi S, Chen CY, Lam M, Huang H, Ripke S, Altar CA. GWAS quality score for evaluating associated regions in GWAS analyses. Bioinformatics 2023; 39:6991168. [PMID: 36651666 PMCID: PMC9891241 DOI: 10.1093/bioinformatics/btad004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 11/14/2022] [Accepted: 01/17/2023] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION: The number of significantly associated regions reported in genome-wide association studies (GWAS) for polygenic traits typically increases with sample size. A traditional tool for quality control and identification of significant regions has been a visual inspection of how significant and correlated genetic variants cluster within a region. However, while inspecting hundreds of regions, this subjective method can misattribute significance to some loci or neglect others that are significant. RESULTS: The GWAS quality score (GQS) identifies suspicious regions and prevents erroneous interpretations with an objective, quantitative and automated method. The GQS assesses all measured single nucleotide polymorphisms (SNPs) that are linked by inheritance to each other [linkage disequilibrium (LD)] and compares the significance of trait association of each SNP to its LD value for the reported index SNP. A GQS value of 1.0 ascribes a high level of confidence to the entire region and its underlying gene(s), while GQS values <1.0 indicate the need to closely inspect the outliers. We applied the GQS to published and non-published genome-wide summary statistics and report suspicious regions requiring secondary inspection while supporting the majority of reported regions from large-scale published meta-analyses. AVAILABILITY AND IMPLEMENTATION: The GQS code/scripts can be cloned from GitHub (https://github.com/Xswapnil/GQS/). The analyst can use whole-genome summary statistics to estimate GQS for each defined region. We also provide an online tool (http://35.227.18.38/) that gives access to the GQS. The quantitative measure of quality attributes by GQS and its visualization is an objective method that enhances the confidence of each genomic hit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Swapnil Awasthi
  - Department of Psychiatry and Psychotherapy, Charité - Universitätsmedizin, Berlin 10117, Germany
- Max Lam
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Hailiang Huang
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
22
Li A, Liu S, Bakshi A, Jiang L, Chen W, Zheng Z, Sullivan PF, Visscher PM, Wray NR, Yang J, Zeng J. mBAT-combo: A more powerful test to detect gene-trait associations from GWAS data. Am J Hum Genet 2023; 110:30-43. [PMID: 36608683 PMCID: PMC9892780 DOI: 10.1016/j.ajhg.2022.12.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/09/2022] [Accepted: 12/08/2022] [Indexed: 01/07/2023] Open
Abstract
Gene-based association tests aggregate multiple SNP-trait associations into sets defined by gene boundaries and are widely used in post-GWAS analysis. A common approach for gene-based tests is to combine SNP associations by computing the sum of χ² statistics. However, this strategy ignores the directions of SNP effects, which could result in a loss of power for SNPs with masking effects, e.g., when the product of two SNP effects and the linkage disequilibrium (LD) correlation is negative. Here, we introduce "mBAT-combo," a set-based test that is better powered than other methods to detect multi-SNP associations in the context of masking effects. We validate the method through simulations and applications to real data. We find that of 35 blood and urine biomarker traits in the UK Biobank, 34 traits show evidence for masking effects in a total of 4,273 gene-trait pairs, indicating that masking effects are common in complex traits. We further validate the improved power of our method in height, body mass index, and schizophrenia with different GWAS sample sizes and show that on average 95.7% of the genes detected only by mBAT-combo with smaller sample sizes can be identified by the single-SNP approach with a 1.7-fold increase in sample sizes. Eleven genes significant only in mBAT-combo for schizophrenia are confirmed by functionally informed fine-mapping or Mendelian randomization integrating gene expression data. The framework of mBAT-combo can be applied to any set of SNPs to refine trait-association signals hidden in genomic regions with complex LD structures.
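The masking condition described above (two SNP effects whose product with their LD correlation is negative) can be illustrated numerically. The sketch below is not the mBAT-combo test; it assumes two standardized SNPs, small effects, and the usual approximation that marginal effects equal the LD matrix times the joint effects. It then contrasts the sum of χ² statistics with a generic LD-aware quadratic form, used here only as a stand-in for a direction-aware statistic. All names and numbers are illustrative assumptions.

```python
import math

def marginal_z(joint_effects, r, n):
    """Approximate expected marginal z-scores for two standardized SNPs.
    Each marginal effect is the SNP's own joint effect plus r times the other's."""
    b1, b2 = joint_effects
    return [math.sqrt(n) * (b1 + r * b2), math.sqrt(n) * (b2 + r * b1)]

def sum_chi2(z):
    """Sum of single-SNP chi-square statistics; ignores effect directions."""
    return sum(v * v for v in z)

def ld_aware_chi2(z, r):
    """Quadratic form z' R^-1 z for a 2x2 LD matrix with correlation r.
    Under the null this follows a chi-square distribution with 2 df."""
    z1, z2 = z
    return (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)

def chi2_2df_sf(x):
    """Survival function of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

# Both joint effects positive, LD correlation negative: b1 * b2 * r < 0.
z_masked = marginal_z((0.05, 0.05), r=-0.8, n=10000)
naive_stat = sum_chi2(z_masked)
aware_stat = ld_aware_chi2(z_masked, r=-0.8)
aware_p = chi2_2df_sf(aware_stat)
```

In this toy case each marginal z-score shrinks to about 1, so the sum of χ² is about 2, while the LD-aware statistic is about 10. The null distribution of the plain sum under LD is a weighted mixture of χ² variables and is not computed here.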
Affiliation(s)
- Ang Li
  - Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Shouye Liu
  - Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Andrew Bakshi
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Wenhan Chen
  - Epigenetics Research Laboratory, Genomics and Epigenetics Theme, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Zhili Zheng
  - Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Patrick F Sullivan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
  - Departments of Genetics and Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Peter M Visscher
  - Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Naomi R Wray
  - Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane, QLD, Australia
- Jian Yang
  - School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Jian Zeng
  - Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
23
Kanai M, Elzur R, Zhou W, Daly MJ, Finucane HK. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genomics 2022; 2:100210. [PMID: 36643910 PMCID: PMC9839193 DOI: 10.1016/j.xgen.2022.100210] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 03/16/2022] [Revised: 08/19/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022]
Abstract
Meta-analysis is pervasively used to combine multiple genome-wide association studies (GWASs). Fine-mapping of meta-analysis studies is typically performed as in a single-cohort study. Here, we first demonstrate that heterogeneity (e.g., of sample size, phenotyping, imputation) hurts calibration of meta-analysis fine-mapping. We propose a summary statistics-based quality-control (QC) method, suspicious loci analysis of meta-analysis summary statistics (SLALOM), that identifies suspicious loci for meta-analysis fine-mapping by detecting outliers in association statistics. We validate SLALOM in simulations and the GWAS Catalog. Applying SLALOM to 14 meta-analyses from the Global Biobank Meta-analysis Initiative (GBMI), we find that 67% of loci show suspicious patterns that call into question fine-mapping accuracy. These predicted suspicious loci are significantly depleted for having nonsynonymous variants as lead variant (2.7×; Fisher's exact p = 7.3 × 10⁻⁴). We find limited evidence of fine-mapping improvement in the GBMI meta-analyses compared with individual biobanks. We urge extreme caution when interpreting fine-mapping results from meta-analysis of heterogeneous cohorts.
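The abstract does not spell out the outlier statistic. A hedged sketch of one way to detect outliers in association statistics at a locus follows: under a single causal signal, a variant in LD r with the lead variant is expected to have a z-score near r times the lead z-score, and the scaled squared deviation is approximately χ² with 1 degree of freedom. The thresholds `r2_min` and `p_max` and all function names are illustrative assumptions, not values confirmed by the source.

```python
import math

def outlier_stat(z, z_lead, r):
    """Scaled squared deviation of a variant's z-score from r * z_lead.
    Approximately chi-square with 1 df under a single shared causal signal."""
    return (z - r * z_lead) ** 2 / (1.0 - r * r)

def chi2_1df_sf(x):
    """Survival function of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def suspicious_variants(zs, rs, lead, r2_min=0.6, p_max=1e-4):
    """Indices of variants in strong LD with the lead variant whose z-scores
    deviate from the LD-implied expectation.
    zs: z-scores; rs: LD correlation of each variant with the lead; lead: lead index."""
    flagged = []
    for i, (z, r) in enumerate(zip(zs, rs)):
        if i == lead or abs(r) >= 1.0 or r * r < r2_min:
            continue  # skip the lead itself, perfect LD, and weak LD
        if chi2_1df_sf(outlier_stat(z, zs[lead], r)) < p_max:
            flagged.append(i)
    return flagged

# Lead variant z = 10; variant 1 matches its LD expectation, variant 2 does not,
# variant 3 is in weak LD and is therefore not evaluated.
locus_z = [10.0, 9.0, 3.0, 0.0]
locus_r = [1.0, 0.9, 0.9, 0.5]
flagged = suspicious_variants(locus_z, locus_r, lead=0)
```

A locus containing any flagged variant would then be treated as suspicious for fine-mapping, in the spirit of the QC approach described above.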
Affiliation(s)
- Masahiro Kanai
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Roy Elzur
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Wei Zhou
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Mark J. Daly
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Hilary K. Finucane
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02142, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
24
Privé F, Arbel J, Aschard H, Vilhjálmsson BJ. Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. HGG Advances 2022; 3:100136. [PMID: 36105883 PMCID: PMC9465343 DOI: 10.1016/j.xhgg.2022.100136] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 04/21/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022] Open
Abstract
Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
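One consistency check of the kind described above can be sketched as follows, assuming a continuous trait analyzed by linear regression: the genotype standard deviation implied by a variant's effect size, standard error and reported sample size is compared with the one implied by its allele frequency. A large mismatch suggests a misreported per-variant sample size, frequency or imputation quality. The specific cutoffs and all names in the code are illustrative assumptions rather than values taken from the abstract.

```python
import math

def sd_from_sumstats(beta, se, n, sd_y=1.0):
    """Genotype standard deviation implied by linear-regression summary
    statistics: sd_y / sqrt(n * se^2 + beta^2)."""
    return sd_y / math.sqrt(n * se * se + beta * beta)

def sd_from_frequency(freq):
    """Genotype standard deviation implied by allele frequency under
    Hardy-Weinberg equilibrium: sqrt(2 f (1 - f))."""
    return math.sqrt(2.0 * freq * (1.0 - freq))

def is_consistent(beta, se, n, freq, sd_y=1.0):
    """True when the two implied standard deviations roughly agree.
    The numeric cutoffs below are assumed, illustrative values."""
    sd_ss = sd_from_sumstats(beta, se, n, sd_y)
    sd_af = sd_from_frequency(freq)
    too_low = sd_ss < 0.5 * sd_af or sd_ss < 0.1
    too_high = sd_ss > sd_af + 0.1
    rare = sd_af < 0.05
    return not (too_low or too_high or rare)

# A variant with allele frequency 0.3 and a reported sample size of 10,000.
freq, reported_n = 0.3, 10000
sd_g = sd_from_frequency(freq)
se_full = 1.0 / (sd_g * math.sqrt(reported_n))  # standard error if all 10,000 were used
se_partial = 1.0 / (sd_g * math.sqrt(2000))     # standard error if only 2,000 were used
ok_full = is_consistent(0.02, se_full, reported_n, freq)
ok_partial = is_consistent(0.02, se_partial, reported_n, freq)
```

The second variant is flagged because its standard error is too large for the reported sample size, the pattern expected when a meta-analysis reports only a total N while some variants were measured in a subset of cohorts.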
Affiliation(s)
- Florian Privé
  - National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
- Julyan Arbel
  - Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France
  - Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Bjarni J. Vilhjálmsson
  - National Centre for Register-Based Research, Aarhus University, 8210 Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark
25
Zou Y, Carbonetto P, Wang G, Stephens M. Fine-mapping from summary data with the "Sum of Single Effects" model. PLoS Genet 2022; 18:e1010299. [PMID: 35853082 PMCID: PMC9337707 DOI: 10.1371/journal.pgen.1010299] [Citation(s) in RCA: 99] [Impact Index Per Article: 49.5] [Received: 11/03/2021] [Revised: 07/29/2022] [Accepted: 06/17/2022] [Indexed: 11/19/2022] Open
Abstract
In recent work, Wang et al introduced the "Sum of Single Effects" (SuSiE) model, and showed that it provides a simple and efficient approach to fine-mapping genetic variants from individual-level data. Here we present new methods for fitting the SuSiE model to summary data, for example to single-SNP z-scores from an association study and linkage disequilibrium (LD) values estimated from a suitable reference panel. To develop these new methods, we first describe a simple, generic strategy for extending any individual-level data method to deal with summary data. The key idea is to replace the usual regression likelihood with an analogous likelihood based on summary data. We show that existing fine-mapping methods such as FINEMAP and CAVIAR also (implicitly) use this strategy, but in different ways, and so this provides a common framework for understanding different methods for fine-mapping. We investigate other common practical issues in fine-mapping with summary data, including problems caused by inconsistencies between the z-scores and LD estimates, and we develop diagnostics to identify these inconsistencies. We also present a new refinement procedure that improves model fits in some data sets, and hence improves overall reliability of the SuSiE fine-mapping results. Detailed evaluations of fine-mapping methods in a range of simulated data sets show that SuSiE applied to summary data is competitive, in both speed and accuracy, with the best available fine-mapping methods for summary data.
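As a hedged illustration of what a diagnostic for inconsistencies between z-scores and LD estimates can look like, the sketch below predicts each variant's z-score from all the others using the Gaussian conditional mean implied by the LD matrix, then reports a standardized residual. This is a simplified stand-in, not the diagnostic developed in the paper. It assumes z ~ N(Rb, R) with a positive-definite LD matrix R, in which case the conditional mean used here is exact for non-causal variants. All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def conditional_residuals(z, ld):
    """Standardized residual of each z-score given all the others.
    Large absolute values point to z-scores that disagree with the LD matrix."""
    out = []
    for j in range(len(z)):
        others = [k for k in range(len(z)) if k != j]
        sub = [[ld[a][b] for b in others] for a in others]
        r_j = [ld[j][k] for k in others]
        mean = sum(r * x for r, x in zip(r_j, solve(sub, [z[k] for k in others])))
        var = 1.0 - sum(r * x for r, x in zip(r_j, solve(sub, r_j)))
        out.append((z[j] - mean) / math.sqrt(var))
    return out

ld_matrix = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.9], [0.8, 0.9, 1.0]]
consistent = conditional_residuals([5.0, 4.5, 4.0], ld_matrix)
flipped = conditional_residuals([5.0, -4.5, 4.0], ld_matrix)  # sign flip at variant 1
```

In the toy example, flipping the sign of the middle z-score (as an allele-coding mismatch between the GWAS and the reference panel would) produces a very large residual at that variant.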
Affiliation(s)
- Yuxin Zou
  - Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Peter Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Research Computing Center, University of Chicago, Chicago, Illinois, United States of America
- Gao Wang
  - Department of Neurology and the Gertrude H. Sergievsky Center, Columbia University, New York, New York, United States of America
- Matthew Stephens
  - Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America