1
Hahn J, Temprano-Sagrera G, Hasbani NR, Ligthart S, Dehghan A, Wolberg AS, Smith NL, Sabater-Lleal M, Morrison AC, de Vries PS. Bivariate genome-wide association study of circulating fibrinogen and C-reactive protein levels. J Thromb Haemost 2024; 22:3448-3459. [PMID: 39299614 DOI: 10.1016/j.jtha.2024.08.021] [Received: 04/02/2024] [Revised: 07/23/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Fibrinogen and C-reactive protein (CRP) play important roles in inflammatory pathways and share multiple genetic loci reported in previously published genome-wide association studies (GWAS), highlighting their common genetic background. Leveraging this shared biology may identify further loci pleiotropically associated with both fibrinogen and CRP. OBJECTIVES To identify novel pleiotropic genetic variants associated with both fibrinogen and CRP by analyzing the 2 phenotypes jointly in a bivariate (multitrait) GWAS. METHODS We performed a bivariate GWAS to identify further pleiotropic loci, using summary statistics from previously published GWAS of fibrinogen (n = 120 246; European ancestry samples from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium) and CRP (n = 363 228; UK Biobank, including 5 different population groups). The main analysis was performed using metaUSAT and N-GWAMA. To test the robustness of novel CRP associations, we attempted replication in an independent CRP GWAS (n = 148 164). We also performed colocalization analyses to compare the association signals at identified loci between the 2 traits and with Genotype-Tissue Expression (GTEx) data. RESULTS We identified 87 pleiotropic loci that overlapped between metaUSAT and N-GWAMA, including 23 previously known for either fibrinogen or CRP, 58 novel loci for fibrinogen, and 6 novel loci for both fibrinogen and CRP. Overall, there were 30 pleiotropic and novel loci for both traits, and 7 of these showed evidence of colocalization, located in or near ZZZ3, NR1I2, RP11-72L22.1, MICU1, ARL14EP, SOCS2, and PGM5. Among these 30 loci, 13 replicated for CRP in an independent CRP GWAS. CONCLUSION Bivariate GWAS identified additional loci associated with fibrinogen and CRP. This analysis suggests that fibrinogen and CRP share a common genetic architecture with many pleiotropic loci.
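The abstract does not give the formulas behind metaUSAT or N-GWAMA. As a simpler point of reference, the sketch below shows the classic 2-df chi-square (Wald) test that combines one SNP's fibrinogen and CRP z-scores while accounting for their null correlation r (for example from sample overlap). It is an illustration under our own assumptions (r known and |r| < 1, z-scores aligned to the same allele), not either method used in the study; the function name is hypothetical.

```python
import math


def bivariate_chisq_test(z1: float, z2: float, r: float) -> tuple[float, float]:
    """2-df Wald test that a SNP is associated with at least one of two traits.

    z1, z2: per-trait GWAS z-scores for the same SNP and effect allele.
    r: correlation of the two z-statistics under the null (assumed known).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # T = z' R^{-1} z with R = [[1, r], [r, 1]], written in closed form.
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    # The chi-square survival function with 2 df is exp(-T / 2).
    return stat, math.exp(-stat / 2.0)
```

With r = 0 this reduces to z1² + z2²; a positive r makes concordant signals on both traits less surprising, so they yield a larger p-value than under independence.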
Affiliation(s)
- Julie Hahn
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Gerard Temprano-Sagrera
  - Genomics of Complex Diseases Unit, Institut d'Investigació Biomèdica Sant Pau, IIB Sant Pau, Barcelona, Spain
- Natalie R Hasbani
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Symen Ligthart
  - Department of Intensive Care, Antwerp University Hospital, Edegem, Belgium
- Abbas Dehghan
  - UK Dementia Research Institute at Imperial College London, London, United Kingdom; Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
- Alisa S Wolberg
  - Pathology and Laboratory Medicine and University of North Carolina Blood Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Nicholas L Smith
  - Department of Epidemiology, University of Washington, Seattle, Washington, USA; Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA; Department of Veterans Affairs Office of Research and Development, Seattle Epidemiologic Research and Information Center, Seattle, Washington, USA
- Maria Sabater-Lleal
  - Genomics of Complex Diseases Unit, Institut d'Investigació Biomèdica Sant Pau, IIB Sant Pau, Barcelona, Spain; Cardiovascular Medicine Unit, Department of Medicine Solna, Karolinska Institutet, Center for Molecular Medicine and Karolinska University Hospital Solna, Stockholm, Sweden
- Alanna C Morrison
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Paul S de Vries
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA
2
Auvergne A, Traut N, Henches L, Troubat L, Frouin A, Boetto C, Kazem S, Julienne H, Toro R, Aschard H. Multitrait Analysis to Decipher the Intertwined Genetic Architecture of Neuroanatomical Phenotypes and Psychiatric Disorders. Biol Psychiatry Cogn Neurosci Neuroimaging 2024:S2451-9022(24)00266-0. [PMID: 39260564 DOI: 10.1016/j.bpsc.2024.08.018] [Received: 05/21/2024] [Revised: 06/28/2024] [Accepted: 08/12/2024] [Indexed: 09/13/2024]
Abstract
BACKGROUND There is increasing evidence of shared genetic factors between psychiatric disorders and brain magnetic resonance imaging (MRI) phenotypes. However, deciphering the joint genetic architecture of these outcomes has proven to be challenging, and new approaches are needed to infer the genetic structures that may underlie those phenotypes. Multivariate analyses are a meaningful approach to reveal links between MRI phenotypes and psychiatric disorders missed by univariate approaches. METHODS First, we conducted univariate and multivariate genome-wide association studies for 9 MRI-derived brain volume phenotypes in 20,000 UK Biobank participants. Next, we performed various complementary enrichment analyses to assess whether and how univariate and multitrait approaches could distinguish disorder-associated and non-disorder-associated variants from 6 psychiatric disorders: bipolar disorder, attention-deficit/hyperactivity disorder, autism, schizophrenia, obsessive-compulsive disorder, and major depressive disorder. Finally, we conducted a clustering analysis of top associated variants based on their MRI multitrait association using an optimized k-medoids approach. RESULTS A univariate MRI genome-wide association study revealed only negligible genetic correlations with psychiatric disorders, while a multitrait genome-wide association study identified multiple new associations and showed significant enrichment for variants related to both attention-deficit/hyperactivity disorder and schizophrenia. Clustering analyses also detected 2 clusters that showed not only enrichment for association with attention-deficit/hyperactivity disorder and schizophrenia but also a consistent direction of effects. Functional annotation analyses of those clusters pointed to multiple potential mechanisms, suggesting in particular a role of neurotrophin pathways in both MRI phenotypes and schizophrenia. 
CONCLUSIONS Our results show that multitrait association signatures can be used to infer genetically driven latent MRI variables associated with psychiatric disorders, thereby opening paths for future biomarker development.
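The clustering step is described only as an "optimized k-medoids approach". A minimal alternating k-medoids sketch is below, assuming each top variant is represented by its vector of z-scores across the MRI traits and that Euclidean distance is used; the paper's optimization, distance, and choice of k are not reproduced, and the function name is hypothetical.

```python
import math


def k_medoids(points, init_medoids, max_iter=100):
    """Alternating k-medoids; returns (labels, medoid indices)."""

    def assign(medoids):
        # Each point joins the cluster of its nearest medoid (Euclidean distance).
        return [min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
                for p in points]

    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids
```

Unlike k-means, each cluster center is an actual variant, which keeps the clusters interpretable as groups of real association profiles.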
Affiliation(s)
- Antoine Auvergne
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Nicolas Traut
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Léo Henches
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Lucie Troubat
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Arthur Frouin
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Christophe Boetto
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Sayeh Kazem
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Hanna Julienne
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Roberto Toro
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris, France; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
3
Deng Q, Song C, Lin S. An adaptive and robust method for multi-trait analysis of genome-wide association studies using summary statistics. Eur J Hum Genet 2024; 32:681-690. [PMID: 37237036 PMCID: PMC11153499 DOI: 10.1038/s41431-023-01389-7] [Received: 05/19/2022] [Revised: 05/01/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]
Abstract
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human traits or diseases in the past decade. Nevertheless, much of the heritability of many traits remains unaccounted for. Commonly used single-trait analysis methods are conservative, whereas multi-trait methods improve statistical power by integrating association evidence across multiple traits. Unlike individual-level data, GWAS summary statistics are usually publicly available, so methods that use only summary statistics are more widely applicable. Although many methods have been developed for joint analysis of multiple traits using summary statistics, many issues remain, including inconsistent performance, computational inefficiency, and numerical problems when many traits are considered. To address these challenges, we propose a multi-trait adaptive Fisher method for summary statistics (MTAFS), a computationally efficient method with robust power performance. We applied MTAFS to two sets of brain imaging-derived phenotypes (IDPs) from the UK Biobank: a set of 58 volumetric IDPs and a set of 212 area IDPs. Annotation analysis showed that the genes underlying the SNPs identified by MTAFS exhibit higher expression and are significantly enriched in brain-related tissues. Together with results from a simulation study, these findings show the advantage of MTAFS over existing multi-trait methods, with robust performance across a range of underlying settings. It controls type I error well and can efficiently handle a large number of traits.
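The abstract names MTAFS but gives no formulas. The sketch below illustrates the generic adaptive Fisher idea the name refers to: combine only the k smallest per-trait p-values and let the data choose k, calibrating the choice with a Monte Carlo null. It assumes independent traits; MTAFS's handling of correlated traits and its actual computations are not reproduced, and the function names are hypothetical.

```python
import bisect
import math
import random


def partial_fisher_sums(pvals):
    """S_k = -sum of log p over the k smallest p-values, for k = 1..K."""
    sums, total = [], 0.0
    for p in sorted(pvals):
        total -= math.log(p)
        sums.append(total)
    return sums


def adaptive_fisher_pvalue(pvals, n_null=2000, seed=0):
    """Monte Carlo adaptive Fisher test, assuming independent traits.

    Each observed S_k becomes an empirical tail probability; the minimum over k
    is the adaptive statistic, calibrated against the same simulated null draws.
    """
    if any(not 0.0 < p <= 1.0 for p in pvals):
        raise ValueError("p-values must lie in (0, 1]")
    K = len(pvals)
    rng = random.Random(seed)
    # Null p-values are drawn in (0, 1] so that log() is always defined.
    null = [partial_fisher_sums([1.0 - rng.random() for _ in range(K)])
            for _ in range(n_null)]
    sorted_by_k = [sorted(draw[k] for draw in null) for k in range(K)]

    def min_tail_p(sums):
        # Empirical P(S_k >= observed S_k) for each k, minimized over k.
        return min((n_null - bisect.bisect_left(sorted_by_k[k], s)) / n_null
                   for k, s in enumerate(sums))

    observed = min_tail_p(partial_fisher_sums(pvals))
    null_stats = [min_tail_p(draw) for draw in null]
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    hits = sum(stat <= observed for stat in null_stats)
    return (1 + hits) / (1 + n_null)
```

Choosing k adaptively lets the test behave like a min-p test when one trait carries the signal and like the full Fisher test when many traits do.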
Affiliation(s)
- Qiaolan Deng
  - Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
  - Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
- Chi Song
  - Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, USA
- Shili Lin
  - Department of Statistics, College of Arts and Sciences, The Ohio State University, Columbus, OH, USA
4
Cao X, Zhang S, Sha Q. A novel method for multiple phenotype association studies based on genotype and phenotype network. PLoS Genet 2024; 20:e1011245. [PMID: 38728360 PMCID: PMC11111089 DOI: 10.1371/journal.pgen.1011245] [Received: 08/18/2023] [Revised: 05/22/2024] [Accepted: 03/29/2024] [Indexed: 05/12/2024]
Abstract
Joint analysis of multiple correlated phenotypes in genome-wide association studies (GWAS) can identify and interpret pleiotropic loci, which are essential for understanding pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes offers a new way to analyze multiple phenotypes, making it possible to explore whether phenotypes and genotypes are related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed from a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on the GPN. Finally, we jointly test the association between the multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple phenotype association tests based on network modules detected by GPN are much more powerful than those that do not consider network modules. The newly proposed GPN provides a new perspective for investigating the genetic architecture among different types of phenotypes. Multiple phenotype association studies based on GPN are improved by incorporating genetic information into the phenotype clustering. Notably, this approach might broaden the understanding of the genetic architecture that exists between diagnoses, genes, and pleiotropy.
Affiliation(s)
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
5
Liu L, Yan R, Guo P, Ji J, Gong W, Xue F, Yuan Z, Zhou X. Conditional transcriptome-wide association study for fine-mapping candidate causal genes. Nat Genet 2024; 56:348-356. [PMID: 38279040 DOI: 10.1038/s41588-023-01645-y] [Received: 01/18/2023] [Accepted: 12/08/2023] [Indexed: 01/28/2024]
Abstract
Transcriptome-wide association studies (TWASs) aim to integrate genome-wide association studies with expression-mapping studies to identify genes with genetically predicted expression (GReX) associated with a complex trait. In the present report, we develop a method, GIFT (gene-based integrative fine-mapping through conditional TWAS), that performs conditional TWAS analysis by explicitly controlling for GReX of all other genes residing in a local region to fine-map putatively causal genes. GIFT is frequentist in nature, explicitly models both expression correlation and cis-single nucleotide polymorphism linkage disequilibrium across multiple genes and uses a likelihood framework to account for expression prediction uncertainty. As a result, GIFT produces calibrated P values and is effective for fine-mapping. We apply GIFT to analyze six traits in the UK Biobank, where GIFT narrows down the set size of putatively causal genes by 32.16-91.32% compared with existing TWAS fine-mapping approaches. The genes identified by GIFT highlight the importance of vessel regulation in determining blood pressures and lipid metabolism for regulating lipid levels.
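GIFT's likelihood is not given in the abstract. To show what "controlling for GReX of all other genes" does in the simplest case, the sketch applies the standard summary-statistic conditional analysis to two genes: with standardized predicted expression correlated at r, the joint z-score for gene j is (R⁻¹z)_j / sqrt((R⁻¹)_jj), which simplifies to (z_j − r·z_other) / sqrt(1 − r²). This assumes a large sample and small effects, and it ignores expression-prediction uncertainty, which GIFT models explicitly; the function name is hypothetical.

```python
import math


def conditional_z_two_genes(z1, z2, r):
    """Joint (conditional) z-scores for two genes from marginal TWAS z-scores.

    r: correlation between the two genes' genetically predicted expression.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    scale = math.sqrt(1.0 - r * r)
    # (R^{-1} z)_j / sqrt((R^{-1})_jj) for R = [[1, r], [r, 1]].
    return (z1 - r * z2) / scale, (z2 - r * z1) / scale
```

For example, marginal z-scores of 5 and 4 with r = 0.8 give conditional z-scores of 3 and 0: the second gene's signal is fully explained by its correlation with the first, which is how conditioning narrows the candidate set.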
Affiliation(s)
- Lu Liu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Ran Yan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Ping Guo
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jiadong Ji
  - Institute for Financial Studies, Shandong University, Jinan, China
- Weiming Gong
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
6
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. Biometrics 2024; 80:ujad039. [PMID: 38470257 PMCID: PMC10928990 DOI: 10.1093/biomtc/ujad039] [Received: 03/01/2023] [Revised: 11/24/2023] [Accepted: 01/04/2024] [Indexed: 03/13/2024]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show that our method requires access only to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available.
Affiliation(s)
- Chunlin Li
  - Department of Statistics, Iowa State University, Ames, IA 50011, United States
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, MN 55455, United States
- Wei Pan
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, MN 55455, United States
7
Cao R, Olawsky E, McFowland E, Marcotte E, Spector L, Yang T. Subset scanning for multi-trait analysis using GWAS summary statistics. Bioinformatics 2024; 40:btad777. [PMID: 38191683 PMCID: PMC11087659 DOI: 10.1093/bioinformatics/btad777] [Received: 07/20/2023] [Revised: 11/23/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024]
Abstract
MOTIVATION Multi-trait analysis has been shown to have greater statistical power than single-trait analysis. Most existing multi-trait analysis methods work only with a limited number of traits, usually prioritize high statistical power over identifying relevant traits, and rely heavily on domain knowledge. RESULTS To handle diseases and traits with obscure etiology, we developed TraitScan, a powerful and fast algorithm that identifies potential pleiotropic traits from a moderate or large number of traits (e.g. dozens to thousands) and tests the association between one genetic variant and the selected traits. TraitScan can handle either individual-level or summary-level GWAS data. We evaluated TraitScan using extensive simulations and found that it outperformed existing methods in terms of both testing power and trait selection when sparsity was low or modest. We then applied it to search for traits associated with Ewing sarcoma, a rare bone tumor with peak onset in adolescence, among 754 traits in UK Biobank. Our analysis revealed a few promising traits worthy of further investigation, highlighting the value of TraitScan for more effective multi-trait analysis as biobanks emerge. We also extended TraitScan to search for and test associations with a polygenic risk score and genetically imputed gene expression. AVAILABILITY AND IMPLEMENTATION Our algorithm is implemented in an R package "TraitScan" available at https://github.com/RuiCao34/TraitScan.
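The abstract does not state TraitScan's scoring function. The sketch illustrates the subset-scanning idea in its simplest form, using a score of our own choosing: (Σ z_i² − |S|) / sqrt(2|S|) over a subset S of traits, assuming independent traits with N(0, 1) z-scores under the null. Because this score increases with Σ z_i² for a fixed subset size, the best subset of each size is the top-k traits by z², so scanning k = 1..K finds the global optimum without enumerating all 2^K subsets. The function name is hypothetical, and the maximized score would still need calibration (for example by permutation).

```python
import math


def scan_top_k_subset(z):
    """Return (sorted trait indices, score) maximizing a standardized chi-square score."""
    # Rank traits by evidence (z^2); only top-k prefixes need to be scored.
    order = sorted(range(len(z)), key=lambda i: z[i] ** 2, reverse=True)
    best_k, best_score, running = 0, float("-inf"), 0.0
    for k, idx in enumerate(order, start=1):
        running += z[idx] ** 2
        score = (running - k) / math.sqrt(2.0 * k)  # (chi2_k - mean) / sd
        if score > best_score:
            best_k, best_score = k, score
    return sorted(order[:best_k]), best_score
```

For one variant with z-scores [4.0, 3.5, 0.2, -0.1, 0.3], the scan selects traits 0 and 1 with a score of 13.125.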
Affiliation(s)
- Rui Cao
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Evan Olawsky
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
- Edward McFowland
  - Technology and Operations Management, Harvard Business School, Harvard University, Boston, MA 02163, United States
- Erin Marcotte
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
- Logan Spector
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN 55414, United States
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, MN 55454, United States
8
Xie H, Cao X, Zhang S, Sha Q. Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies using multi-layer network. Bioinformatics 2023; 39:btad707. [PMID: 37991852 PMCID: PMC10697735 DOI: 10.1093/bioinformatics/btad707] [Received: 03/04/2023] [Revised: 09/29/2023] [Accepted: 11/21/2023] [Indexed: 11/24/2023]
Abstract
MOTIVATION Genome-wide association studies are an essential tool for analyzing associations between phenotypes and single nucleotide polymorphisms (SNPs). Most binary phenotypes in large biobanks are extremely unbalanced, which leads to inflated type I error rates for many widely used association tests for joint analysis of multiple phenotypes. In this article, we first propose a novel method to construct a Multi-Layer Network (MLN) using individuals with at least one case status among all phenotypes. Then, we introduce a computationally efficient community detection method to group phenotypes into disjoint clusters based on the MLN. Finally, we propose a novel approach, MLN with Omnibus (MLN-O), to jointly analyze the association between phenotypes and a SNP. MLN-O uses the score test to test the association between each merged phenotype in a cluster and a SNP, and then uses the Omnibus test to obtain an overall test statistic for the association between all phenotypes and a SNP. RESULTS Extensive simulation studies show that the proposed approach controls type I error rates and is more powerful than some existing methods. We also apply the proposed method to a real data set from the UK Biobank. Using phenotypes in Chapter XIII (Diseases of the musculoskeletal system and connective tissue) in the UK Biobank, we find that MLN-O identifies more significant SNPs than the other methods we compare with. AVAILABILITY AND IMPLEMENTATION https://github.com/Hongjing-Xie/Multi-Layer-Network-with-Omnibus-MLN-O.
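The abstract does not say how MLN-O's Omnibus step merges the per-cluster score-test results. One widely used way to combine several possibly correlated p-values into a single overall p-value is the Cauchy combination test, sketched below as an assumption rather than as MLN-O's actual omnibus statistic. Equal weights are also our choice, and the function name is hypothetical.

```python
import math


def cauchy_combination(pvals, weights=None):
    """Cauchy combination of p-values into one overall p-value."""
    if weights is None:
        weights = [1.0 / len(pvals)] * len(pvals)
    # Each p-value is mapped to a standard Cauchy quantile. The weighted sum is
    # approximately Cauchy under the null, even when the tests are correlated.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(t / sum(weights)) / math.pi
```

The heavy Cauchy tail lets one very small per-cluster p-value dominate the combination: [1e-6, 0.5, 0.9] combine to roughly 3e-6.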
Affiliation(s)
- Hongjing Xie
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
9
Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. bioRxiv 2023:2023.02.10.528092. [PMID: 38045347 PMCID: PMC10690198 DOI: 10.1101/2023.02.10.528092] [Indexed: 12/05/2023]
Abstract
Estimating phenotype networks is a growing field in computational biology. It deepens the understanding of disease etiology and is useful in many applications. In this study, we present a method that constructs a phenotype network by assuming a Gaussian linear structure model embedding a directed acyclic graph (DAG). We utilize genetic variants as instrumental variables and show that our method requires access only to summary statistics from a genome-wide association study (GWAS) and a reference panel of genotype data. Besides estimation, a distinct feature of the method is its summary statistics-based likelihood ratio test on directed edges. We applied our method to estimate a causal network of 29 cardiovascular-related proteins and linked the estimated network to Alzheimer's disease (AD). A simulation study was conducted to demonstrate the effectiveness of this method. An R package sumdag implementing the proposed method, all relevant code, and a Shiny application are available at https://github.com/chunlinli/sumdag.
Affiliation(s)
- Chunlin Li
  - Department of Statistics, Iowa State University, Ames, Iowa 50011, U.S.A
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Wei Pan
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A
10
Deng Y, Tu D, O'Callaghan CJ, Liu G, Xu W. Two-stage multivariate Mendelian randomization on multiple outcomes with mixed distributions. Stat Methods Med Res 2023; 32:1543-1558. [PMID: 37338962 PMCID: PMC10515454 DOI: 10.1177/09622802231181220] [Indexed: 06/21/2023]
Abstract
In clinical research, it is important to study whether certain clinical factors or exposures have causal effects on clinical and patient-reported outcomes such as toxicities, quality of life, and self-reported symptoms, which can help improve patient care. Usually, such outcomes are recorded as multiple variables with different distributions. Mendelian randomization (MR) is a commonly used technique for causal inference with the help of genetic instrumental variables to deal with observed and unobserved confounders. Nevertheless, the current methodology of MR for multiple outcomes only focuses on one outcome at a time, meaning that it does not consider the correlation structure of multiple outcomes, which may lead to a loss of statistical power. In situations with multiple outcomes of interest, especially when there are mixed correlated outcomes with different distributions, it is much more desirable to jointly analyze them with a multivariate approach. Some multivariate methods have been proposed to model mixed outcomes; however, they do not incorporate instrumental variables and cannot handle unmeasured confounders. To overcome the above challenges, we propose a two-stage multivariate Mendelian randomization method (MRMO) that can perform multivariate analysis of mixed outcomes using genetic instrumental variables. We demonstrate that our proposed MRMO algorithm can gain power over the existing univariate MR method through simulation studies and a clinical application on a randomized Phase III clinical trial study on colorectal cancer patients.
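For orientation, the sketch below shows the univariate baseline that the abstract contrasts with MRMO: a two-stage least-squares estimate computed separately for each outcome using one genetic instrument. The simulated data (true effect 0.5 on y1, no effect on y2, an unmeasured confounder u shared by the exposure and both outcomes) are hypothetical. MRMO's joint modeling of mixed, correlated outcomes is not reproduced, and the function names are ours.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def two_stage_iv(g, x, y):
    """Two-stage least squares with a single genetic instrument g."""
    n = len(g)
    mg, mx = sum(g) / n, sum(x) / n
    # Stage 1: regress the exposure on the instrument and keep the fitted values.
    b1 = ols_slope(g, x)
    x_hat = [mx + b1 * (gi - mg) for gi in g]
    # Stage 2: regress the outcome on the genetically predicted exposure.
    return ols_slope(x_hat, y)


# Hypothetical simulated data with an unmeasured confounder u.
rng = random.Random(42)
n = 50_000
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]  # 0/1/2 allele count
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [0.4 * gi + ui + rng.gauss(0.0, 1.0) for gi, ui in zip(g, u)]
outcomes = {
    "y1": [0.5 * xi + ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)],
    "y2": [ui + rng.gauss(0.0, 1.0) for ui in u],  # no causal effect of x
}
# Univariate MR: each outcome is analyzed on its own, ignoring their correlation.
estimates = {name: two_stage_iv(g, x, y) for name, y in outcomes.items()}
```

In this simulation the naive regression of y1 on x is biased upward by the confounder, whereas the two-stage estimate should land near 0.5. Each outcome is still analyzed separately, which is the limitation MRMO addresses.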
Affiliation(s)
- Yangqing Deng
  - Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Dongsheng Tu
  - Canadian Cancer Trials Group, Queen's University, Kingston, ON, Canada
- Geoffrey Liu
  - Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
  - Medical Oncology and Hematology, Princess Margaret Cancer Centre, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Wei Xu
  - Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
11
Lin Z, Xue H, Pan W. Combining Mendelian randomization and network deconvolution for inference of causal networks with GWAS summary data. PLoS Genet 2023; 19:e1010762. [PMID: 37200398 PMCID: PMC10231771 DOI: 10.1371/journal.pgen.1010762] [Received: 01/12/2023] [Revised: 05/31/2023] [Accepted: 04/25/2023] [Indexed: 05/20/2023]
Abstract
Mendelian randomization (MR) has been increasingly applied for causal inference with observational data by using genetic variants as instrumental variables (IVs). However, the current practice of MR has been largely restricted to investigating the total causal effect between two traits, while it would be useful to infer the direct causal effect between any two of many traits (by accounting for indirect or mediating effects through other traits). For this purpose we propose a two-step approach: we first apply an extended MR method to infer (i.e. both estimate and test) a causal network of total effects among multiple traits, then we modify a graph deconvolution algorithm to infer the corresponding network of direct effects. Simulation studies showed that our proposed method performed much better than existing ones. We applied the method to 17 large-scale GWAS summary datasets (median N = 256879; median number of IVs = 48) to infer the causal networks of both total and direct effects among 11 common cardiometabolic risk factors, 4 cardiometabolic diseases (coronary artery disease, stroke, type 2 diabetes, atrial fibrillation), Alzheimer's disease and asthma, identifying some interesting causal pathways. We also provide an R Shiny app (https://zhaotongl.shinyapps.io/cMLgraph/) for users to explore any subset of the 17 traits of interest.
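The abstract does not describe the modified deconvolution algorithm. In a linear causal model with direct-effect matrix B, total effects sum over all paths, T = B + B² + B³ + … = B(I − B)⁻¹, so that I + T = (I − B)⁻¹ and hence B = T(I + T)⁻¹. The sketch applies this identity, assuming I − B is invertible and using our own convention that entry [i][j] is the effect of trait i on trait j. The paper's modifications, such as handling estimation error in T, are not reproduced, and the function names are hypothetical.

```python
def mat_inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_mul(a, b):
    """Matrix product of lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def deconvolve(total):
    """Direct effects B from total effects T via B = T (I + T)^{-1}."""
    n = len(total)
    i_plus_t = [[(1.0 if i == j else 0.0) + total[i][j] for j in range(n)]
                for i in range(n)]
    return mat_mul(total, mat_inverse(i_plus_t))
```

For a chain in which trait 1 affects trait 2 (0.5) and trait 2 affects trait 3 (0.4), the total effect of trait 1 on trait 3 is 0.2. Deconvolution correctly assigns it a direct effect of 0.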
Affiliation(s)
- Zhaotong Lin
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Haoran Xue
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
Collapse
|
12
|
Lin Z, Xue H, Pan W. Robust multivariable Mendelian randomization based on constrained maximum likelihood. Am J Hum Genet 2023; 110:592-605. [PMID: 36948188 PMCID: PMC10119150 DOI: 10.1016/j.ajhg.2023.02.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 11/07/2022] [Accepted: 02/27/2023] [Indexed: 03/24/2023]
Abstract
Mendelian randomization (MR) is a powerful tool for causal inference with observational genome-wide association study (GWAS) summary data. Compared to the more commonly used univariable MR (UVMR), multivariable MR (MVMR) is not only more robust to the notorious problem of genetic (horizontal) pleiotropy but also estimates the direct effect of each exposure on the outcome after accounting for possible mediating effects of other exposures. Despite promising applications, there is a lack of studies on MVMR's theoretical properties and robustness in applications. In this work, we propose an efficient and robust MVMR method based on constrained maximum likelihood (cML), called MVMR-cML, with strong theoretical support. Extensive simulations demonstrate that MVMR-cML performs better than other existing MVMR methods while possessing the above two advantages over its univariable counterpart. An application to several large-scale GWAS summary datasets to infer causal relationships between eight cardiometabolic risk factors and coronary artery disease (CAD) highlights the usefulness and some advantages of the proposed method. For example, after accounting for possible pleiotropic and mediating effects, triglyceride (TG), low-density lipoprotein cholesterol (LDL), and systolic blood pressure (SBP) had direct effects on CAD; in contrast, the effects of high-density lipoprotein cholesterol (HDL), diastolic blood pressure (DBP), and body height diminished after accounting for other risk factors.
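As a point of reference, here is the standard inverse-variance-weighted (IVW) multivariable MR estimator. This is a simpler baseline and not MVMR-cML. It regresses SNP-outcome associations on all SNP-exposure associations jointly, with weights 1/se², so each coefficient is an exposure's direct effect adjusted for the other exposures. This is a minimal sketch that assumes independent instruments and a fixed-effect model. The function name and the simulated inputs are illustrative only.

```python
import numpy as np

def mvmr_ivw(beta_x, beta_y, se_y):
    """Fixed-effect multivariable IVW estimate (a baseline, not MVMR-cML).

    beta_x: (J, K) SNP-exposure associations for J instruments and K exposures
    beta_y: (J,)   SNP-outcome associations
    se_y:   (J,)   standard errors of beta_y
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    w = 1.0 / np.asarray(se_y, dtype=float) ** 2
    xtw = beta_x.T * w                  # X^T W, shape (K, J)
    info = xtw @ beta_x                 # X^T W X
    theta = np.linalg.solve(info, xtw @ beta_y)
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # fixed-effect standard errors
    return theta, se

# Simulated, noise-free example with three exposures (all values are made up).
rng = np.random.default_rng(0)
beta_x = rng.normal(scale=0.1, size=(50, 3))
theta_true = np.array([0.3, 0.0, -0.2])
beta_y = beta_x @ theta_true
se_y = np.full(50, 0.01)

theta_hat, se_hat = mvmr_ivw(beta_x, beta_y, se_y)
print(theta_hat, se_hat)
```

With noise-free inputs the estimator recovers `theta_true` exactly. With real summary data, invalid (pleiotropic) instruments bias this estimator, which is the weakness that robust methods such as MVMR-cML are designed to address.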
Affiliation(s)
- Zhaotong Lin
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Haoran Xue
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
|
13
|
Zigarelli AM, Venera HM, Receveur BA, Wolf JM, Westra J, Tintle NL. Multimarker omnibus tests by leveraging individual marker summary statistics from large biobanks. Ann Hum Genet 2023; 87:125-136. [PMID: 36683423 DOI: 10.1111/ahg.12495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 12/24/2022] [Accepted: 01/04/2023] [Indexed: 01/24/2023]
Abstract
As biobanks become increasingly popular, access to genotypic and phenotypic data continues to increase in the form of precomputed summary statistics (PCSS). Widespread accessibility of PCSS alleviates many issues related to biobank data, including data privacy and confidentiality, as well as high computational costs. However, questions remain about how to maximally leverage PCSS for downstream statistical analyses. Here we present a novel method for testing the association of an arbitrary number of single nucleotide variants (SNVs) with a linear combination of phenotypes, after adjusting for covariates, for common multimarker tests (e.g., SKAT, SKAT-O) without access to individual patient-level data (IPD). We validate exact formulas for each method and demonstrate their accuracy through simulation studies and an application to fatty acid phenotypic data from the Framingham Heart Study.
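Summary statistics suffice for linear combinations because ordinary least squares is linear in the response. If y_c = Σ c_k y_k, then cov(x, y_c) = Σ c_k cov(x, y_k) and var(y_c) = cᵀΣ_y c. The combined slope and its standard error therefore follow from per-phenotype (co)variances alone. The sketch below covers only a single variant in an intercept-only model. Covariate adjustment and the SKAT/SKAT-O statistics from the paper are not shown, and the function name is hypothetical.

```python
import numpy as np

def combo_slope_from_summaries(n, var_x, cov_xy, cov_yy, c):
    """Simple-regression slope and SE of y_c = sum_k c_k * y_k on a variant x,
    computed from sample (co)variances (ddof=1) instead of individual-level data."""
    c = np.asarray(c, dtype=float)
    cov_xyc = c @ np.asarray(cov_xy, dtype=float)        # cov(x, y_c)
    var_yc = c @ np.asarray(cov_yy, dtype=float) @ c     # var(y_c)
    slope = cov_xyc / var_x
    sse = (n - 1) * (var_yc - slope ** 2 * var_x)        # residual sum of squares
    se = np.sqrt(sse / (n - 2) / ((n - 1) * var_x))
    return slope, se

# Simulated data, used only to produce the summaries and to check the result.
rng = np.random.default_rng(1)
n = 500
x = rng.binomial(2, 0.3, size=n).astype(float)           # allele counts
Y = np.column_stack([0.2 * x + rng.normal(size=n),
                     -0.1 * x + rng.normal(size=n)])
c = np.array([1.0, 2.0])

var_x = np.var(x, ddof=1)
cov_xy = np.array([np.cov(x, Y[:, k])[0, 1] for k in range(Y.shape[1])])
cov_yy = np.cov(Y, rowvar=False)

slope, se = combo_slope_from_summaries(n, var_x, cov_xy, cov_yy, c)
print(slope, se)
```

Fitting `Y @ c` on `x` directly gives the same slope and standard error. This is what "exact" means for the linear case. Nonlinear combinations such as ratios need approximations instead.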
Affiliation(s)
- Angela M Zigarelli
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Massachusetts, USA
- Hanna M Venera
- Division of Biostatistics, University of Michigan, Michigan, USA
- Brody A Receveur
- Department of Statistics, George Mason University, Virginia, USA
- Jack M Wolf
- Division of Biostatistics, University of Minnesota, Minnesota, USA
- Jason Westra
- Department of Math, Computer Science, and Statistics, Dordt University, Iowa, USA
- Nathan L Tintle
- Department of Population Health Nursing Sciences, University of Illinois Chicago, Chicago, Illinois, USA
|
14
|
Zhao C, Jia X, Wang Y, Luo Z, Fan J, Shi X, Yang Y. Overlapping genetic susceptibility of seven autoimmune diseases: SPU tests based on genome-wide association summary statistics. Gene 2022; 851:147036. [DOI: 10.1016/j.gene.2022.147036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 10/26/2022] [Accepted: 11/04/2022] [Indexed: 11/11/2022]
|
15
|
Anwar MY, Baldassari AR, Polikowsky HG, Sitlani CM, Highland HM, Chami N, Chen HH, Graff M, Howard AG, Jung SY, Petty LE, Wang Z, Zhu W, Buyske S, Cheng I, Kaplan R, Kooperberg C, Loos RJF, Peters U, McCormick JB, Fisher-Hoch SP, Avery CL, Taylor KC, Below JE, North KE. Genetic pleiotropy underpinning adiposity and inflammation in self-identified Hispanic/Latino populations. BMC Med Genomics 2022; 15:192. [PMID: 36088317 PMCID: PMC9464371 DOI: 10.1186/s12920-022-01352-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 09/02/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Concurrent variation in adiposity and inflammation suggests potential shared functional pathways and pleiotropic disease underpinnings. Yet exploration of pleiotropy in the context of adiposity and inflammation has been scarce, and no study has included self-identified Hispanic/Latino populations. Given the high level of ancestral diversity in Hispanic American populations, genetic studies may reveal variants that are infrequent or monomorphic in more homogeneous populations. METHODS Using the multi-trait Adaptive Sum of Powered Score (aSPU) method, we examined individual and shared genetic effects underlying an inflammatory trait (CRP), an adiposity-related trait (Body Mass Index [BMI]), and central adiposity (Waist to Hip Ratio [WHR]) in Hispanic/Latino American (HLA) participants in the Population Architecture Using Genomics and Epidemiology (PAGE) cohort (N = 35,871), with replication of effects in the Cameron County Hispanic Cohort (CCHC), which consists of Mexican American individuals. RESULTS Of the > 16 million SNPs tested, variants representing 7 independent loci showed significant association with multiple traits. Two of the 7 variants replicated at a statistically significant level in multi-trait analyses in CCHC. The lead variant at APOE (rs439401) and rs11208712 were found to harbor multi-trait associations with adiposity and inflammation. CONCLUSIONS Results from this study demonstrate the importance of considering pleiotropy for improving our understanding of the etiology of the various metabolic pathways that regulate cardiovascular disease development.
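The aSPU idea can be sketched for one SNP's vector of per-trait Z-scores. For several powers γ, the statistic is SPU(γ) = Σ_k Z_k^γ, with γ = ∞ taken as max_k |Z_k|. aSPU takes the smallest SPU p-value and calibrates that minimum against null draws Z ~ N(0, R), where R is the between-trait correlation of the Z-scores. This is a simplified Monte Carlo sketch, not the published implementation. The γ set, the number of simulations, the correlation matrix and the function names are all assumptions.

```python
import numpy as np

def spu_stats(z, gammas):
    """|SPU(gamma)| for each gamma; z has shape (..., K). gamma = inf means max |z_k|."""
    cols = []
    for g in gammas:
        if np.isinf(g):
            cols.append(np.max(np.abs(z), axis=-1))
        else:
            cols.append(np.abs(np.sum(z ** g, axis=-1)))
    return np.stack(cols, axis=-1)

def aspu_pvalue(z, R, gammas=(1, 2, 4, 8, np.inf), n_sim=10_000, seed=0):
    """Monte Carlo aSPU p-value for one SNP across K correlated traits."""
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    null = rng.multivariate_normal(np.zeros(len(z)), R, size=n_sim)
    t_obs = spu_stats(z, gammas)          # shape (G,)
    t_null = spu_stats(null, gammas)      # shape (n_sim, G)

    # p-value of each observed SPU statistic against its simulated null
    p_obs = (1 + np.sum(t_null >= t_obs, axis=0)) / (n_sim + 1)

    # p-value of every null draw for every gamma: share of draws at least as large
    p_null = np.empty_like(t_null)
    for j in range(t_null.shape[1]):
        sorted_col = np.sort(t_null[:, j])
        n_ge = n_sim - np.searchsorted(sorted_col, t_null[:, j], side="left")
        p_null[:, j] = n_ge / n_sim

    # aSPU statistic = minimum p-value over gammas, calibrated with the same draws
    min_p_obs = p_obs.min()
    min_p_null = p_null.min(axis=1)
    return (1 + np.sum(min_p_null <= min_p_obs)) / (n_sim + 1)

# Hypothetical correlation among three traits (e.g., CRP, BMI, WHR Z-scores).
R = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
print(aspu_pvalue([0.1, -0.2, 0.05], R))   # weak signal
print(aspu_pvalue([6.0, 6.0, 6.0], R))     # strong shared signal
```

Small γ favours many weak effects in the same direction, and large γ favours one dominant trait. Taking the minimum over γ is what makes the test adaptive. The Monte Carlo resolution caps the smallest attainable p-value at 1/(n_sim + 1).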
Affiliation(s)
- Mohammad Yaser Anwar
- Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Antoine R Baldassari
- Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Hannah G Polikowsky
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Colleen M Sitlani
- Department of Medicine, University of Washington, Seattle, WA, 98195, USA
- Heather M Highland
- Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hung-Hsin Chen
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Mariaelisa Graff
- Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Annie Green Howard
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Su Yon Jung
- Translational Sciences Section, School of Nursing, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Lauren E Petty
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Wanying Zhu
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Steven Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, 08854, USA
- Iona Cheng
- Department of Epidemiology and Biostatistics, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, 94115, USA
- Robert Kaplan
- Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Joseph B McCormick
- School of Public Health, University of Texas Health Science Center at Houston, Brownsville Regional Campus, Brownsville, TX, 78520, USA
- Susan P Fisher-Hoch
- School of Public Health, University of Texas Health Science Center at Houston, Brownsville Regional Campus, Brownsville, TX, 78520, USA
- Christy L Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Kira C Taylor
- Department of Epidemiology and Population Health, University of Louisville School of Public Health and Information Sciences, Louisville, KY, 40202, USA
- Jennifer E Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Kari E North
- Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
|
16
|
Integrating variant functional annotation scores have varied abilities to improve power of genome-wide association studies. Sci Rep 2022; 12:10720. [PMID: 35750789 PMCID: PMC9232605 DOI: 10.1038/s41598-022-14924-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Accepted: 06/15/2022] [Indexed: 11/12/2022]
Abstract
Functional annotations have the potential to increase power of genome-wide association studies (GWAS) by prioritizing variants according to their biological function, but this potential has not been well studied. We comprehensively evaluated all 1132 traits in the UK Biobank whose SNP-heritability estimates were given “medium” or “high” labels by Neale’s lab. For each trait, we integrated GWAS summary statistics of close to 8 million common variants (minor allele frequency > 1%) with either their 75 individual functional scores or their meta-scores, using three different data-integration methods. Overall, the number of new genome-wide significant findings after data-integration increases as a trait SNP-heritability estimate increases. However, there is a trade-off between new findings and loss of baseline GWAS findings, resulting in similar total numbers of significant findings between using GWAS alone and integrating GWAS with functional scores, across all 1132 traits analyzed and all three data-integration methods considered. Our findings suggest that, even with the current biobank-level sample size, more informative functional scores and/or new data-integration methods are needed to further improve the power of GWAS of common variants. For example, studying variants in coding sequence and obtaining cell-type-specific scores are potential future directions.
|
17
|
Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for gene-based association testing with multiple traits in genome-wide association studies. Genet Epidemiol 2022; 46:63-72. [PMID: 34787916 PMCID: PMC8795481 DOI: 10.1002/gepi.22437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 09/28/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]
Abstract
Although genome-wide association studies (GWAS) often collect data on multiple correlated traits for complex diseases, conventional gene-based analysis is usually univariate and therefore treats traits as uncorrelated. Multivariate analysis of multiple correlated traits can potentially increase the power to detect genes that affect some or all of these traits. In this study, we propose the multivariate hierarchically structured variable selection (HSVS-M) model, a flexible Bayesian model that tests the association of a gene with multiple correlated traits. With only summary statistics, HSVS-M can account for the correlations among genetic variants and among traits simultaneously and can also estimate the various directions and magnitudes of associations between a gene and multiple traits. Simulation studies show that HSVS-M substantially outperforms competing methods in various scenarios, particularly when variants in a gene are associated with a trait in similar directions and magnitudes. We applied HSVS-M to the summary statistics of a meta-analysis GWAS on four lipid traits from the Global Lipids Genetics Consortium and identified 15 genes that have also been confirmed as risk factors in previous studies.
Affiliation(s)
- Yi Yang
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Saonli Basu
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Lin Zhang
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
|
18
|
Aguate FM, Vazquez AI, Merriman TR, de Los Campos G. Mapping pleiotropic loci using a fast-sequential testing algorithm. Eur J Hum Genet 2021; 29:1762-1773. [PMID: 34145383 PMCID: PMC8633382 DOI: 10.1038/s41431-021-00911-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2021] [Revised: 04/27/2021] [Accepted: 05/19/2021] [Indexed: 02/07/2023]
Abstract
Pleiotropy (i.e., genes with effects on multiple traits) leads to genetic correlations between traits and contributes to the development of many syndromes. Identifying variants with pleiotropic effects on multiple health-related traits can improve the biological understanding of gene action and disease etiology, and can help to advance disease-risk prediction. Sequential testing is a powerful approach for mapping genes with pleiotropic effects. However, the existing methods and the available software do not scale to analyses involving millions of SNPs and large datasets. This has limited the adoption of sequential testing for pleiotropy mapping at large scale. In this study, we present a sequential test and software that can be used to test pleiotropy in large systems of traits with biobank-sized data. Using simulations, we show that the methods implemented in the software are powerful and have adequate type-I error rate control. To demonstrate the use of the methods and software, we present a whole-genome scan in search of loci with pleiotropic effects on seven traits related to metabolic syndrome (MetS) using UK Biobank data (n ≈ 300K distantly related white European participants). We found abundant pleiotropy and report 170, 44, and 18 genomic regions harboring SNPs with pleiotropic effects in at least two, three, and four of the seven traits, respectively. We validate our results using previous studies documented in the GWAS Catalog and using data from GTEx. Our results confirm previously reported loci and lead to several novel discoveries that link MetS-related traits through plausible biological pathways.
Affiliation(s)
- Fernando M Aguate
- Department of Epidemiology & Biostatistics, IQ - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Ana I Vazquez
- Department of Epidemiology & Biostatistics, IQ - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Tony R Merriman
- Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Gustavo de Los Campos
- Department of Epidemiology & Biostatistics, IQ - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Department of Statistics & Probability, Michigan State University, East Lansing, MI, USA
|
19
|
Liu W, Xu Y, Wang A, Huang T, Liu Z. The eigen higher criticism and eigen Berk–Jones tests for multiple trait association studies based on GWAS summary statistics. Genet Epidemiol 2021; 46:89-104. [DOI: 10.1002/gepi.22439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 09/10/2021] [Accepted: 10/21/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Wei Liu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China
- Yuyang Xu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Tao Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Institute for Artificial Intelligence, Center for Intelligent Public Health, Peking University, Beijing, China
- Key Laboratory of Molecular Cardiovascular Diseases, Peking University, Ministry of Education, Beijing, China
- Zhonghua Liu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
|
20
|
Wolf JM, Westra J, Tintle N. Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates. Front Genet 2021; 12:745901. [PMID: 34712269 PMCID: PMC8546319 DOI: 10.3389/fgene.2021.745901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Accepted: 09/23/2021] [Indexed: 12/03/2022]
Abstract
While the promise of electronic medical record and biobank data is large, major questions remain about patient privacy, computational hurdles, and data access. One promising area of recent development is pre-computing non-individually identifiable summary statistics to be made publicly available for exploration and downstream analysis. In this manuscript we demonstrate how to utilize pre-computed linear association statistics between individual genetic variants and phenotypes to infer genetic relationships between products of phenotypes (e.g., ratios; logical combinations of binary phenotypes using "and" and "or") with customized covariate choices. We propose a method to approximate covariate-adjusted linear models for products and logical combinations of phenotypes using only pre-computed summary statistics. We evaluate our method's accuracy through several simulation studies and an application modeling ratios of fatty acids using data from the Framingham Heart Study. These studies show that the method consistently recapitulates the results of analyses performed on individual-level data, including the Type I error rate, power, and effect size estimates. An implementation of this proposed method is available in the publicly available R package pcsstools.
Affiliation(s)
- Jack M. Wolf
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
- Jason Westra
- Department of Mathematics, Computer Science, and Statistics, Dordt University, Sioux Center, IA, United States
- Nathan Tintle
- Department of Mathematics, Computer Science, and Statistics, Dordt University, Sioux Center, IA, United States
- Department of Population Health Nursing Science, College of Nursing, University of Illinois Chicago, Chicago, IL, United States
|
21
|
Julienne H, Laville V, McCaw ZR, He Z, Guillemot V, Lasry C, Ziyatdinov A, Nerin C, Vaysse A, Lechat P, Ménager H, Le Goff W, Dube MP, Kraft P, Ionita-Laza I, Vilhjálmsson BJ, Aschard H. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genet 2021; 17:e1009713. [PMID: 34460823 PMCID: PMC8437297 DOI: 10.1371/journal.pgen.1009713] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/20/2021] [Revised: 09/13/2021] [Accepted: 07/12/2021] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.
Affiliation(s)
- Hanna Julienne
- Department of Computational Biology, Institut Pasteur, Paris, France
- Vincent Laville
- Department of Computational Biology, Institut Pasteur, Paris, France
- Zachary R. McCaw
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California, United States of America
- Vincent Guillemot
- Department of Computational Biology, Institut Pasteur, Paris, France
- Carla Lasry
- Department of Computational Biology, Institut Pasteur, Paris, France
- Andrey Ziyatdinov
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Cyril Nerin
- Department of Computational Biology, Institut Pasteur, Paris, France
- Amaury Vaysse
- Department of Computational Biology, Institut Pasteur, Paris, France
- Pierre Lechat
- Department of Computational Biology, Institut Pasteur, Paris, France
- Hervé Ménager
- Department of Computational Biology, Institut Pasteur, Paris, France
- Wilfried Le Goff
- Sorbonne Université, INSERM, Institute of Cardiometabolism and Nutrition (ICAN), UMR_S 1166, Paris, France
- Marie-Pierre Dube
- Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
- Université de Montréal, Faculty of Medicine, Department of Medicine, Université de Montréal, Montreal, Canada
- Peter Kraft
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, New York, United States of America
- Bjarni J. Vilhjálmsson
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Paris, France
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
|
22
|
Sitlani CM, Baldassari AR, Highland HM, Hodonsky CJ, McKnight B, Avery CL. Comparison of adaptive multiple phenotype association tests using summary statistics in genome-wide association studies. Hum Mol Genet 2021; 30:1371-1383. [PMID: 33949650 PMCID: PMC8283209 DOI: 10.1093/hmg/ddab126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/19/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies have been successful in mapping loci for individual phenotypes, but few studies have comprehensively interrogated evidence of shared genetic effects across multiple phenotypes simultaneously. Statistical methods have been proposed for analyzing multiple phenotypes using summary statistics, which enables studies of shared genetic effects while avoiding challenges associated with individual-level data sharing. Adaptive tests have been developed to maintain power against multiple alternative hypotheses because the most powerful single-alternative test depends on the underlying structure of the associations between the multiple phenotypes and a single nucleotide polymorphism (SNP). Here we compare the performance of six such adaptive tests: two adaptive sum of powered scores (aSPU) tests, the unified score association test (metaUSAT), the adaptive test in a mixed-models framework (mixAda) and two principal-component-based adaptive tests (PCAQ and PCO). Our simulations highlight practical challenges that arise when multivariate distributions of phenotypes do not satisfy assumptions of multivariate normality. Previous reports in this context focus on low minor allele count (MAC) and omit the aSPU test, which relies less than other methods on asymptotic and distributional assumptions. When these assumptions are not satisfied, particularly when MAC is low and/or phenotype covariance matrices are singular or nearly singular, aSPU better preserves type I error, sometimes at the cost of decreased power. We illustrate this trade-off with multiple phenotype analyses of six quantitative electrocardiogram traits in the Population Architecture using Genomics and Epidemiology (PAGE) study.
Affiliation(s)
- Colleen M Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101 USA
- Antoine R Baldassari
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
- Heather M Highland
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
- Chani J Hodonsky
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908 USA
- Barbara McKnight
- Department of Biostatistics, University of Washington, Seattle, WA 98195 USA
- Christy L Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516 USA
|
23
|
Hu Y, Bien SA, Nishimura KK, Haessler J, Hodonsky CJ, Baldassari AR, Highland HM, Wang Z, Preuss M, Sitlani CM, Wojcik GL, Tao R, Graff M, Huckins LM, Sun Q, Chen MH, Mousas A, Auer PL, Lettre G, Tang W, Qi L, Thyagarajan B, Buyske S, Fornage M, Hindorff LA, Li Y, Lin D, Reiner AP, North KE, Loos RJF, Raffield LM, Peters U, Avery CL, Kooperberg C. Multi-ethnic genome-wide association analyses of white blood cell and platelet traits in the Population Architecture using Genomics and Epidemiology (PAGE) study. BMC Genomics 2021; 22:432. [PMID: 34107879 PMCID: PMC8191001 DOI: 10.1186/s12864-021-07745-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/09/2020] [Accepted: 05/26/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND Circulating white blood cell and platelet traits are clinically linked to various disease outcomes and differ across individuals and ancestry groups. Genetic factors play an important role in determining these traits and many loci have been identified. However, most of these findings were identified in populations of European ancestry (EA), with African Americans (AA), Hispanics/Latinos (HL), and other races/ethnicities being severely underrepresented. RESULTS We performed ancestry-combined and ancestry-specific genome-wide association studies (GWAS) for white blood cell and platelet traits in the ancestrally diverse Population Architecture using Genomics and Epidemiology (PAGE) Study, including 16,201 AA, 21,347 HL, and 27,236 EA participants. We identified six novel findings at suggestive significance (P < 5E-8), which need confirmation, and independent signals at six previously established regions at genome-wide significance (P < 2E-9). We confirmed multiple previously reported genome-wide significant variants in the single variant association analysis and multiple genes using PrediXcan. Evaluation of loci reported from a Euro-centric GWAS indicated attenuation of effect estimates in AA and HL compared to EA populations. CONCLUSIONS Our results highlighted the potential to identify ancestry-specific and ancestry-agnostic variants in participants with diverse backgrounds and advocate for continued efforts in improving inclusion of racially/ethnically diverse populations in genetic association studies for complex traits.
Affiliation(s)
- Yao Hu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Stephanie A Bien
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Katherine K Nishimura
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Jeffrey Haessler
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Chani J Hodonsky
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Antoine R Baldassari
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Heather M Highland
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Zhe Wang
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michael Preuss
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Colleen M Sitlani
  - Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - The Vanderbilt Genetics Institute, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Mariaelisa Graff
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Laura M Huckins
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Quan Sun
  - Department of Biostatistics, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ming-Huei Chen
  - The Framingham Heart Study, National Heart, Lung and Blood Institute, Framingham, MA, USA
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Framingham, MA, USA
- Abdou Mousas
  - Montreal Heart Institute, Montreal, Quebec, Canada
- Paul L Auer
  - School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Guillaume Lettre
  - Montreal Heart Institute, Montreal, Quebec, Canada
  - Department of Medicine, Faculty of Medicine, Université de Montréal, Montreal, Quebec, Canada
- Weihong Tang
  - School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Lihong Qi
  - School of Medicine, University of California Davis, Davis, CA, USA
- Steve Buyske
  - Department of Statistics and Biostatistics, Rutgers University, Piscataway, NJ, USA
- Myriam Fornage
  - Brown Foundation Institute for Molecular Medicine, the University of Texas Health Science Center, Houston, TX, USA
- Lucia A Hindorff
  - Division of Genomic Medicine, NIH National Human Genome Research Institute, Bethesda, MD, USA
- Yun Li
  - Department of Biostatistics, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Danyu Lin
  - Department of Biostatistics, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alexander P Reiner
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Kari E North
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ruth J F Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Laura M Raffield
  - Department of Genetics, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ulrike Peters
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Christy L Avery
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Charles Kooperberg
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
24
Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies. Comput Math Methods Med 2021; 2021:8812282. [PMID: 33628328 PMCID: PMC7889379 DOI: 10.1155/2021/8812282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 01/02/2021] [Accepted: 01/08/2021] [Indexed: 11/18/2022]
Abstract
In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most existing methods focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous or binary phenotypes and can further adjust for covariates. Simulation studies show that our methods perform satisfactorily in terms of empirical type I error rates and power compared with existing methods.
25
Ning Z, Tsepilov YA, Sharapov SZ, Wang Z, Grishenko AK, Feng X, Shirali M, Joshi PK, Wilson JF, Pawitan Y, Haley CS, Aulchenko YS, Shen X. Nontrivial Replication of Loci Detected by Multi-Trait Methods. Front Genet 2021; 12:627989. [PMID: 33613642 PMCID: PMC7886991 DOI: 10.3389/fgene.2021.627989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/10/2020] [Accepted: 01/04/2021] [Indexed: 11/21/2022]
Abstract
The ever-growing number of genome-wide association studies (GWAS) has revealed widespread pleiotropy. To exploit this, various methods that jointly consider associations of a genetic variant with multiple traits have been developed. Most efforts have focused on improving GWAS discovery power. However, how to replicate these discovered pleiotropic loci has yet to be discussed thoroughly. Unlike in a single-trait scenario, multi-trait replication is not trivial, given the underlying genotype-multi-phenotype map of the associations. Here, we evaluate four methods for replicating multi-trait associations, corresponding to four levels of replication strength. Weak replication cannot justify pleiotropic genetic effects, whereas strong replication using our developed correlation methods can inform consistent pleiotropic genetic effects across the discovery and replication samples. We provide a protocol for replicating multi-trait genetic associations in practice. The described methods are implemented in the free and open-source R package MultiABEL.
Affiliation(s)
- Zheng Ning
  - Biostatistics Group, School of Life Sciences and School of Ecology, Sun Yat-sen University, Guangzhou, China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Yakov A Tsepilov
  - Division of Biology, Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Zhipeng Wang
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - Bioinformatics Center, Northeast Agricultural University, Harbin, China
- Xiao Feng
  - Biostatistics Group, School of Life Sciences and School of Ecology, Sun Yat-sen University, Guangzhou, China
- Masoud Shirali
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Peter K Joshi
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- James F Wilson
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Chris S Haley
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Yurii S Aulchenko
  - Kurchatov Genomics Center, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
  - PolyOmica, 's-Hertogenbosch, Netherlands
- Xia Shen
  - Biostatistics Group, School of Life Sciences and School of Ecology, Sun Yat-sen University, Guangzhou, China
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
26
Xue H, Wu C, Pan W. Leveraging existing GWAS summary data of genetically correlated and uncorrelated traits to improve power for a new GWAS. Genet Epidemiol 2020; 44:717-732. [PMID: 32677173 PMCID: PMC7722071 DOI: 10.1002/gepi.22333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2019] [Revised: 06/09/2020] [Accepted: 06/18/2020] [Indexed: 11/08/2022]
Abstract
In spite of the tremendous success of genome-wide association studies (GWAS) in identifying genetic variants associated with complex traits and common diseases, many more are yet to be discovered. Hence, it is always desirable to improve the statistical power of GWAS. In parallel with the intensive efforts to integrate GWAS with functional annotations or other omic data, we propose leveraging other published GWAS summary data to boost statistical power for a new (focus) GWAS; the traits of the published GWAS may or may not be genetically correlated with the target trait of the new GWAS. Building on weighted hypothesis testing with a solid theoretical foundation, we develop a novel and effective method to construct single-nucleotide polymorphism (SNP)-specific weights based on 22 published GWAS data sets with various traits, sometimes detecting dramatically more significant SNPs and independent loci than the standard (unweighted) analysis. For example, by integrating a schizophrenia GWAS summary data set with 19 other GWAS summary data sets of nonschizophrenia traits, our new method identified 1,585 genome-wide significant SNPs mapping to 15 linkage disequilibrium-independent loci, well above the 818 significant SNPs in 13 independent loci identified by the standard (unweighted) analysis; furthermore, using a later and larger schizophrenia GWAS summary data set as validation data, 1,423 (out of 1,585) significant SNPs identified by the weighted analysis, compared with 705 (out of 818) by the unweighted analysis, were confirmed, and all 15 and 13 independent loci, respectively, were also confirmed. Similar conclusions were reached with lipid and Alzheimer's disease (AD) traits. We conclude that the proposed approach is a simple and cost-effective way to improve GWAS power.
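Weighted hypothesis testing of the kind mentioned above can be illustrated with the weighted Bonferroni rule: SNP j is declared significant when p_j ≤ w_j·α/m, where the m non-negative weights average 1, so the thresholds sum to α and the family-wise error rate stays at or below α. The Python sketch below assumes that rule; the function name and the toy p-values and weights are hypothetical, and it does not reproduce how the authors construct their SNP-specific weights from published GWAS.

```python
def weighted_bonferroni(pvalues, weights, alpha=0.05):
    """Indices j with p_j <= w_j * alpha / m (weighted Bonferroni rule).

    Weights must be non-negative; they are rescaled here to average 1, so the
    thresholds sum to alpha and the family-wise error rate stays <= alpha.
    """
    m = len(pvalues)
    if m == 0 or m != len(weights):
        raise ValueError("pvalues and weights must be non-empty and of equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    scaled = [w * m / total for w in weights]  # mean weight is now 1
    return [j for j, (p, w) in enumerate(zip(pvalues, scaled)) if p <= w * alpha / m]


# Hypothetical p-values for four SNPs; prior information up-weights SNP 1.
pvals = [0.02, 0.015, 0.5, 0.9]
print(weighted_bonferroni(pvals, [1, 1, 1, 1]))          # [] with equal weights
print(weighted_bonferroni(pvals, [0.2, 3.4, 0.2, 0.2]))  # [1] with informed weights
```

Rescaling the weights to mean 1 is what keeps the union-bound guarantee: weight moved toward a promising SNP is taken away from the others, so power can rise at informed SNPs without inflating the overall error rate.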
Affiliation(s)
- Haoran Xue
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, Florida
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
27
Wu C. Multi-trait Genome-Wide Analyses of the Brain Imaging Phenotypes in UK Biobank. Genetics 2020; 215:947-958. [PMID: 32540950 PMCID: PMC7404235 DOI: 10.1534/genetics.120.303242] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/03/2019] [Accepted: 06/09/2020] [Indexed: 01/08/2023]
Abstract
Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated, traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits and may yield inflated Type 1 error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data rapidly become available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging-derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating that the Type 1 error rate was well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were missed by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, analyses of four additional sets of traits led to similar conclusions.
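The genomic inflation factor quoted above (1.04) is conventionally defined as λ_GC = median of the observed 1-df chi-square statistics divided by the median of the χ²₁ distribution (about 0.455). The Python sketch below implements that textbook definition, assuming one z-score per SNP (a multi-trait p-value can first be converted to such a z-score); it is not code from the aMAT paper, and the evenly spaced normal quantiles in the usage lines are a hypothetical stand-in for null GWAS results.

```python
import statistics

# Median of the chi-square(1) distribution: (Phi^{-1}(0.75))^2, about 0.4549.
CHI2_1_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(zscores):
    """Genomic inflation factor lambda_GC = median(z^2) / median of chi-square(1).

    Values near 1 suggest the statistics follow their null distribution;
    values well above 1 suggest inflation (e.g. confounding or miscalibration).
    """
    if not zscores:
        raise ValueError("need at least one z-score")
    return statistics.median(z * z for z in zscores) / CHI2_1_MEDIAN


# Null-like z-scores: evenly spaced standard normal quantiles.
n = 10001
null_z = [statistics.NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
print(round(genomic_inflation(null_z), 3))                     # close to 1.0
print(round(genomic_inflation([1.1 * z for z in null_z]), 3))  # close to 1.21
```

Because λ_GC works on squared statistics, scaling every z-score by 1.1 inflates it by about 1.21, which is why even modest miscalibration of a joint test shows up clearly in this summary.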
Affiliation(s)
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, Florida 32306
28
Bu D, Yang Q, Meng Z, Zhang S, Li Q. Truncated tests for combining evidence of summary statistics. Genet Epidemiol 2020; 44:687-701. [DOI: 10.1002/gepi.22330] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/10/2020] [Revised: 04/24/2020] [Accepted: 06/01/2020] [Indexed: 12/15/2022]
Affiliation(s)
- Deliang Bu
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
- Qinglong Yang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Zhen Meng
  - LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Sanguo Zhang
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
- Qizhai Li
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - LSC, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
29
Guo B, Wu B. Integrate multiple traits to detect novel trait-gene association using GWAS summary data with an adaptive test approach. Bioinformatics 2020; 35:2251-2257. [PMID: 30476000 DOI: 10.1093/bioinformatics/bty961] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/01/2018] [Revised: 10/30/2018] [Accepted: 11/22/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION Genetics holds great promise for precision medicine by tailoring treatment to individual patients based on their genetic profiles. Toward this goal, many large-scale genome-wide association studies (GWAS) have been performed in the last decade to identify genetic variants associated with various traits and diseases. They have successfully identified tens of thousands of disease-related variants. However, they have explained only a small proportion of the overall trait heritability for most traits and are of very limited clinical use. This is partly owing to the small effect sizes of most genetic variants and to the common practice, in most GWAS, of testing association between one trait and one genetic variant at a time, even when multiple related traits are measured for each individual. Increasing evidence suggests that many genetic variants can influence multiple traits simultaneously, and that power can be gained by testing the association of multiple traits jointly. It is appealing to develop novel multi-trait association test methods that need only GWAS summary data, since it is generally very hard to access individual-level GWAS phenotype and genotype data. RESULTS Many existing GWAS summary data-based association test methods have relied on ad hoc approaches or crude Monte Carlo approximations. In this article, we develop rigorous statistical methods for efficient and powerful multi-trait association testing. We develop robust and efficient methods to accurately estimate the marginal trait correlation matrix using only GWAS summary data. We construct the principal component (PC)-based association test from the summary statistics. The PC-based test has optimal power when the underlying multi-trait signal can be captured by the first PC; otherwise, its performance is suboptimal. We develop an adaptive test by optimally weighting the PC-based test and the omnibus chi-square test to achieve robust performance under various scenarios.
We develop efficient numerical algorithms to compute analytical P-values for all the proposed tests without the need for Monte Carlo sampling. We illustrate the utility of the proposed methods through application to GWAS meta-analysis summary data for multiple lipid and glycemic traits. We identify multiple novel loci that were missed by individual trait-based association tests. AVAILABILITY AND IMPLEMENTATION All the proposed methods are implemented in an R package available at http://www.github.com/baolinwu/MTAR. The developed R programs are extremely efficient: it takes less than 2 min to compute the list of genome-wide significant single nucleotide polymorphisms (SNPs) for all proposed multi-trait tests for the lipids GWAS summary data with 2.5 million SNPs on a single Linux desktop. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
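The omnibus chi-square test and the PC-based test named in this abstract can both be computed from a vector z of K trait z-scores for one SNP and the trait correlation matrix R: the omnibus statistic zᵀR⁻¹z is χ²_K under the null, and the first-PC statistic (v₁ᵀz)²/λ₁, with (λ₁, v₁) the leading eigenpair of R, is χ²₁. The pure-Python sketch below assumes R is known (the paper estimates it from summary data), finds the eigenpair by power iteration, and omits the adaptive weighting of the two tests; it illustrates the statistics only and is not the authors' R package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function and y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def omnibus_test(z, corr):
    """Omnibus statistic z' R^{-1} z, chi-square with K = len(z) df under the null."""
    t = sum(zi * wi for zi, wi in zip(z, solve(corr, z)))
    return t, chi2_sf(t, len(z))


def first_pc_test(z, corr, iters=200):
    """First-PC statistic (v1' z)^2 / lambda1, chi-square with 1 df under the null.

    (lambda1, v1) is the leading eigenpair of R, found by power iteration; this
    assumes lambda1 is strictly the largest eigenvalue and the start vector is
    not orthogonal to v1.
    """
    k = len(z)
    v = [1.0 + 0.1 * i for i in range(k)]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    lam1 = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    t = sum(vi * zi for vi, zi in zip(v, z)) ** 2 / lam1
    return t, chi2_sf(t, 1)


# Hypothetical z-scores of one SNP for two traits with correlation 0.4.
R = [[1.0, 0.4], [0.4, 1.0]]
for z in ([3.0, 3.5], [3.0, -3.0]):  # concordant vs. opposite-signed effects
    print(z, "omnibus p =", omnibus_test(z, R)[1], "first-PC p =", first_pc_test(z, R)[1])
```

The toy example shows the trade-off the abstract describes: with concordant effects the first-PC test gives the smaller p-value, while with opposite-signed effects the PC statistic is essentially zero and only the omnibus test detects the signal, which is the motivation for combining the two adaptively.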
30
Luo L, Shen J, Zhang H, Chhibber A, Mehrotra DV, Tang ZZ. Multi-trait analysis of rare-variant association summary statistics using MTAR. Nat Commun 2020; 11:2850. [PMID: 32503972 PMCID: PMC7275056 DOI: 10.1038/s41467-020-16591-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/01/2019] [Accepted: 05/09/2020] [Indexed: 12/13/2022]
Abstract
Integrating association evidence across multiple traits can improve the power of gene discovery and reveal pleiotropy. Most multi-trait analysis methods focus on individual common variants in genome-wide association studies. Here, we introduce multi-trait analysis of rare-variant associations (MTAR), a framework for joint analysis of association summary statistics between multiple rare variants and different traits. MTAR achieves substantial power gain by leveraging the genome-wide genetic correlation measure to inform the degree of gene-level effect heterogeneity across traits. We apply MTAR to rare-variant summary statistics for three lipid traits in the Global Lipids Genetics Consortium. 99 genome-wide significant genes were identified in the single-trait-based tests, and MTAR increases this to 139. Among the 11 novel lipid-associated genes discovered by MTAR, 7 are replicated in an independent UK Biobank GWAS analysis. Our study demonstrates that MTAR is substantially more powerful than single-trait-based tests and highlights the value of MTAR for novel gene discovery.
Affiliation(s)
- Lan Luo
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, 53706, USA
- Judong Shen
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Hong Zhang
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, New Jersey, 07065, USA
- Aparna Chhibber
  - Genetics and Pharmacogenomics, Merck & Co., Inc., West Point, Pennsylvania, 19446, USA
- Devan V Mehrotra
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, Pennsylvania, 19454, USA
- Zheng-Zheng Tang
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, 53715, USA
  - Wisconsin Institute for Discovery, Madison, Wisconsin, 53715, USA
31
Deng Y, He T, Fang R, Li S, Cao H, Cui Y. Genome-Wide Gene-Based Multi-Trait Analysis. Front Genet 2020; 11:437. [PMID: 32508874 PMCID: PMC7248273 DOI: 10.3389/fgene.2020.00437] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/24/2020] [Accepted: 04/08/2020] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies focusing on a single phenotype have been broadly conducted to identify genetic variants associated with a complex disease. The commonly applied single-variant analysis is limited because it fails to consider the complex interactions between variants, which motivated the development of association analyses focusing on genes or gene sets. Moreover, when multiple correlated phenotypes are available, methods based on multi-trait analysis can improve association power. However, most currently available multi-trait analyses are single-variant-based and thus have limited power when disease variants function as a group in a gene or a gene set. In this work, we propose a genome-wide gene-based multi-trait analysis method that treats genes as testing units. For a given phenotype, we adopt a rapid and powerful kernel-based testing method that can evaluate the joint effect of multiple variants within a gene. The joint effect, either linear or nonlinear, is captured through kernel functions. Given a series of candidate kernel functions, we propose an omnibus test strategy to integrate the test results based on different candidate kernels. A p-value combination method is then applied to integrate dependent p-values to assess the association between a gene and multiple correlated phenotypes. Simulation studies show reasonable type I error control and excellent power of the proposed method compared with its counterparts. We further show the utility of the method by applying it to two data sets, the Human Liver Cohort and the Alzheimer's Disease Neuroimaging Initiative data set, in which novel genes are identified. Our method has broad applications in other fields where the interest is in evaluating the joint effect (linear or nonlinear) of a set of variants.
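The abstract does not name the p-value combination method used for dependent p-values. As one well-known option that tolerates arbitrary dependence, the Cauchy combination test of Liu and Xie is sketched below in Python; it is a stand-in chosen for illustration, not necessarily the method the authors used, and the function name and toy p-values are hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination of (possibly dependent) p-values.

    T = sum_i w_i * tan((0.5 - p_i) * pi) with non-negative weights normalized
    to sum to 1; under the null T is approximately standard Cauchy, so the
    combined p-value is 0.5 - arctan(T) / pi. Extremely small p-values (below
    about 1e-15) lose precision in tan(); implementations commonly switch to
    1 / (p * pi) there, which this sketch omits.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * k
    total = float(sum(weights))
    t = sum((w / total) * math.tan((0.5 - p) * math.pi) for p, w in zip(pvalues, weights))
    return 0.5 - math.atan(t) / math.pi


# Hypothetical per-trait gene-level p-values for one gene.
print(cauchy_combination([1e-6, 0.5, 0.9]))  # about 3e-6, driven by the smallest p-value
```

The Cauchy transform makes the combined statistic dominated by the smallest p-values, and its tail approximation holds under arbitrary dependence, so no trait correlation matrix is needed, which is convenient when combining gene-level tests across correlated phenotypes.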
Affiliation(s)
- Yamin Deng
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Tao He
  - Department of Mathematics, San Francisco State University, San Francisco, CA, United States
- Ruiling Fang
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Shaoyu Li
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Hongyan Cao
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States
32
Hodonsky CJ, Baldassari AR, Bien SA, Raffield LM, Highland HM, Sitlani CM, Wojcik GL, Tao R, Graff M, Tang W, Thyagarajan B, Buyske S, Fornage M, Hindorff LA, Li Y, Lin D, Reiner AP, North KE, Loos RJF, Kooperberg C, Avery CL. Ancestry-specific associations identified in genome-wide combined-phenotype study of red blood cell traits emphasize benefits of diversity in genomics. BMC Genomics 2020; 21:228. [PMID: 32171239 PMCID: PMC7071748 DOI: 10.1186/s12864-020-6626-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/24/2019] [Accepted: 02/26/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Quantitative red blood cell (RBC) traits are highly polygenic, clinically relevant traits, with approximately 500 reported GWAS loci. The majority of RBC trait GWAS have been performed in European- or East Asian-ancestry populations, despite evidence that rare or ancestry-specific variation contributes substantially to RBC trait heritability. Recently developed combined-phenotype methods that leverage genetic trait correlation to improve statistical power have not yet been applied to these traits. Here, we leveraged the correlation of seven quantitative RBC traits to perform a combined-phenotype analysis in a multi-ethnic study population. RESULTS We used the adaptive sum of powered scores (aSPU) test to assess combined-phenotype associations between ~21 million SNPs and seven RBC traits in a multi-ethnic population (maximum n = 67,885 participants; 24% African American, 30% Hispanic/Latino, and 43% European American; 76% female). Thirty-nine loci in our multi-ethnic population contained at least one significant association signal (p < 5E-9), with lead SNPs at nine loci significantly associated with three or more RBC traits. A majority of the lead SNPs were common (MAF > 5%) across all ancestral populations. Nineteen additional independent association signals were identified at seven known loci (HFE, KIT, HBS1L/MYB, CITED2/FILNC1, ABO, HBA1/2, and PLIN4/5). For example, the HBA1/2 locus contained 14 conditionally independent association signals, 11 of which were previously unreported and are specific to African and Amerindian ancestries. One variant in this region was common in all ancestries but exhibited a narrower LD block in African Americans than in European Americans or Hispanics/Latinos. GTEx eQTL analysis of all independent lead SNPs yielded 31 significant associations in relevant tissues, over half of which were not at the gene immediately proximal to the lead SNP.
CONCLUSION This work identified seven loci containing multiple independent association signals for RBC traits using a combined-phenotype approach, which may improve discovery in genetically correlated traits. Highly complex genetic architecture at the HBA1/2 locus was only revealed by the inclusion of African Americans and Hispanics/Latinos, underscoring the continued importance of expanding large GWAS to include ancestrally diverse populations.
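The aSPU test used in this study can be illustrated with a minimal Monte Carlo sketch for one SNP's vector of K trait z-scores, assuming the trait correlation matrix R is known: SPU(γ) = Σ_k z_k^γ for a few powers γ (γ = ∞ meaning max_k |z_k|), per-γ p-values from null draws z* ~ N(0, R), and the aSPU statistic taken as the minimum p-value across γ, calibrated against the same null draws. The γ set, number of simulations, seed, and function names are illustrative assumptions, not the settings used in the study; dedicated implementations should be preferred for genome-wide use.

```python
import bisect
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L^T == a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def spu_stats(z, gammas):
    """|SPU(gamma)| = |sum_k z_k**gamma|; gamma=None stands for SPU(inf) = max_k |z_k|."""
    return [max(abs(v) for v in z) if g is None else abs(sum(v ** g for v in z))
            for g in gammas]


def aspu(z, corr, gammas=(1, 2, 4, 8, None), n_sim=5000, seed=1):
    """Monte Carlo per-gamma SPU p-values and the aSPU p-value for one SNP."""
    rng = random.Random(seed)
    low = cholesky(corr)
    k, n_g = len(z), len(gammas)
    null = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z_null = [sum(low[i][j] * e[j] for j in range(i + 1)) for i in range(k)]
        null.append(spu_stats(z_null, gammas))
    cols = [sorted(row[g] for row in null) for g in range(n_g)]

    def n_at_least(g, value):
        # Number of null draws whose SPU(gamma_g) statistic is >= value.
        return n_sim - bisect.bisect_left(cols[g], value)

    obs = spu_stats(z, gammas)
    p_spu = [(1 + n_at_least(g, obs[g])) / (n_sim + 1) for g in range(n_g)]
    p_min = min(p_spu)  # observed aSPU statistic
    # Null distribution of the minimum p-value, reusing the same draws
    # (each draw's count includes itself, so its p-value is at least 1 / n_sim).
    null_min = [min(n_at_least(g, row[g]) / n_sim for g in range(n_g)) for row in null]
    p_aspu = (1 + sum(m <= p_min for m in null_min)) / (n_sim + 1)
    return p_spu, p_aspu


# Hypothetical z-scores for one SNP across three correlated traits.
R = [[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]]
p_each, p_adaptive = aspu([6.0, 5.5, 6.2], R)
print(p_each, p_adaptive)
```

Small γ values favour many weak, same-direction effects, while large γ (up to the maximum) favour a single strong effect; taking the minimum p-value across γ and recalibrating it on the same null draws is what makes the test adaptive without a multiple-testing penalty for trying several powers.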
Affiliation(s)
- Chani J. Hodonsky
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
  - University of Virginia Center for Public Health Genomics, 1355 Lee St, Charlottesville, VA 22908 USA
- Antoine R. Baldassari
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
- Stephanie A. Bien
  - Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109 USA
- Laura M. Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599 USA
- Heather M. Highland
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
- Colleen M. Sitlani
  - University of Washington, 1730 Minor Ave, Ste 1360, Seattle, WA 98101 USA
- Genevieve L. Wojcik
  - Stanford University School of Medicine, 291 Campus Dr, Stanford, CA 94305 USA
- Ran Tao
  - Vanderbilt University, 2525 West End Ave #1100, Nashville, TN 37203 USA
- Mariaelisa Graff
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
- Weihong Tang
  - University of Minnesota, 420 Delaware St SE, Minneapolis, MN 55455 USA
- Steve Buyske
  - Rutgers University, 683 Hoes Ln W, Piscataway, NJ 08854 USA
- Myriam Fornage
  - University of Texas Houston, 7000 Fannin Street, Houston, TX 77030 USA
- Lucia A. Hindorff
  - National Human Genome Research Institute, 31 Center Dr, Bethesda, MD 20894 USA
- Yun Li
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
- Danyu Lin
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
- Alex P. Reiner
  - Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109 USA
  - University of Washington, 1705 NE Pacific St, Seattle, WA 98195 USA
- Kari E. North
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC 27599 USA
- Ruth J. F. Loos
  - Icahn School of Medicine at Mount Sinai, 1468 Madison Ave, New York, NY 10029 USA
- Christy L. Avery
  - University of North Carolina Gillings School of Public Health, 135 Dauer Dr, Chapel Hill, NC 27599 USA
33
Julienne H, Lechat P, Guillemot V, Lasry C, Yao C, Araud R, Laville V, Vilhjalmsson B, Ménager H, Aschard H. JASS: command line and web interface for the joint analysis of GWAS results. NAR Genom Bioinform 2020; 2:lqaa003. [PMID: 32002517 PMCID: PMC6978790 DOI: 10.1093/nargab/lqaa003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/04/2019] [Revised: 12/03/2019] [Accepted: 01/09/2020] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies (GWAS) have been the driving force for identifying associations between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, very few large-scale applications have been published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a versatile Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers to large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
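The weighted sum of Z-scores tests named above share a common form: S = wᵀz / sqrt(wᵀRw), which is standard normal under the null when the K trait z-scores for a SNP have correlation matrix R. The Python sketch below assumes R is known and the weights are supplied by the analyst; it is a generic illustration of the statistic, not the JASS API, and it does not reproduce JASS's harmonization steps or weighting choices.

```python
import math


def weighted_z_test(z, corr, weights):
    """Weighted sum of Z-scores: S = w'z / sqrt(w' R w).

    Under the null z ~ N(0, R), so S ~ N(0, 1); returns (S, two-sided p-value).
    """
    k = len(z)
    num = sum(w * v for w, v in zip(weights, z))
    var = sum(weights[i] * corr[i][j] * weights[j] for i in range(k) for j in range(k))
    if var <= 0:
        raise ValueError("w' R w must be positive")
    s = num / math.sqrt(var)
    return s, math.erfc(abs(s) / math.sqrt(2.0))


# Hypothetical example: two traits with correlation 0.4 and equal weights.
R = [[1.0, 0.4], [0.4, 1.0]]
print(weighted_z_test([3.0, 3.5], R, [1.0, 1.0]))   # concordant effects reinforce
print(weighted_z_test([3.0, -3.0], R, [1.0, 1.0]))  # opposite effects cancel: S = 0
```

A weighted sum spends all its power on one direction in trait space, so it is powerful when the weights match the true effect pattern and blind to orthogonal patterns; the omnibus test makes the opposite trade, which is why toolkits such as JASS offer both.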
Affiliation(s)
- Hanna Julienne
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Pierre Lechat
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Vincent Guillemot
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Carla Lasry
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Chunzi Yao
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Robinson Araud
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Vincent Laville
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Bjarni Vilhjalmsson
- National Center for Register-Based Research, Aarhus University, DK-8210 Aarhus, Denmark
- Hervé Ménager
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Hugues Aschard
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, 02115 Boston, MA, USA
34
Li X, Zhang S, Sha Q. Joint analysis of multiple phenotypes using a clustering linear combination method based on hierarchical clustering. Genet Epidemiol 2020; 44:67-78. [PMID: 31541490 PMCID: PMC7480017 DOI: 10.1002/gepi.22263] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/27/2019] [Revised: 07/19/2019] [Accepted: 08/28/2019] [Indexed: 12/24/2022]
Abstract
Emerging evidence suggests that a genetic variant can affect multiple phenotypes, especially in complex human diseases. Therefore, joint analysis of multiple phenotypes may offer new insights into disease etiology. Recently, many statistical methods have been developed for the joint analysis of multiple phenotypes, including the clustering linear combination (CLC) method. Because the number of clusters is unknown for a given dataset, a simulation procedure must be used to evaluate the p-value of the final CLC test statistic, which makes the CLC method computationally demanding. In this paper, we use a stopping criterion to determine the number of clusters in the CLC method, and we name the resulting method hierarchical clustering CLC (HCLC). The HCLC test statistic has an asymptotic distribution, which makes the method computationally efficient and applicable to genome-wide association studies. Extensive simulations, together with an analysis of the COPDGene data, were used to assess the type I error rates and power of the proposed method. Our simulation results demonstrate that HCLC effectively controls type I error rates in different realistic settings. HCLC either outperforms all other methods or has statistical power very close to that of the most powerful method with which it was compared.
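The clustering step behind HCLC can be illustrated with a generic agglomerative procedure. The sketch below is a stand-in built on stated assumptions, not the published algorithm: it uses average linkage on the distance 1 - r between trait (or z-score) correlations, so that positively correlated statistics end up together, and it stops merging once the closest pair of clusters is farther apart than a user-chosen `cutoff`. That cutoff is a hypothetical rule used in place of the paper's stopping criterion, which is not reproduced here. The resulting labels are the kind of input a CLC statistic needs.

```python
def cluster_traits(corr, cutoff):
    """Average-linkage agglomerative clustering of traits.

    The distance between traits i and j is 1 - corr[i][j], so strongly
    positively correlated traits are close. Merging stops when the closest
    pair of clusters is farther apart than `cutoff` (a hypothetical stopping
    rule, not the criterion used by HCLC).
    """
    clusters = [[i] for i in range(len(corr))]

    def avg_dist(a, b):
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[p] = clusters[p] + clusters[q]  # merge q into p (p < q)
        del clusters[q]
    return [sorted(c) for c in clusters]


def to_labels(clusters, n_traits):
    """Convert a list of clusters into one cluster index per trait."""
    labels = [0] * n_traits
    for idx, members in enumerate(clusters):
        for m in members:
            labels[m] = idx
    return labels


# Hypothetical correlation matrix: traits 0-1 and traits 2-3 form two blocks.
corr = [[1.0, 0.8, 0.1, 0.1],
        [0.8, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.7],
        [0.1, 0.1, 0.7, 1.0]]
groups = cluster_traits(corr, cutoff=0.5)
print(groups, to_labels(groups, 4))  # [[0, 1], [2, 3]] [0, 0, 1, 1]
```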
Affiliation(s)
- Xueling Li
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
35
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N. Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks. Pacific Symposium on Biocomputing 2020; 25:719-730. [PMID: 31797641 PMCID: PMC6907735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. Publishing summary statistics from these biobanks, and using them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach that uses basic summary statistics (estimates from single-marker regressions on single phenotypes) to evaluate more complex phenotypes with multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) using only biobank summary statistics. Through simulation, we validate exact formulas for this method and provide an estimation framework for cases where specific summary statistics are not available. We apply our method to a real dataset of fatty acid and genomic data.
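A property that makes this kind of summary-statistic analysis possible is that ordinary least-squares coefficients are linear in the outcome: if every trait is regressed on the same genotype and covariates in the same individuals, the SNP effect on a combined phenotype Σₖ wₖYₖ equals Σₖ wₖβₖ. The sketch below checks this numerically for one SNP with an intercept only. The data and the weights (standing in for principal component loadings) are made up, and it does not reproduce how the authors obtain covariate-adjusted loadings or standard errors from summary statistics.

```python
def ols_slope(x, y):
    """Slope from the simple linear regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def combined_effect(betas, weights):
    """SNP effect on the phenotype sum_k weights[k] * Y_k, from per-trait betas."""
    return sum(w * b for w, b in zip(weights, betas))


# Hypothetical genotype dosages, two traits, and loadings of one component.
g = [0, 1, 2, 1, 0, 2, 1, 0]
y1 = [1.2, 2.0, 3.1, 1.8, 0.9, 2.9, 2.2, 1.1]
y2 = [0.5, 0.4, 1.0, 0.7, 0.2, 1.3, 0.6, 0.3]
loadings = [0.6, 0.8]

pc = [loadings[0] * a + loadings[1] * b for a, b in zip(y1, y2)]
direct = ols_slope(g, pc)  # requires the individual-level combined phenotype
from_summary = combined_effect([ols_slope(g, y1), ols_slope(g, y2)], loadings)
print(direct, from_summary)  # equal up to floating-point rounding
```

Only the per-trait betas are needed for the point estimate; a standard error for the combined effect would additionally require the cross-trait covariance of the estimates.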
Affiliation(s)
- Jack M Wolf
- Department of Mathematics, Statistics, and Computer Science, St. Olaf College, Northfield, MN 55057, USA
36
Effect of non-normality and low count variants on cross-phenotype association tests in GWAS. Eur J Hum Genet 2019; 28:300-312. [PMID: 31582815 DOI: 10.1038/s41431-019-0514-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/03/2018] [Revised: 09/01/2019] [Accepted: 09/05/2019] [Indexed: 01/21/2023]
Abstract
Many complex human diseases, such as type 2 diabetes, are characterized by multiple underlying traits/phenotypes that have a substantially shared genetic architecture. Multivariate analysis of correlated traits has the potential to increase the power to detect common underlying genetic loci. Several cross-phenotype association methods have been proposed: some require individual-level data on traits and genotypes, while others require only summary-level data. In this article, we explore whether non-normality of the multivariate trait distribution affects inference from some of the existing multi-trait methods, and how that effect depends on the allele count of the genetic variant being tested. We find that most of these tests are susceptible to biases that lead to spurious association signals. Even after controlling for confounders that may contribute to non-normality and then applying an inverse normal transformation to the residuals of each trait, these tests may have inflated type I error for variants with low minor allele counts (MACs). A likelihood ratio test of association based on ordinal regression of the individual-level genotype conditional on the traits seems to be the least biased and can maintain type I error when the MAC is reasonably large (e.g., MAC > 30). Applying these methods to publicly available summary statistics of eight amino acid traits in European samples appears to show systematic inflation (especially for variants with low MAC), which is consistent with our findings from simulation experiments.
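The residual-then-transform step mentioned above usually refers to a rank-based inverse normal transformation (INT). A minimal standard-library sketch follows, assuming the Blom offset c = 3/8 (other offsets, such as 1/2, are also common) and average ranks for ties: a value with rank r among n maps to Φ⁻¹((r - c)/(n - 2c + 1)). A small helper computes the minor allele count from 0/1/2 genotype dosages, the quantity whose size governs the inflation described above. Function names are illustrative.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (Blom offset c = 3/8 by default).

    Tied values receive the average of their 1-based ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


def minor_allele_count(genotypes):
    """Minor allele count from dosages coded 0/1/2 (assumes no missing calls)."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


# Hypothetical trait residuals and genotypes.
print(rank_inverse_normal([5.0, 1.0, 3.0, 3.0]))
print(minor_allele_count([0, 1, 2, 0, 0, 0]))
```

In practice the INT would be applied to the residuals after regressing each trait on its covariates, as the abstract describes; the sketch only shows the transform itself.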
37
Zhang J, Sha Q, Liu G, Wang X. A gene based approach to test genetic association based on an optimally weighted combination of multiple traits. PLoS One 2019; 14:e0220914. [PMID: 31398229 PMCID: PMC6688794 DOI: 10.1371/journal.pone.0220914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/04/2019] [Accepted: 07/25/2019] [Indexed: 01/11/2023]
Abstract
There is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases for which multiple correlated traits are often measured. Joint analysis of multiple traits could increase statistical power by aggregating multiple weak effects. Existing methods for multiple-trait association tests usually study each trait separately and then combine the univariate test statistics, or the p-values of the univariate tests, to identify disease-associated genetic variants. However, ignoring the correlation between phenotypes may cause power loss. Additionally, the genetic variants in one gene (including common and rare variants) are often viewed as a whole that affects the underlying disease, since the basic functional unit of inheritance is a gene rather than a genetic variant. Thus, results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigation, whereas many existing methods for multiple-trait association tests focus only on testing a single common variant rather than a gene. In this article, we propose a statistical method, Testing an Optimally Weighted Combination of Multiple traits (TOW-CM), to test the association between multiple traits and multiple variants in a genomic region (a gene or pathway). We investigate the performance of the proposed method through extensive simulation studies. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful tests. Additionally, we illustrate the usefulness of TOW-CM based on a COPDGene study.
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, TX, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
- Guanfu Liu
- School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
- Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, TX, United States of America
38
Masotti M, Guo B, Wu B. Pleiotropy informed adaptive association test of multiple traits using genome-wide association study summary data. Biometrics 2019; 75:1076-1085. [PMID: 31021400 DOI: 10.1111/biom.13076] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/07/2017] [Accepted: 04/16/2019] [Indexed: 12/17/2022]
Abstract
Genetic variants associated with disease outcomes can be used to develop personalized treatment. To reach this precision medicine goal, hundreds of large-scale genome-wide association studies (GWAS) have been conducted in the past decade to search for promising genetic variants associated with various traits. They have successfully identified tens of thousands of disease-related variants. However, in total these identified variants explain only part of the variation for most complex traits. Many genetic variants with small effect sizes remain to be discovered, which calls for the development of (a) GWAS with more samples and more comprehensively genotyped variants (for example, the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program is planning to conduct whole genome sequencing on over 100 000 individuals) and (b) novel and more powerful statistical analysis methods. The currently dominant GWAS analysis approach is the "single trait" association test, even though many GWAS are conducted in deeply phenotyped cohorts with many correlated and well-characterized outcomes. If properly analyzed, these outcomes can help improve the power to detect novel variants, as suggested by increasing evidence that pleiotropy, where a genetic variant affects multiple traits, is the norm in genome-phenome associations. We aim to develop powerful pleiotropy-informed association tests across multiple traits for GWAS. Because individual-level phenotype and genotype data from existing GWAS are generally very hard to access, due to privacy concerns and various logistical considerations, we develop rigorous pleiotropy-informed adaptive multitrait association tests that need only the summary association statistics publicly available from most GWAS. We first develop a pleiotropy test, which performs well for truly pleiotropic variants but is sensitive to the pleiotropy assumption.
We then develop a pleiotropy-informed adaptive test that has robust and powerful performance under various genetic models. We develop accurate and efficient numerical algorithms to compute the analytical P-value of the proposed adaptive test without the need for resampling or permutation. We illustrate the performance of the proposed methods through a joint association test of GWAS meta-analysis summary data for several glycemic traits. Our proposed adaptive test identified several novel loci missed by individual-trait GWAS meta-analysis. All the proposed methods are implemented in a publicly available R package.
Affiliation(s)
- Maria Masotti
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Bin Guo
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
39
Sha Q, Wang Z, Zhang X, Zhang S. A clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. Bioinformatics 2019; 35:1373-1379. [PMID: 30239574 PMCID: PMC6477981 DOI: 10.1093/bioinformatics/bty810] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/28/2017] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 12/16/2022]
Abstract
SUMMARY There is increasing interest in the joint analysis of multiple phenotypes for genome-wide association studies (GWASs) for the following reasons. First, cohorts usually collect multiple phenotypes, and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase the statistical power to detect genetic variants associated with complex diseases. Third, there is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes for GWASs. In the CLC method, we first cluster individual statistics into positively correlated clusters, then combine the individual statistics linearly within each cluster and combine the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of individual statistics but also reduces the degrees of freedom of the test statistic. We also prove theoretically that, if the individual statistics are clustered correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has power similar to that of the most powerful test among those we compared, and that CLC is much more powerful than the other tests when effect sizes align with the inferred clusters. We also evaluate the performance of CLC through a real case study. AVAILABILITY AND IMPLEMENTATION R code for implementing our method is available at http://www.math.mtu.edu/~shuzhang/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
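For a fixed assignment of M traits to L clusters, coded as an M × L indicator matrix B, the CLC statistic can be written T = zᵀΣ⁻¹B(BᵀΣ⁻¹B)⁻¹BᵀΣ⁻¹z, which under the null is chi-square with L degrees of freedom when Σ is the null correlation of the z-scores. The sketch below follows that form as we understand it and assumes the cluster labels are already given; choosing the number of clusters, and the simulation or stopping rule that goes with it, is not shown. `solve` and `chi2_sf` are small helpers so that the block needs only the standard library, and all inputs are hypothetical. Two limiting cases serve as sanity checks: one trait per cluster recovers the omnibus statistic zᵀΣ⁻¹z, and a single cluster reduces to one linear combination of the z-scores.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2_sf(x, k):
    """Upper-tail probability of a chi-square with integer k degrees of freedom."""
    half = x / 2.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))  # Q(1, x)
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(1, (k - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total


def clc_statistic(z, sigma, labels):
    """CLC statistic for fixed cluster labels (labels[m] in 0..L-1 for trait m)."""
    m = len(z)
    n_clusters = max(labels) + 1
    # Columns of the M x L indicator matrix B.
    b_cols = [[1.0 if labels[i] == c else 0.0 for i in range(m)]
              for c in range(n_clusters)]
    u = solve(sigma, z)                            # Sigma^-1 z
    v = [solve(sigma, col) for col in b_cols]      # columns of Sigma^-1 B
    a = [sum(col[i] * u[i] for i in range(m)) for col in b_cols]  # B' Sigma^-1 z
    mat = [[sum(b_cols[p][i] * v[q][i] for i in range(m))
            for q in range(n_clusters)]
           for p in range(n_clusters)]             # B' Sigma^-1 B
    stat = sum(ai * wi for ai, wi in zip(a, solve(mat, a)))
    return stat, chi2_sf(stat, n_clusters)


sigma = [[1.0, 0.3], [0.3, 1.0]]
z = [3.0, 2.5]
print(clc_statistic(z, sigma, [0, 1]))  # each trait its own cluster
print(clc_statistic(z, sigma, [0, 0]))  # both traits in one cluster
```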
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Xiao Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
40
Zhou H, Sinsheimer JS, Bates DM, Chu BB, German CA, Ji SS, Keys KL, Kim J, Ko S, Mosher GD, Papp JC, Sobel EM, Zhai J, Zhou JJ, Lange K. OPENMENDEL: a cooperative programming project for statistical genetics. Hum Genet 2019; 139:61-71. [PMID: 30915546 DOI: 10.1007/s00439-019-02001-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/26/2018] [Accepted: 03/15/2019] [Indexed: 01/06/2023]
Abstract
Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.
Affiliation(s)
- Hua Zhou
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Janet S Sinsheimer
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Douglas M Bates
- Department of Statistics, University of Wisconsin, Madison, USA
- Benjamin B Chu
- Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Christopher A German
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Sarah S Ji
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Kevin L Keys
- Department of Medicine, University of California, San Francisco, USA
- Juhyun Kim
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, USA
- Seyoon Ko
- Department of Statistics, Seoul National University, Seoul, South Korea
- Gordon D Mosher
- Departments of Statistics and Computer Science, University of California, Riverside, USA
- Jeanette C Papp
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Eric M Sobel
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Jing Zhai
- Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, USA
- Jin J Zhou
- Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, USA
- Kenneth Lange
- Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, USA
41
Galván-Femenía I, Obón-Santacana M, Piñeyro D, Guindo-Martinez M, Duran X, Carreras A, Pluvinet R, Velasco J, Ramos L, Aussó S, Mercader JM, Puig L, Perucho M, Torrents D, Moreno V, Sumoy L, de Cid R. Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort. J Med Genet 2018; 55:765-778. [PMID: 30166351 PMCID: PMC6252362 DOI: 10.1136/jmedgenet-2018-105437] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 04/24/2018] [Revised: 07/19/2018] [Accepted: 07/21/2018] [Indexed: 12/22/2022]
Abstract
Background Heritability estimates have revealed an important contribution of SNP variants for most common traits; however, SNP analysis by single-trait genome-wide association studies (GWAS) has failed to uncover their impact. In this study, we applied a multitrait GWAS approach to discover additional factors underlying the missing heritability of human anthropometric variation. Methods We analysed 205 traits, including diseases identified at baseline, in the GCAT cohort (Genomes For Life- Cohort study of the Genomes of Catalonia) (n=4988), a Mediterranean adult population-based cohort study from southern Europe. We estimated SNP heritability and performed single-trait GWAS for all traits using 15 million SNP variants. Then, we applied a multitrait-related approach to study genome-wide association with anthropometric measures in a two-stage meta-analysis with the UK Biobank cohort (n=336 107). Results Heritability estimates (eg, skin colour, alcohol consumption, smoking habit, body mass index, educational level or height) revealed an important contribution of SNP variants, ranging from 18% to 77%. Single-trait analysis identified 1785 SNPs reaching the genome-wide significance threshold. Among these, several previously reported single-trait hits were confirmed in our sample: LINC01432 variants (p=1.9×10−9) associated with male baldness, LDLR variants with hyperlipidaemia (ICD-9:272) (p=9.4×10−10), and variants in IRF4 (p=2.8×10−57), SLC45A2 (p=2.2×10−130), HERC2 (p=2.8×10−176), OCA2 (p=2.4×10−121) and MC1R (p=7.7×10−22) associated with hair, eye and skin colour, freckling, tanning capacity, sunburn sensitivity and the Fitzpatrick phototype score, all highly correlated cross-phenotypes. Multitrait meta-analysis of anthropometric variation validated 27 loci in a two-stage meta-analysis with a large British ancestry cohort, six of which are newly reported here (p value threshold <5×10−9) at ZRANB2-AS2, PIK3R1, EPHA7, MAD1L1, CACUL1 and MAP3K9.
Conclusion Considering multiple related phenotypes improves the detection of associated genomic signals. These results indicate the potential value of data-driven multivariate phenotyping for genetic studies in large population-based cohorts to contribute to knowledge of complex traits.
Affiliation(s)
- Iván Galván-Femenía
- GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain
- Mireia Obón-Santacana
- GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain; Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), IDIBELL and CIBERESP, Barcelona, Spain
- David Piñeyro
- High Content Genomics and Bioinformatics Unit, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
- Marta Guindo-Martinez
- Life Sciences - Computational Genomics, Barcelona Supercomputing Center (BSC-CNS), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain
- Xavier Duran
- GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain
- Anna Carreras
- GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain
- Raquel Pluvinet
- High Content Genomics and Bioinformatics Unit, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
- Juan Velasco
- GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain
- Laia Ramos
- High Content Genomics and Bioinformatics Unit, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
- Susanna Aussó
- High Content Genomics and Bioinformatics Unit, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
- J M Mercader
- Programs in Metabolism and Medical & Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, US; Diabetes Unit and Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, US
- Lluis Puig
- Blood Division, Banc de Sang i Teixits, Barcelona, Spain
- Manuel Perucho
- Cancer Genetics and Epigenetics Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
- David Torrents
- Life Sciences - Computational Genomics, Barcelona Supercomputing Center (BSC-CNS), Joint BSC-CRG-IRB Research Program in Computational Biology, Barcelona, Spain; ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Catalunya, Spain
- Victor Moreno
- Unit of Biomarkers and Susceptibility, Cancer Prevention and Control Program, Catalan Institute of Oncology (ICO), IDIBELL and CIBERESP, Barcelona, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Lauro Sumoy
- High Content Genomics and Bioinformatics Unit, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Badalona, Catalunya, Spain
- Rafael de Cid
- GenomesForLife-GCAT Lab Group, Program of Predictive and Personalized Medicine of Cancer (PMPPC), Germans Trias i Pujol Research Institute (IGTP), Crta. de Can Ruti, Badalona, Catalunya, Spain
42
Wang Z, Sha Q, Fang S, Zhang K, Zhang S. Testing an optimally weighted combination of common and/or rare variants with multiple traits. PLoS One 2018; 13:e0201186. [PMID: 30048520 PMCID: PMC6062080 DOI: 10.1371/journal.pone.0201186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/25/2018] [Accepted: 07/10/2018] [Indexed: 12/25/2022]
Abstract
Recently, joint analysis of multiple traits has become popular because it can increase the statistical power to identify genetic variants associated with complex diseases. In addition, there is increasing evidence indicating that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single genetic variant. However, methods that analyze one variant at a time may not be ideal for rare variant association studies because of allelic heterogeneity as well as the extreme rarity of rare variants. In this article, we developed a statistical method, testing an optimally weighted combination of variants with multiple traits (TOWmuT), to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.
Affiliation(s)
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shurong Fang
- Department of Mathematics and Computer Science, John Carroll University, University Heights, Ohio, United States of America
- Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
43
Abstract
OBJECTIVES Glucocorticoids such as dexamethasone have pleiotropic effects, including desired antileukemic, anti-inflammatory, or immunosuppressive effects, and undesired metabolic or toxic effects. The most serious adverse effects of dexamethasone among patients with acute lymphoblastic leukemia are osteonecrosis and thrombosis. To identify inherited genomic variation involved in these severe adverse effects, we carried out genome-wide association studies (GWAS) by analyzing 14 pleiotropic glucocorticoid phenotypes in 391 patients with acute lymphoblastic leukemia. PATIENTS AND METHODS We used the Projection Onto the Most Interesting Statistical Evidence integrative analysis technique to identify genetic variants associated with pleiotropic dexamethasone phenotypes, stratifying for age, sex, race, and treatment, and compared the results with conventional single-phenotype GWAS. The phenotypes were osteonecrosis, central nervous system toxicity, hyperglycemia, hypokalemia, thrombosis, dexamethasone exposure, BMI, growth trajectory, levels of cortisol, albumin, and asparaginase antibodies, and changes in cholesterol, triglycerides, and low-density lipoproteins after dexamethasone. RESULTS The integrative analysis identified more pleiotropic single nucleotide polymorphism variants (P=1.46×10−215), and these variants were more likely to be in gene-regulatory regions (P=1.22×10−6), than traditional single-phenotype GWAS. The integrative analysis yielded genomic variants (rs2243057 and rs6453253) in F2RL1, a receptor that functions in hemostasis, thrombosis, and inflammation, which were associated with pleiotropic effects, including osteonecrosis and thrombosis, and were in regulatory gene regions. CONCLUSION The integrative pleiotropic analysis identified risk variants for osteonecrosis and thrombosis, not identified by single-phenotype analysis, that may have importance for patients with underlying sensitivity to multiple dexamethasone adverse effects.
44
Jo Hodonsky C, Schurmann C, Schick UM, Kocarnik J, Tao R, van Rooij FJ, Wassel C, Buyske S, Fornage M, Hindorff LA, Floyd JS, Ganesh SK, Lin DY, North KE, Reiner AP, Loos RJ, Kooperberg C, Avery CL. Generalization and fine mapping of red blood cell trait genetic associations to multi-ethnic populations: The PAGE Study. Am J Hematol 2018; 93:10.1002/ajh.25161. [PMID: 29905378 PMCID: PMC6300146 DOI: 10.1002/ajh.25161] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/12/2018] [Revised: 05/29/2018] [Accepted: 05/29/2018] [Indexed: 12/17/2022]
Abstract
Red blood cell (RBC) traits provide insight into a wide range of physiological states and exhibit moderate to high heritability, making them excellent candidates for genetic studies to inform underlying biologic mechanisms. Previous RBC trait genome-wide association studies were performed primarily in European- or Asian-ancestry populations, missing opportunities to inform understanding of RBC genetic architecture in diverse populations and to narrow the intervals surrounding putative functional SNPs through fine-mapping. Here, we report the first fine-mapping of six correlated (Pearson's r range: |0.04 - 0.92|) RBC traits in up to 19,036 African American and 19,562 Hispanic/Latino participants of the Population Architecture using Genomics and Epidemiology (PAGE) consortium. Trans-ethnic meta-analysis of race/ethnic- and study-specific estimates for approximately 11,000 SNPs flanking 13 previously identified association signals as well as 150,000 additional array-wide SNPs was performed using inverse-variance meta-analysis after adjusting for study and clinical covariates. Approximately half of previously reported index SNP-RBC trait associations generalized to the trans-ethnic study population (p<1.7x10-4); previously unreported independent association signals within the ABO region reinforce the potential for multiple functional variants affecting the same locus. Trans-ethnic fine-mapping did not reveal additional signals at the HFE locus independent of the known functional variants. Finally, we identified a potential novel association in the Hispanic/Latino study population at the HECTD4/RPL6 locus for RBC count (p=1.9x10-7). The identification of a previously unknown association, generalization of a large proportion of known association signals, and refinement of known association signals all exemplify the benefits of genetic studies in diverse populations.
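The trans-ethnic analysis in this abstract pools race/ethnic- and study-specific estimates by inverse-variance weighting. A minimal fixed-effect sketch for one SNP follows, assuming each study has already produced a covariate-adjusted effect estimate and standard error; the function name and the input numbers are hypothetical and are not taken from PAGE.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study estimates.

    Each study is weighted by 1 / SE^2; returns the pooled effect,
    its standard error, the Z statistic, and a two-sided p-value.
    """
    weights = [1.0 / (se * se) for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-study estimates for one SNP (illustrative numbers only)
beta, se, z, p = inverse_variance_meta([0.10, 0.14, 0.08], [0.04, 0.05, 0.06])
print(round(beta, 4), round(se, 4), p < 1.7e-4)
```

A generalization check like the one described would compare the returned p against the stated threshold; random-effects pooling would need an extra between-study variance term that this sketch omits.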
Affiliation(s)
- Chani Jo Hodonsky
- Department of Epidemiology, University of North Carolina Gillings School of Public Health, Chapel Hill, NC
- Claudia Schurmann
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- The Genetics of Obesity and Related Metabolic Traits Program, The Icahn School of Medicine at Mount Sinai, New York, NY
- Ursula M Schick
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- The Genetics of Obesity and Related Metabolic Traits Program, The Icahn School of Medicine at Mount Sinai, New York, NY
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
- Jonathan Kocarnik
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
- Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Frank JA van Rooij
- Department of Epidemiology, Erasmus University Medical Center, Rotterdam, 3000, the Netherlands
- Christina Wassel
- Department of Pathology and Laboratory Medicine, College of Medicine, University of Vermont, Burlington, VT
- Steve Buyske
- Department of Statistics and Biostatistics, Hill Center, Rutgers, The State University of New Jersey, 110 Frelinghuysen Rd., Piscataway, NJ
- Myriam Fornage
- Institute of Molecular Medicine and Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX
- Lucia A Hindorff
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- James S Floyd
- Department of Medicine, University of Washington, Seattle, WA
- Department of Epidemiology, University of Washington, Seattle, WA
- Santhi K Ganesh
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC
- Kari E North
- Department of Epidemiology, University of North Carolina Gillings School of Public Health, Chapel Hill, NC
- Alex P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
- Department of Epidemiology, University of Washington, Seattle, WA
- Ruth JF Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- The Genetics of Obesity and Related Metabolic Traits Program, The Icahn School of Medicine at Mount Sinai, New York, NY
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
- Christy L Avery
- Department of Epidemiology, University of North Carolina Gillings School of Public Health, Chapel Hill, NC
- Carolina Population Center, University of North Carolina, Chapel Hill, NC
45
Napier MD, Franceschini N, Gondalia R, Stewart JD, Méndez-Giráldez R, Sitlani CM, Seyerle AA, Highland HM, Li Y, Wilhelmsen KC, Yan S, Duan Q, Roach J, Yao J, Guo X, Taylor KD, Heckbert SR, Rotter JI, North KE, Reiner AP, Zhang ZM, Tinker LF, Liao D, Laurie CC, Gogarten SM, Lin HJ, Brody JA, Bartz TM, Psaty BM, Sotoodehnia N, Soliman EZ, Avery CL, Whitsel EA. Genome-wide association study and meta-analysis identify loci associated with ventricular and supraventricular ectopy. Sci Rep 2018; 8:5675. [PMID: 29618737 PMCID: PMC5884864 DOI: 10.1038/s41598-018-23843-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/20/2017] [Accepted: 03/09/2018] [Indexed: 01/03/2023]
Abstract
The genetic basis of supraventricular and ventricular ectopy (SVE, VE) remains largely uncharacterized, despite established genetic mechanisms of arrhythmogenesis. To identify novel genetic variants associated with SVE/VE in ancestrally diverse human populations, we conducted a genome-wide association study of electrocardiographically identified SVE and VE in five cohorts including approximately 43,000 participants of African, European and Hispanic/Latino ancestry. In thirteen ancestry-stratified subgroups, we tested multivariable-adjusted associations of SVE and VE with single nucleotide polymorphism (SNP) dosage. We combined subgroup-specific association estimates in inverse variance-weighted, fixed-effects and Bayesian meta-analyses. We also combined fixed-effects meta-analytic t-test statistics for SVE and VE in multi-trait SNP association analyses. No loci reached genome-wide significance in trans-ethnic meta-analyses. However, we found genome-wide significant SNPs intronic to an apoptosis-enhancing gene previously associated with QRS interval duration (FAF1; lead SNP rs7545860; effect allele frequency = 0.02; P = 2.0 × 10−8) in multi-trait analysis among European ancestry participants and near a locus encoding calcium-dependent glycoproteins (DSC3; lead SNP rs8086068; effect allele frequency = 0.17) in meta-analysis of SVE (P = 4.0 × 10−8) and multi-trait analysis (P = 2.9 × 10−9) among African ancestry participants. The novel findings suggest several mechanisms by which genetic variation may predispose to ectopy in humans and highlight the potential value of leveraging pleiotropy in future studies of ectopy-related phenotypes.
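The abstract does not spell out the multi-trait statistic used to combine the SVE and VE t-statistics. One widely used form is the quadratic statistic t' R^-1 t, referred to a chi-square distribution with 2 degrees of freedom, where R holds the null correlation between the two traits' statistics (often estimated across SNPs). The sketch below assumes that form, treats the t-statistics as approximately standard normal under the null, and uses hypothetical function names and inputs.

```python
import math


def two_trait_chi2(t1, t2, r):
    """Multi-trait statistic t' R^-1 t for two traits whose per-SNP
    statistics have null correlation r (requires |r| < 1)."""
    chi2 = (t1 * t1 - 2.0 * r * t1 * t2 + t2 * t2) / (1.0 - r * r)
    # Under the null the statistic is chi-square with 2 degrees of freedom,
    # whose survival function is exactly exp(-x / 2).
    p = math.exp(-chi2 / 2.0)
    return chi2, p


def estimate_r(t1_list, t2_list):
    """Pearson correlation of the two traits' statistics across SNPs,
    a common proxy for their null correlation (ideally computed on
    SNPs without strong associations)."""
    n = len(t1_list)
    m1 = sum(t1_list) / n
    m2 = sum(t2_list) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(t1_list, t2_list))
    v1 = sum((a - m1) ** 2 for a in t1_list)
    v2 = sum((b - m2) ** 2 for b in t2_list)
    return cov / math.sqrt(v1 * v2)


# Hypothetical meta-analytic t-statistics for one SNP (illustrative only)
chi2, p = two_trait_chi2(4.1, 3.2, 0.25)
print(round(chi2, 2), p)
```

Because R accounts for the overlap between the two ectopy traits, a SNP with moderate, concordant effects on both can reach a smaller p-value than either single-trait test.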
Affiliation(s)
- Melanie D Napier
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Nora Franceschini
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Rahul Gondalia
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- James D Stewart
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
- Raúl Méndez-Giráldez
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Colleen M Sitlani
- Division of Cardiology, Department of Medicine, University of Washington, Seattle, WA, USA
- Amanda A Seyerle
- Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN, USA
- Heather M Highland
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Yun Li
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Kirk C Wilhelmsen
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, USA
- Song Yan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Qing Duan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Jeffrey Roach
- Research Computing Center, University of North Carolina, Chapel Hill, NC, USA
- Jie Yao
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, USA
- Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California, USA
- Xiuqing Guo
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, USA
- Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California, USA
- Kent D Taylor
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, USA
- Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California, USA
- Susan R Heckbert
- Cardiovascular Health Research Unit and the Department of Epidemiology, University of Washington, Seattle, WA, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, USA
- Division of Genomic Outcomes, Departments of Pediatrics and Medicine, Harbor-UCLA Medical Center, Torrance, California, USA
- Kari E North
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
- Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC, USA
- Alexander P Reiner
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Zhu-Ming Zhang
- Epidemiological Cardiology Research Center, Department of Epidemiology and Prevention, Wake Forest University, Winston-Salem, NC, USA
- Lesley F Tinker
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Duanping Liao
- Department of Public Health Sciences, Penn State University College of Medicine, Hershey, PA, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Henry J Lin
- Institute for Translational Genomics and Population Sciences and Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, California, USA
- Division of Medical Genetics, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, California, USA
- Jennifer A Brody
- Division of Cardiology, Department of Medicine, University of Washington, Seattle, WA, USA
- Traci M Bartz
- Cardiovascular Health Research Unit, Departments of Biostatistics and Medicine, University of Washington, Seattle, WA, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Nona Sotoodehnia
- Division of Cardiology, Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit and the Department of Epidemiology, University of Washington, Seattle, WA, USA
- Elsayed Z Soliman
- Department of Epidemiology, School of Medicine, Wake Forest University, Winston-Salem, NC, USA
- Christy L Avery
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
- Eric A Whitsel
- Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
- Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
46
Seyerle AA, Sitlani CM, Noordam R, Gogarten SM, Li J, Li X, Evans DS, Sun F, Laaksonen MA, Isaacs A, Kristiansson K, Highland HM, Stewart JD, Harris TB, Trompet S, Bis JC, Peloso GM, Brody JA, Broer L, Busch EL, Duan Q, Stilp AM, O'Donnell CJ, Macfarlane PW, Floyd JS, Kors JA, Lin HJ, Li-Gao R, Sofer T, Méndez-Giráldez R, Cummings SR, Heckbert SR, Hofman A, Ford I, Li Y, Launer LJ, Porthan K, Newton-Cheh C, Napier MD, Kerr KF, Reiner AP, Rice KM, Roach J, Buckley BM, Soliman EZ, de Mutsert R, Sotoodehnia N, Uitterlinden AG, North KE, Lee CR, Gudnason V, Stürmer T, Rosendaal FR, Taylor KD, Wiggins KL, Wilson JG, Chen YD, Kaplan RC, Wilhelmsen K, Cupples LA, Salomaa V, van Duijn C, Jukema JW, Liu Y, Mook-Kanamori DO, Lange LA, Vasan RS, Smith AV, Stricker BH, Laurie CC, Rotter JI, Whitsel EA, Psaty BM, Avery CL. Pharmacogenomics study of thiazide diuretics and QT interval in multi-ethnic populations: the Cohorts for Heart and Aging Research in Genomic Epidemiology. Pharmacogenomics J 2018; 18:215-226. [PMID: 28719597 PMCID: PMC5773415 DOI: 10.1038/tpj.2017.10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/20/2016] [Revised: 01/14/2017] [Accepted: 03/09/2017] [Indexed: 12/23/2022]
Abstract
Thiazide diuretics, commonly used antihypertensives, may cause QT interval (QT) prolongation, a risk factor for highly fatal and difficult-to-predict ventricular arrhythmias. We examined whether common single-nucleotide polymorphisms (SNPs) modified the association between thiazide use and QT or its component parts (QRS interval, JT interval) by performing ancestry-specific, trans-ethnic and cross-phenotype genome-wide analyses of European (66%), African American (15%) and Hispanic (19%) populations (N=78 199), leveraging longitudinal data, incorporating corrected standard errors to account for underestimation of interaction estimate variances and evaluating evidence for pathway enrichment. Although no loci achieved genome-wide significance (P<5 × 10-8), we found suggestive evidence (P<5 × 10-6) for SNPs modifying the thiazide-QT association at 22 loci, including ion transport loci (for example, NELL1, KCNQ3). The biologic plausibility of our suggestive results and simulations demonstrating modest power to detect interaction effects at genome-wide significant levels indicate that larger studies and innovative statistical methods are warranted in future efforts evaluating thiazide-SNP interactions.
Affiliation(s)
- A A Seyerle
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN, USA
- C M Sitlani
- Department of Medicine, University of Washington, Seattle, WA, USA
- R Noordam
- Department of Epidemiology, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, The Netherlands
- S M Gogarten
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- J Li
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Palo Alto, CA, USA
- X Li
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- D S Evans
- California Pacific Medical Center Research Institute, San Francisco, CA, USA
- F Sun
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- M A Laaksonen
- Department of Health, THL-National Institute for Health and Welfare, Helsinki, Finland
- A Isaacs
- Department of Epidemiology, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- CARIM School of Cardiovascular Diseases, Maastricht Centre for Systems Biology (MaCSBio), and Department of Biochemistry, Maastricht University, Maastricht, The Netherlands
- K Kristiansson
- Department of Health, THL-National Institute for Health and Welfare, Helsinki, Finland
- H M Highland
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- J D Stewart
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
- T B Harris
- Laboratory of Epidemiology, Demography, and Biometry, National Institute on Aging, Bethesda, MD, USA
- S Trompet
- Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, The Netherlands
- Department of Cardiology, Leiden University Medical Center, Leiden, The Netherlands
- J C Bis
- Department of Medicine, University of Washington, Seattle, WA, USA
- G M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- J A Brody
- Department of Medicine, University of Washington, Seattle, WA, USA
- L Broer
- Department of Internal Medicine, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- E L Busch
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Q Duan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- A M Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- C J O'Donnell
- Department of Medicine, Harvard University, Boston, MA, USA
- National Heart, Lung, and Blood Institute Framingham Heart Study, Framingham, MA, USA
- Cardiology Section, Boston Veterans Administration Healthcare, Boston, MA, USA
- P W Macfarlane
- Institute of Health and Wellbeing, University of Glasgow, Glasgow, UK
- J S Floyd
- Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- J A Kors
- Department of Medical Informatics, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- H J Lin
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Division of Medical Genetics, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA, USA
- R Li-Gao
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- T Sofer
- Department of Medicine, University of Washington, Seattle, WA, USA
- R Méndez-Giráldez
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- S R Cummings
- California Pacific Medical Center Research Institute, San Francisco, CA, USA
- S R Heckbert
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- A Hofman
- Department of Epidemiology, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- I Ford
- Robertson Center for Biostatistics, University of Glasgow, Glasgow, UK
- Y Li
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- L J Launer
- Laboratory of Epidemiology, Demography, and Biometry, National Institute on Aging, Bethesda, MD, USA
- K Porthan
- Division of Cardiology, Heart and Lung Center, Helsinki University Central Hospital, Helsinki, Finland
- C Newton-Cheh
- Institute of Health and Wellbeing, University of Glasgow, Glasgow, UK
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- M D Napier
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- K F Kerr
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- A P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- K M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- J Roach
- Research Computing Center, University of North Carolina, Chapel Hill, NC, USA
- B M Buckley
- Department of Pharmacology and Therapeutics, University College Cork, Cork, Ireland
- E Z Soliman
- Epidemiology Cardiology Research Center (EPICARE), Wake Forest School of Medicine, Winston-Salem, NC, USA
- R de Mutsert
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- N Sotoodehnia
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Division of Cardiology, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- A G Uitterlinden
- Department of Internal Medicine, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- K E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- C R Lee
- Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- V Gudnason
- Icelandic Heart Association, Kopavogur, Iceland
- Department of Medicine, University of Iceland, Reykjavik, Iceland
- T Stürmer
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Center for Pharmacoepidemiology, University of North Carolina, Chapel Hill, NC, USA
- F R Rosendaal
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- K D Taylor
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- K L Wiggins
- Department of Medicine, University of Washington, Seattle, WA, USA
- J G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- Y-Di Chen
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- R C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- K Wilhelmsen
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- The Renaissance Computing Institute, Chapel Hill, NC, USA
- L A Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- National Heart, Lung, and Blood Institute Framingham Heart Study, Framingham, MA, USA
- V Salomaa
- Department of Health, THL-National Institute for Health and Welfare, Helsinki, Finland
- C van Duijn
- Department of Epidemiology, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- J W Jukema
- Department of Cardiology, Leiden University Medical Center, Leiden, The Netherlands
- Durrer Center for Cardiogenetic Research, Amsterdam, The Netherlands
- Interuniversity Cardiology Institute of the Netherlands, Utrecht, The Netherlands
- Y Liu
- Department of Epidemiology and Prevention, Division of Public Health Sciences, Wake Forest University, Winston-Salem, NC, USA
- D O Mook-Kanamori
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Public Health and Primary Care, Leiden University Medical Center, Leiden, the Netherlands
- Department of BESC, Epidemiology Section, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- L A Lange
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- R S Vasan
- National Heart, Lung, and Blood Institute Framingham Heart Study, Framingham, MA, USA
- Division of Preventive Medicine and Epidemiology, Department of Epidemiology, Boston University School of Medicine, Boston, MA, USA
- A V Smith
- Icelandic Heart Association, Kopavogur, Iceland
- Department of Medicine, University of Iceland, Reykjavik, Iceland
- B H Stricker
- Department of Epidemiology, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Inspectorate of Health Care, Utrecht, The Netherlands
- C C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- J I Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- E A Whitsel
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- B M Psaty
- Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Services, University of Washington, Seattle, WA, USA
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
- C L Avery
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
47
Ray D, Boehnke M. Methods for meta-analysis of multiple traits using GWAS summary statistics. Genet Epidemiol 2017; 42:134-145. [PMID: 29226385 DOI: 10.1002/gepi.22105] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 07/06/2017] [Revised: 10/27/2017] [Accepted: 11/08/2017] [Indexed: 12/21/2022]
Abstract
Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently, several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Whereas existing methods perform well either when most correlated traits are affected by the genetic variant in the same direction or when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful only for some specific association scenarios.
When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of common diseases or traits.
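As we understand the USAT framework, the test blends a MANOVA-type statistic Z' R^-1 Z (sensitive when many correlated traits are affected) with a sum-of-squared-score (SSU) statistic Z'Z (sensitive to other association patterns) and keeps the weight that gives the smallest p-value, correcting for that search. The sketch below is a two-trait illustration under that assumption only: it replaces the authors' asymptotic P-value approximation with a Monte Carlo calibration, assumes the correlation r between the two traits' Z-scores is known (for example, estimated from null SNPs), and uses hypothetical function names and a hypothetical weight grid.

```python
import bisect
import math
import random


def manova_stat(z1, z2, r):
    # Z' R^-1 Z for a 2 x 2 correlation matrix with off-diagonal r
    return (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)


def ssu_stat(z1, z2):
    # sum of squared Z-scores, Z'Z
    return z1 * z1 + z2 * z2


def combined_stat(z1, z2, r, w):
    # T_w = w * MANOVA-type + (1 - w) * SSU-type
    return w * manova_stat(z1, z2, r) + (1.0 - w) * ssu_stat(z1, z2)


def upper_tail(sorted_null, x):
    # fraction of simulated null values that are >= x
    n = len(sorted_null)
    return (n - bisect.bisect_left(sorted_null, x)) / n


def usat_like_pvalue(z1, z2, r, n_sim=4000, seed=1):
    """Monte Carlo p-value for the min-p over a grid of weights w."""
    rng = random.Random(seed)
    weights = [i / 10.0 for i in range(11)]
    # null draws Z ~ N(0, R) built from two independent standard normals
    draws = []
    for _ in range(n_sim):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        draws.append((a, r * a + math.sqrt(1.0 - r * r) * b))
    null_by_w = {
        w: sorted(combined_stat(x, y, r, w) for x, y in draws) for w in weights
    }

    def min_p(x, y):
        # smallest per-weight p-value across the grid
        return min(upper_tail(null_by_w[w], combined_stat(x, y, r, w)) for w in weights)

    observed = min_p(z1, z2)
    # calibrate the min-p statistic against its own null distribution
    exceed = sum(1 for x, y in draws if min_p(x, y) <= observed)
    return (exceed + 1) / (n_sim + 1)


# Hypothetical Z-scores for one variant and two correlated lipid traits
print(usat_like_pvalue(5.0, 1.0, 0.3))
```

The smallest attainable p-value is 1/(n_sim + 1), so genome-wide thresholds would need far more simulations or an analytic approximation such as the one metaUSAT provides.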
Affiliation(s)
- Debashree Ray
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
48
Deng Y, Pan W. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses. Genetics 2017; 207:1285-1299. [PMID: 28971959 PMCID: PMC5714448 DOI: 10.1534/genetics.117.300347] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/04/2017] [Accepted: 09/29/2017] [Indexed: 11/18/2022]
Abstract
There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. 
In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach.
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
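The abstract does not spell out its union-intersection procedure. The sketch below therefore swaps in a simpler, well-known relative for illustration: the intersection-union test (IUT) for pleiotropy from per-trait GWAS z-scores. It is not Deng and Pan's likelihood-ratio or GEE method and does no conditional adjustment, and the function names and z-scores are hypothetical. What it does show is the null hypothesis. A pleiotropy test asks whether a SNP is associated with every trait, not with any trait, so the p-value is driven by the weakest association.

```python
import math
from typing import Sequence

def two_sided_p(z: float) -> float:
    """Two-sided standard-normal p-value for a GWAS z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def iut_pleiotropy_p(z_scores: Sequence[float]) -> float:
    """Intersection-union test (IUT) p-value for pleiotropy at one SNP.

    H0: the SNP is unassociated with at least one of the traits.
    H1: the SNP is associated with every trait.
    H0 is rejected at level alpha only if every per-trait test rejects at alpha,
    so the IUT p-value is the largest per-trait p-value. The test keeps its
    level whatever the correlation between traits, but it is conservative.
    """
    if not z_scores:
        raise ValueError("need at least one z-score")
    return max(two_sided_p(z) for z in z_scores)

# Hypothetical z-scores for one SNP on LDL and TG (illustrative values only).
p = iut_pleiotropy_p([6.1, 5.8])
print(f"IUT pleiotropy p-value: {p:.2e}")
```

Because it needs only marginal z-scores, the IUT runs on summary statistics alone. It ignores the LDL-TG correlation, however, so it cannot separate direct from indirect effects. That is the gap the conditional analysis described in the abstract is meant to close.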
49
Lin N, Zhu Y, Fan R, Xiong M. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data. PLoS Comput Biol 2017; 13:e1005788. [PMID: 29040274 PMCID: PMC5659802 DOI: 10.1371/journal.pcbi.1005788] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/24/2016] [Revised: 10/27/2017] [Accepted: 09/21/2017] [Indexed: 01/12/2023]
Abstract
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistical method referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type 1 errors. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics.
Affiliation(s)
- Nan Lin
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Yun Zhu
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, United States of America
- Ruzong Fan
- Biostatistics and Bioinformatics Branch (BBB), Division of Intramural Population Health Research (DIPHR), Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, United States of America
- Momiao Xiong
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
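QRFCCA combines three components. The sketch below covers only the third, classical canonical correlation analysis, to show what a canonical correlation between a genotype block and a phenotype block is. It assumes individual-level NumPy matrices X (genotype dosages) and Y (traits). The tiny ridge term is there purely for numerical stability: it is not QRFCCA's quadratic regularization, and the functional-data step is omitted entirely. Function names and the toy data are hypothetical.

```python
import numpy as np

def _inv_sqrt(cov: np.ndarray, ridge: float) -> np.ndarray:
    """Inverse matrix square root of a symmetric covariance, with a small ridge."""
    vals, vecs = np.linalg.eigh(cov + ridge * np.eye(cov.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def canonical_correlations(X: np.ndarray, Y: np.ndarray, ridge: float = 1e-8) -> np.ndarray:
    """Classical canonical correlations between the columns of X (n x p) and Y (n x q).

    They are the singular values of Sxx^{-1/2} Sxy Syy^{-1/2}, returned in
    descending order (min(p, q) values).
    """
    Xc = X - X.mean(axis=0)  # center each column
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    M = _inv_sqrt(Sxx, ridge) @ Sxy @ _inv_sqrt(Syy, ridge)
    return np.linalg.svd(M, compute_uv=False)

# Toy data: 500 individuals, 5 SNP dosages, 3 traits driven by the first 3 SNPs.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 5)).astype(float)
P = G[:, :3] @ rng.normal(size=(3, 3)) + rng.normal(size=(500, 3))
print(canonical_correlations(G, P))
```

With many SNPs relative to individuals, Sxx becomes singular. This is one reason methods such as QRFCCA first reduce dimension or regularize instead of applying plain CCA directly.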
50
A Powerful Framework for Integrating eQTL and GWAS Summary Data. Genetics 2017; 207:893-902. [PMID: 28893853 DOI: 10.1534/genetics.117.300270] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 08/02/2017] [Accepted: 09/05/2017] [Indexed: 01/26/2023]
Abstract
Two new gene-based association analysis methods, called PrediXcan and TWAS for GWAS individual-level and summary data, respectively, were recently proposed to integrate GWAS with eQTL data, alleviating two common problems in GWAS by boosting statistical power and facilitating biological interpretation of GWAS discoveries. Based on a novel reformulation of PrediXcan and TWAS, we propose a more powerful gene-based association test to integrate single set or multiple sets of eQTL data with GWAS individual-level data or summary statistics. The proposed test was applied to several GWAS datasets, including two lipid summary association datasets based on [Formula: see text] and [Formula: see text] samples, respectively, and uncovered more known or novel trait-associated genes, showcasing much improved performance of our proposed method. The software implementing the proposed method is freely available as an R package.
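The abstract builds on a reformulation of TWAS for summary data. As a reference point, the sketch below computes the standard summary-data TWAS z-score for one gene. This is a weighted sum of GWAS z-scores, normalized by its null variance under LD: z = w^T Z / sqrt(w^T R w). It is not the authors' proposed, more powerful test. It assumes that the eQTL weights w, the GWAS z-scores Z, and a reference LD matrix R are already aligned to the same SNPs and alleles. Names and numbers are hypothetical.

```python
import numpy as np

def twas_z(weights: np.ndarray, gwas_z: np.ndarray, ld: np.ndarray) -> float:
    """Summary-data TWAS z-score for one gene.

    weights: eQTL-derived expression weights for the gene's m cis-SNPs.
    gwas_z:  GWAS z-scores for the same m SNPs (same allele coding).
    ld:      m x m SNP correlation (LD) matrix from a reference panel.
    """
    num = weights @ gwas_z            # weighted sum of GWAS z-scores
    var = weights @ ld @ weights      # its variance under the null, given LD
    if var <= 0:
        raise ValueError("weights have zero variance under this LD matrix")
    return float(num / np.sqrt(var))

# Hypothetical 3-SNP gene.
w = np.array([0.4, -0.2, 0.1])
z = np.array([3.1, -2.0, 0.5])
R = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
print(f"TWAS z = {twas_z(w, z, R):.2f}")
```

Ignoring LD (setting R to the identity) would understate or overstate the null variance whenever the weighted SNPs are correlated. This is why the reference LD matrix enters the denominator.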