1
Mbatchou J, McPeek MS. JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. Am J Hum Genet 2024:S0002-9297(24)00216-7. [PMID: 39025064 DOI: 10.1016/j.ajhg.2024.06.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/19/2024] [Accepted: 06/20/2024] [Indexed: 07/20/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA; Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
2
Xu G, Amei A, Wu W, Liu Y, Shen L, Oh EC, Wang Z. Retrospective varying coefficient association analysis of longitudinal binary traits: application to the identification of genetic loci associated with hypertension. Ann Appl Stat 2024; 18:487-505. [PMID: 38577266 PMCID: PMC10994004 DOI: 10.1214/23-aoas1798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
Many genetic studies contain rich information on longitudinal phenotypes that require powerful analytical tools for optimal analysis. Genetic analysis of longitudinal data that incorporates temporal variation is important for understanding the genetic architecture and biological variation of complex diseases. Most of the existing methods assume that the contribution of genetic variants is constant over time and fail to capture the dynamic pattern of disease progression. However, the relative influence of genetic variants on complex traits fluctuates over time. In this study, we propose a retrospective varying coefficient mixed model association test, RVMMAT, to detect time-varying genetic effect on longitudinal binary traits. We model dynamic genetic effect using smoothing splines, estimate model parameters by maximizing a double penalized quasi-likelihood function, design a joint test using a Cauchy combination method, and evaluate statistical significance via a retrospective approach to achieve robustness to model misspecification. Through simulations we illustrated that the retrospective varying-coefficient test was robust to model misspecification under different ascertainment schemes and gained power over the association methods assuming constant genetic effect. We applied RVMMAT to a genome-wide association analysis of longitudinal measure of hypertension in the Multi-Ethnic Study of Atherosclerosis. Pathway analysis identified two important pathways related to G-protein signaling and DNA damage. Our results demonstrated that RVMMAT could detect biologically relevant loci and pathways in a genome scan and provided insight into the genetic architecture of hypertension.
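The joint-test step mentioned in the abstract above combines several p-values with a Cauchy combination rule. The following is a minimal Python sketch of that generic rule only, not the RVMMAT implementation: the function name `cauchy_combination`, the equal default weights, and the example p-values are illustrative assumptions that do not come from the paper.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the generic Cauchy combination rule.

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with non-negative weights that sum to one, and the average is
    mapped back to a p-value through the Cauchy tail probability.
    Illustrative sketch only; names and defaults are assumptions.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    # Assumption: equal weights when none are supplied.
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative, one per p-value")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    weights = [w / total for w in weights]

    # Weighted sum of Cauchy-transformed p-values.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    )

    # Upper-tail probability of a standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values from three component tests.
    print(cauchy_combination([0.001, 0.8, 0.6]))
```

In this sketch a single very small p-value dominates the combined result, which is the usual motivation for this kind of rule when only some of the component tests carry signal.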
Affiliation(s)
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada
- Edwin C. Oh
  - Department of Internal Medicine, University of Nevada School of Medicine
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health
3
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. bioRxiv 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
4
Zhuang X, Xu G, Amei A, Cordes D, Wang Z, Oh EC. Detecting time-varying genetic effects in Alzheimer's disease using a longitudinal GWAS model. bioRxiv 2023:2023.10.17.562756. [PMID: 37905044 PMCID: PMC10614870 DOI: 10.1101/2023.10.17.562756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Background: The development and progression of Alzheimer's disease (AD) is a complex process that can change over time, during which genetic influences on phenotypes may also fluctuate. Incorporating longitudinal phenotypes in genome-wide association studies (GWAS) could help unmask genetic loci with time-varying effects. In this study, we incorporated a varying coefficient test in a longitudinal GWAS model to identify single nucleotide polymorphisms (SNPs) that may have time- or age-dependent effects in AD. Methods: Genotype data from 1,877 participants in the Alzheimer's Disease Neuroimaging Initiative (ADNI) were imputed using the Haplotype Reference Consortium (HRC) panel, resulting in 9,573,130 SNPs. Subjects' longitudinal impairment status at each visit was treated as a binary clinical phenotype. Participants' composite standardized uptake value ratio (SUVR), derived from each longitudinal amyloid PET scan, was treated as a continuous biological phenotype. The retrospective varying coefficient mixed model association test (RVMMAT) was used in longitudinal GWAS to detect time-varying genetic effects on the impairment status and SUVR measures. Post-hoc analyses were performed on genome-wide significant SNPs, including 1) pathway analyses; 2) age-stratified genotypic comparisons and regression analyses; and 3) replication analyses using data from the National Alzheimer's Coordinating Center (NACC). Results: Our model identified 244 genome-wide significant SNPs with time-varying genetic effects on clinical impairment status in AD, of which 12 SNPs on chromosome 19 were successfully replicated using data from NACC. Post-hoc age-stratified analyses indicated that for most of these 244 SNPs, the maximum genotypic effect on impairment status occurred between 70 and 80 years of age and then declined with age. Our model further identified 73 genome-wide significant SNPs associated with the temporal variation of amyloid accumulation. For these SNPs, an increasing genotypic effect on PET-SUVR was observed as participants' age increased. Functional pathway analyses of significant SNPs for both phenotypes highlighted the involvement and disruption of immune response- and neuroinflammation-related pathways in AD. Conclusion: We demonstrate that longitudinal GWAS models with time-varying coefficients can boost statistical power in AD GWAS. In addition, our analyses uncovered genetic variants with potentially time-varying effects on repeated measurements of clinical and biological phenotypes in AD.
5
Onifade M, Roy-Gagnon MH, Parent MÉ, Burkett KM. Comparison of mixed model based approaches for correcting for population substructure with application to extreme phenotype sampling. BMC Genomics 2022; 23:98. [PMID: 35120458 PMCID: PMC8815214 DOI: 10.1186/s12864-022-08297-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Accepted: 01/06/2022] [Indexed: 11/10/2022]
Abstract
Background: Mixed models are used to correct for confounding due to population stratification and hidden relatedness in genome-wide association studies. This class of models includes linear mixed models and generalized linear mixed models. Existing mixed model approaches to correct for population substructure have previously been investigated with both continuous and case-control response variables. However, they have not been investigated in the context of extreme phenotype sampling (EPS), where genetic covariates are collected only on samples with extreme response variable values. In this work, we compare the performance of existing binary trait mixed model approaches (GMMAT, LEAP and CARAT) on EPS data. Since linear mixed models are commonly used even with binary traits, we also evaluate the performance of a popular linear mixed model implementation (GEMMA). Results: We used simulation studies to estimate the type I error rate and power of all approaches assuming a population with substructure. Our simulation results show that for a common candidate variant, both LEAP and GMMAT control the type I error rate while CARAT's rate remains inflated. We applied all methods to a real dataset from a Québec, Canada, case-control study that is known to have population substructure. We observe similar type I error control in the analysis of the Québec dataset. For rare variants, the false positive rate remains inflated even after correction with mixed model approaches. For methods that control the type I error rate, the estimated power is comparable. Conclusions: The methods compared in this study differ in their type I error control. Therefore, when data are from an EPS study, care should be taken to ensure that the models underlying the methodology are suitable to the sampling strategy and to the minor allele frequency of the candidate SNPs.
Affiliation(s)
- Maryam Onifade
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
- Marie-Élise Parent
  - Centre Armand-Frappier Santé Biotechnologie, Institut national de la recherche scientifique, Université du Québec, Laval, Canada
- Kelly M Burkett
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
6
Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
7
Wu W, Wang Z, Xu K, Zhang X, Amei A, Gelernter J, Zhao H, Justice AC, Wang Z. Retrospective Association Analysis of Longitudinal Binary Traits Identifies Important Loci and Pathways in Cocaine Use. Genetics 2019; 213:1225-1236. [PMID: 31591132 PMCID: PMC6893384 DOI: 10.1534/genetics.119.302598] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/03/2019] [Accepted: 10/04/2019] [Indexed: 12/15/2022]
Abstract
Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.
Affiliation(s)
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Zhong Wang
  - Baker Institute for Animal Health, Cornell University, Ithaca, New York 14850
- Ke Xu
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Xinyu Zhang
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
- Joel Gelernter
  - Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Amy C Justice
  - VA Connecticut Healthcare System, West Haven, Connecticut 06516
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06511
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
8
Lloyd-Jones LR, Robinson MR, Yang J, Visscher PM. Transformation of Summary Statistics from Linear Mixed Model Association on All-or-None Traits to Odds Ratio. Genetics 2018; 208:1397-1408. [PMID: 29429966 PMCID: PMC5887138 DOI: 10.1534/genetics.117.300360] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 10/05/2017] [Accepted: 01/25/2018] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure (e.g., a genetic variant) and is readily available from logistic regression. However, when the LMM is applied to all-or-none traits, it provides estimates of genetic effects on the observed 0-1 scale, which differs from the scale used in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the magnitude of an effect from an LMM GWAS difficult to interpret. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that rely only on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects.
Affiliation(s)
- Luke R Lloyd-Jones
  - Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- Matthew R Robinson
  - Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
  - Department of Computational Biology, University of Lausanne, CH-1015, Switzerland
- Jian Yang
  - Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane 4072, Australia
- Peter M Visscher
  - Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane 4072, Australia
9
Ioannidis JP. Making Optimal Use of and Extending beyond Polygenic Additive Liability Models. Hum Hered 2016; 80:158-61. [DOI: 10.1159/000448200] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]