1
|
Wendel B, Heidenreich M, Budde M, Heilbronner M, Oraki Kohshour M, Papiol S, Falkai P, Schulze TG, Heilbronner U, Bickeböller H. Kalpra: A kernel approach for longitudinal pathway regression analysis integrating network information with an application to the longitudinal PsyCourse Study. Front Genet 2022; 13:1015885. [PMID: 36561312 PMCID: PMC9767414 DOI: 10.3389/fgene.2022.1015885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2022] [Accepted: 11/24/2022] [Indexed: 12/12/2022] Open
Abstract
A popular approach to reduce the high dimensionality resulting from genome-wide association studies is to analyze a whole pathway in a single test for association with a phenotype. Kernel machine regression (KMR) is a highly flexible pathway analysis approach. Initially, KMR was developed to analyze a simple phenotype with just one measurement per individual. Recently, however, the investigation into the influence of genomic factors in the development of disease-related phenotypes across time (trajectories) has gained in importance. Thus, novel statistical approaches for KMR analyzing longitudinal data, i.e. several measurements at specific time points per individual are required. For longitudinal pathway analysis, we extend KMR to long-KMR using the estimation equivalence of KMR and linear mixed models. We include additional random effects to correct for the dependence structure. Moreover, within long-KMR we created a topology-based pathway analysis by combining this approach with a kernel including network information of the pathway. Most importantly, long-KMR not only allows for the investigation of the main genetic effect adjusting for time dependencies within an individual, but it also allows to test for the association of the pathway with the longitudinal course of the phenotype in the form of testing the genetic time-interaction effect. The approach is implemented as an R package, kalpra. Our simulation study demonstrates that the power of long-KMR exceeded that of another KMR method previously developed to analyze longitudinal data, while maintaining (slightly conservatively) the type I error. The network kernel improved the performance of long-KMR compared to the linear kernel. Considering different pathway densities, the power of the network kernel decreased with increasing pathway density. We applied long-KMR to cognitive data on executive function (Trail Making Test, part B) from the PsyCourse Study and 17 candidate pathways selected from Reactome. We identified seven nominally significant pathways.
Collapse
Affiliation(s)
- Bernadette Wendel
- Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany,*Correspondence: Bernadette Wendel,
| | - Markus Heidenreich
- Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
| | - Monika Budde
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
| | - Maria Heilbronner
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
| | - Mojtaba Oraki Kohshour
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
| | - Sergi Papiol
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany,Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
| | - Peter Falkai
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
| | - Thomas G. Schulze
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany,Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, United States,Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Urs Heilbronner
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
| | - Heike Bickeböller
- Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
| |
Collapse
|
2
|
Li S, Li S, Su S, Zhang H, Shen J, Wen Y. Gene Region Association Analysis of Longitudinal Quantitative Traits Based on a Function-On-Function Regression Model. Front Genet 2022; 13:781740. [PMID: 35265102 PMCID: PMC8899465 DOI: 10.3389/fgene.2022.781740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Accepted: 01/04/2022] [Indexed: 11/13/2022] Open
Abstract
In the process of growth and development in life, gene expressions that control quantitative traits will turn on or off with time. Studies of longitudinal traits are of great significance in revealing the genetic mechanism of biological development. With the development of ultra-high-density sequencing technology, the associated analysis has tremendous challenges to statistical methods. In this paper, a longitudinal functional data association test (LFDAT) method is proposed based on the function-on-function regression model. LFDAT can simultaneously treat phenotypic traits and marker information as continuum variables and analyze the association of longitudinal quantitative traits and gene regions. Simulation studies showed that: 1) LFDAT performs well for both linkage equilibrium simulation and linkage disequilibrium simulation, 2) LFDAT has better performance for gene regions (include common variants, low-frequency variants, rare variants and mixture), and 3) LFDAT can accurately identify gene switching in the growth and development stage. The longitudinal data of the Oryza sativa projected shoot area is analyzed by LFDAT. It showed that there is the advantage of quick calculations. Further, an association analysis was conducted between longitudinal traits and gene regions by integrating the micro effects of multiple related variants and using the information of the entire gene region. LFDAT provides a feasible method for studying the formation and expression of longitudinal traits.
Collapse
Affiliation(s)
- Shijing Li
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Shiqin Li
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Shaoqiang Su
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hui Zhang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Jiayu Shen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Yongxian Wen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| |
Collapse
|
3
|
Wu W, Wang Z, Xu K, Zhang X, Amei A, Gelernter J, Zhao H, Justice AC, Wang Z. Retrospective Association Analysis of Longitudinal Binary Traits Identifies Important Loci and Pathways in Cocaine Use. Genetics 2019; 213:1225-1236. [PMID: 31591132 PMCID: PMC6893384 DOI: 10.1534/genetics.119.302598] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2019] [Accepted: 10/04/2019] [Indexed: 12/15/2022] Open
Abstract
Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.
Collapse
Affiliation(s)
- Weimiao Wu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
| | - Zhong Wang
- Baker Institute for Animal Health, Cornell University, Ithaca, New York 14850
| | - Ke Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
- VA Connecticut Healthcare System, West Haven, Connecticut 06516
| | - Xinyu Zhang
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
- VA Connecticut Healthcare System, West Haven, Connecticut 06516
| | - Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
| | - Joel Gelernter
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511
- VA Connecticut Healthcare System, West Haven, Connecticut 06516
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
| | - Amy C Justice
- VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06511
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
| |
Collapse
|
4
|
Koh H, Li Y, Zhan X, Chen J, Zhao N. A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies. Front Genet 2019; 10:458. [PMID: 31156711 PMCID: PMC6532659 DOI: 10.3389/fgene.2019.00458] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2019] [Accepted: 04/30/2019] [Indexed: 12/12/2022] Open
Abstract
Researchers have increasingly employed family-based or longitudinal study designs to survey the roles of the human microbiota on diverse host traits of interest (e. g., health/disease status, medical intervention, behavioral/environmental factor). Such study designs are useful to properly control for potential confounders or the sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because the measurements within clusters (e.g., families, subjects including repeated measures) tend to be correlated so that statistical methods based on the independence assumption cannot be used. For the correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, namely, correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance), and then tests its association with a host trait. Similar to prior distance-based kernel association tests (e.g., microbiome regression-based kernel association test), the use of ecological distances gives a high power to cSKAT. However, cSKAT is limited to handling Gaussian traits [e.g., body mass index (BMI)] and a single chosen distance measure at a time. The power of cSKAT differs a lot by which distance measure is used. However, choosing an optimal distance measure is challenging because of the unknown nature of the true association. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive test of GLMM-MiRKAT, namely, aGLMM-MiRKAT, so as to avoid the need to choose the optimal distance measure. Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool with its broad applicability to diverse types of traits, robust power and valid statistical inference.
Collapse
Affiliation(s)
- Hyunwook Koh
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| | - Yutong Li
- School of Physics, Peking University, Beijing, China
| | - Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
| | - Jun Chen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| |
Collapse
|
5
|
Zhan X, Xue L, Zheng H, Plantinga A, Wu MC, Schaid DJ, Zhao N, Chen J. A small‐sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol 2018; 42:772-782. [DOI: 10.1002/gepi.22160] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2018] [Revised: 06/27/2018] [Accepted: 07/15/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Xiang Zhan
- Department of Public Health SciencesPennsylvania State UniversityHershey Pennsylvania
| | - Lingzhou Xue
- Department of StatisticsPennsylvania State UniversityUniversity Park Pennsylvania
| | - Haotian Zheng
- Department of Mathematical SciencesTsinghua UniversityBeijing China
| | - Anna Plantinga
- Department of BiostatisticsUniversity of WashingtonSeattle Washington
| | - Michael C. Wu
- Department of BiostatisticsUniversity of WashingtonSeattle Washington
- Division of Public Health SciencesFred Hutchinson Cancer Research CenterSeattle Washington
| | - Daniel J. Schaid
- Division of Biomedical Statistics and InformaticsMayo ClinicRochester Minnesota
| | - Ni Zhao
- Department of BiostatisticsJohns Hopkins UniversityBaltimore Maryland
| | - Jun Chen
- Division of Biomedical Statistics and InformaticsMayo ClinicRochester Minnesota
- Center for Individualized MedicineMayo ClinicRochester Minnesota
| |
Collapse
|
6
|
Wang Z, Wang N, Wu R, Wang Z. fGWAS: An R package for genome-wide association analysis with longitudinal phenotypes. J Genet Genomics 2018; 45:411-413. [PMID: 30049619 PMCID: PMC6179436 DOI: 10.1016/j.jgg.2018.06.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2018] [Revised: 06/18/2018] [Accepted: 06/27/2018] [Indexed: 10/28/2022]
Affiliation(s)
- Zhong Wang
- College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China; Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY 14850, USA.
| | - Nating Wang
- College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
| | - Rongling Wu
- Center for Computational Biology, Beijing Forestry University, Beijing 100083, China; Center for Statistical Genetics, Department of Public Health Sciences, Pennsylvania State College of Medicine, Hershey, PA 17033, USA
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA.
| |
Collapse
|
7
|
Park JY, Wu C, Basu S, McGue M, Pan W. Adaptive SNP-Set Association Testing in Generalized Linear Mixed Models with Application to Family Studies. Behav Genet 2018; 48:55-66. [PMID: 29150721 PMCID: PMC5754233 DOI: 10.1007/s10519-017-9883-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2017] [Accepted: 11/07/2017] [Indexed: 10/18/2022]
Abstract
In genome-wide association studies (GWAS), it has been increasingly recognized that, as a complementary approach to standard single SNP analyses, it may be beneficial to analyze a group of functionally related SNPs together. Among the existent population-based SNP-set association tests, two adaptive tests, the aSPU test and the aSPUpath test, offer a powerful and general approach at the gene- and pathway-levels by data-adaptively combining the results across multiple SNPs (and genes) such that high statistical power can be maintained across a wide range of scenarios. We extend the aSPU and the aSPUpath test to familial data under the framework of the generalized linear mixed models (GLMMs), which can take account of both subject relatedness and possible population structure. As in population-based GWAS, the proposed aSPU and aSPUpath tests require only fitting a single and common GLMM (under the null hypothesis) for all the SNPs, thus are computationally efficient and feasible for large GWAS data. We illustrate our approaches in identifying genes and pathways associated with alcohol dependence in the Minnesota Twin Family Study. The aSPU test detected a gene associated with the trait, in contrast to none by the standard single SNP analysis. Our aSPU test also controlled Type I errors satisfactorily in a small simulation study. We provide R code to conduct the aSPU and aSPUpath tests for familial and other correlated data.
Collapse
Affiliation(s)
- Jun Young Park
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
| | - Chong Wu
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
| | - Saonli Basu
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA
| | - Matt McGue
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, A460 Mayo Building, MMC 303, 420 Delaware St. SE, Minneapolis, MN, 55455, USA.
| |
Collapse
|
8
|
He Z, Lee S, Zhang M, Smith JA, Guo X, Palmas W, Kardia SL, Ionita-Laza I, Mukherjee B. Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA). Genet Epidemiol 2017; 41:801-810. [PMID: 29076270 PMCID: PMC5696115 DOI: 10.1002/gepi.22081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2017] [Revised: 08/24/2017] [Accepted: 08/24/2017] [Indexed: 11/09/2022]
Abstract
Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene-based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one-at-a-time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model-based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare-variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within-subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi-Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.
Collapse
Affiliation(s)
- Zihuai He
- Department of Biostatistics, Columbia University, New York, NY 10032
| | - Seunggeun Lee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
| | - Min Zhang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
| | - Jennifer A. Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, MI 48109
| | - Xiuqing Guo
- Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90509
| | - Walter Palmas
- Department of Medicine, Columbia University, New York, NY 10032
| | | | | | - Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
| |
Collapse
|
9
|
Xu Z, Xu G, Pan W. Adaptive testing for association between two random vectors in moderate to high dimensions. Genet Epidemiol 2017; 41:599-609. [PMID: 28714590 PMCID: PMC5643233 DOI: 10.1002/gepi.22059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2017] [Revised: 04/26/2017] [Accepted: 05/17/2017] [Indexed: 01/09/2023]
Abstract
Testing for association between two random vectors is a common and important task in many fields, however, existing tests, such as Escoufier's RV test, are suitable only for low-dimensional data, not for high-dimensional data. In moderate to high dimensions, it is necessary to consider sparse signals, which are often expected with only a few, but not many, variables associated with each other. We generalize the RV test to moderate-to-high dimensions. The key idea is to data adaptively weight each variable pair based on its empirical association. As the consequence, the proposed test is adaptive, alleviating the effects of noise accumulation in high-dimensional data, and thus maintaining the power for both dense and sparse alternative hypotheses. We show the connections between the proposed test with several existing tests, such as a generalized estimating equations-based adaptive test, multivariate kernel machine regression (KMR), and kernel distance methods. Furthermore, we modify the proposed adaptive test so that it can be powerful for nonlinear or nonmonotonic associations. We use both real data and simulated data to demonstrate the advantages and usefulness of the proposed new test. The new test is freely available in R package aSPC on CRAN at https://cran.r-project.org/web/packages/aSPC/index.html and https://github.com/jasonzyx/aSPC.
Collapse
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, University of Minnesota
| | - Gongjun Xu
- Department of Statistics, University of Michigan
| | - Wei Pan
- Division of Biostatistics, University of Minnesota
| | | |
Collapse
|