1
Wendel B, Heidenreich M, Budde M, Heilbronner M, Oraki Kohshour M, Papiol S, Falkai P, Schulze TG, Heilbronner U, Bickeböller H. Kalpra: A kernel approach for longitudinal pathway regression analysis integrating network information with an application to the longitudinal PsyCourse Study. Front Genet 2022; 13:1015885. [PMID: 36561312 PMCID: PMC9767414 DOI: 10.3389/fgene.2022.1015885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 11/24/2022] [Indexed: 12/12/2022]
Abstract
A popular approach to reduce the high dimensionality resulting from genome-wide association studies is to analyze a whole pathway in a single test for association with a phenotype. Kernel machine regression (KMR) is a highly flexible pathway analysis approach. Initially, KMR was developed to analyze a simple phenotype with just one measurement per individual. Recently, however, investigating the influence of genomic factors on the development of disease-related phenotypes across time (trajectories) has gained in importance. Thus, novel KMR approaches are required for analyzing longitudinal data, i.e., several measurements per individual at specific time points. For longitudinal pathway analysis, we extend KMR to long-KMR using the estimation equivalence of KMR and linear mixed models. We include additional random effects to correct for the dependence structure. Moreover, within long-KMR we created a topology-based pathway analysis by combining this approach with a kernel that includes network information of the pathway. Most importantly, long-KMR not only allows for the investigation of the main genetic effect adjusting for time dependencies within an individual, but also allows testing for the association of the pathway with the longitudinal course of the phenotype by testing the genetic time-interaction effect. The approach is implemented as an R package, kalpra. Our simulation study demonstrates that the power of long-KMR exceeded that of another KMR method previously developed to analyze longitudinal data, while maintaining (slightly conservatively) the type I error. The network kernel improved the performance of long-KMR compared to the linear kernel. Considering different pathway densities, the power of the network kernel decreased with increasing pathway density. We applied long-KMR to cognitive data on executive function (Trail Making Test, part B) from the PsyCourse Study and 17 candidate pathways selected from Reactome. We identified seven nominally significant pathways.
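The "estimation equivalence of KMR and linear mixed models" mentioned in the abstract can be illustrated with a generic formulation. The block below is only a sketch under assumptions and not the model as specified in the paper: all symbols are illustrative, and a single random intercept $b_i$ stands in for the "additional random effects" the authors describe.

```latex
% Hypothetical notation: y_{ij} is the phenotype of individual i at time point j,
% x_{ij} are covariates, z_i are the pathway genotypes, K is the kernel matrix.
\begin{align*}
  y_{ij} &= x_{ij}^{\top}\beta + h(z_i) + b_i + \varepsilon_{ij}, \\
  \bigl(h(z_1),\dots,h(z_n)\bigr)^{\top} &\sim N(0,\ \tau K), \qquad
  b_i \sim N(0,\ \sigma_b^{2}), \qquad
  \varepsilon_{ij} \sim N(0,\ \sigma^{2}).
\end{align*}
% Under this view, testing the pathway main effect amounts to the
% variance-component test H_0 : \tau = 0.
```

In such a formulation, a genetic time-interaction effect could be represented by a second kernel-based random term multiplied by time, with its own variance component set to zero under the null hypothesis; whether the paper parameterizes it this way is an assumption here.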
Affiliation(s)
- Bernadette Wendel
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany (*Correspondence: Bernadette Wendel)
- Markus Heidenreich
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
- Monika Budde
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Maria Heilbronner
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Mojtaba Oraki Kohshour
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Sergi Papiol
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
- Peter Falkai
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
- Thomas G. Schulze
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, United States
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Urs Heilbronner
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Heike Bickeböller
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
2
Davenport CA, Maity A, Sullivan PF, Tzeng JY. A Powerful Test for SNP Effects on Multivariate Binary Outcomes using Kernel Machine Regression. Statistics in Biosciences 2018; 10:117-138. [PMID: 30420901 PMCID: PMC6226013 DOI: 10.1007/s12561-017-9189-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/05/2016] [Revised: 12/20/2016] [Accepted: 03/15/2017] [Indexed: 10/19/2022]
Abstract
Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a SNP-set on multiple, possibly correlated, binary responses. We develop a score-based test using a nonparametric modeling framework that jointly models the global effect of the marker set. We account for nonlinear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure only requires estimation under the null hypothesis, and we use multivariate generalized estimating equations (GEEs) to estimate the model components to account for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the CATIE antibody study data and the CoLaus Study data.
Affiliation(s)
- Clemontina A Davenport
  - Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27707, USA
- Arnab Maity
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Patrick F Sullivan
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jung-Ying Tzeng
  - Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
3
Friedrichs S, Manitz J, Burger P, Amos CI, Risch A, Chang-Claude J, Wichmann HE, Kneib T, Bickeböller H, Hofner B. Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies. Computational and Mathematical Methods in Medicine 2017; 2017:6742763. [PMID: 28785300 PMCID: PMC5530424 DOI: 10.1155/2017/6742763] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/10/2017] [Revised: 04/15/2017] [Accepted: 05/10/2017] [Indexed: 01/24/2023]
Abstract
The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.
Affiliation(s)
- Stefanie Friedrichs
  - Institute of Genetic Epidemiology, University Medical Centre, Georg-August University Göttingen, Göttingen, Germany
- Juliane Manitz
  - Department of Statistics and Econometrics, Georg-August University Göttingen, Göttingen, Germany
  - Department of Mathematics and Statistics, Boston University, Boston, MA, USA
- Patricia Burger
  - Institute of Genetic Epidemiology, University Medical Centre, Georg-August University Göttingen, Göttingen, Germany
- Christopher I. Amos
  - Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA
- Angela Risch
  - Division of Molecular Biology, University of Salzburg, Salzburg, Austria
  - Translational Lung Research Center Heidelberg (TLRC-H), Member of the German Center for Lung Research (DZL), Heidelberg, Germany
  - Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jenny Chang-Claude
  - Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Heinz-Erich Wichmann
  - Institute of Medical Informatics, Biometry and Epidemiology, Chair of Epidemiology, Ludwig-Maximilians University, Munich, Germany
  - Helmholtz Center Munich, Institute of Epidemiology II, Munich, Germany
  - Institute of Medical Statistics and Epidemiology, Technical University Munich, Munich, Germany
- Thomas Kneib
  - Department of Statistics and Econometrics, Georg-August University Göttingen, Göttingen, Germany
- Heike Bickeböller
  - Institute of Genetic Epidemiology, University Medical Centre, Georg-August University Göttingen, Göttingen, Germany
- Benjamin Hofner
  - Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  - Section Biostatistics, Paul-Ehrlich-Institut, Langen, Germany
4
Freytag S, Manitz J, Schlather M, Kneib T, Amos CI, Risch A, Chang-Claude J, Heinrich J, Bickeböller H. A network-based kernel machine test for the identification of risk pathways in genome-wide association studies. Hum Hered 2014; 76:64-75. [PMID: 24434848 DOI: 10.1159/000357567] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 08/26/2013] [Accepted: 11/26/2013] [Indexed: 02/06/2023]
Abstract
Biological pathways provide rich information and biological context on the genetic causes of complex diseases. The logistic kernel machine test integrates prior knowledge on pathways in order to analyze data from genome-wide association studies (GWAS). In this study, the kernel converts the genomic information of 2 individuals into a quantitative value reflecting their genetic similarity. With the selection of the kernel, one implicitly chooses a genetic effect model. Like many other pathway methods, none of the available kernels accounts for the topological structure of the pathway or gene-gene interaction types. However, evidence indicates that connectivity and neighborhood of genes are crucial in the context of GWAS, because genes associated with a disease often interact. Thus, we propose a novel kernel that incorporates the topology of pathways and information on interactions. Using simulation studies, we demonstrate that the proposed method maintains the type I error correctly and can be more effective in the identification of pathways associated with a disease than non-network-based methods. We apply our approach to genome-wide association case-control data on lung cancer and rheumatoid arthritis. We identify some promising new pathways associated with these diseases, which may improve our current understanding of the genetic mechanisms.
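The idea that a kernel turns the genotypes of two individuals into a single similarity value can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the implementation from the cited paper: the function names, the toy data, and the network-weighted form Z A N Aᵀ Zᵀ (with A mapping SNPs to genes and N holding gene-gene interaction weights) are all hypothetical choices made for this example, and the kernel proposed in the paper may be constructed differently.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def linear_kernel(z):
    """Similarity of individuals i and j as the inner product of their genotypes."""
    return matmul(z, transpose(z))


def network_kernel(z, a, n):
    """Assumed network-weighted similarity Z A N A^T Z^T (illustrative form only)."""
    za = matmul(z, a)  # aggregate SNP genotypes to gene level
    return matmul(matmul(za, n), transpose(za))


# Toy data (hypothetical): 3 individuals x 3 SNPs, minor-allele counts 0/1/2.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
]
# SNP-to-gene annotation: SNPs 1 and 2 lie in gene 1, SNP 3 lies in gene 2.
snp_to_gene = [
    [1, 0],
    [1, 0],
    [0, 1],
]
# Gene-gene weights: 1 on the diagonal, 0.5 for the assumed interaction.
gene_network = [
    [1, 0.5],
    [0.5, 1],
]

k_lin = linear_kernel(genotypes)
k_net = network_kernel(genotypes, snp_to_gene, gene_network)
print(k_lin)
print(k_net)
```

Each entry `k_lin[i][j]` or `k_net[i][j]` is the similarity of individuals `i` and `j`. In the network variant, SNPs in interacting genes also contribute to the similarity, which is one way a choice of kernel implicitly encodes a genetic effect model.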
Affiliation(s)
- Saskia Freytag
  - Institute of Genetic Epidemiology, Medical School, Göttingen, Germany