1
Falk I, Zhao M, Nait Saada J, Guo Q. Learning the kernel for rare variant genetic association test. Front Genet 2023; 14:1245238. [PMID: 37886683 PMCID: PMC10598548 DOI: 10.3389/fgene.2023.1245238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/23/2023] [Accepted: 09/14/2023] [Indexed: 10/28/2023] Open
Abstract
Introduction: Compared to Genome-Wide Association Studies (GWAS) for common variants, single-marker association analysis for rare variants is underpowered. Set-based association analyses for rare variants are powerful tools that capture some of the missing heritability in trait association studies. Methods: We extend the convex-optimized SKAT (cSKAT) test set procedure, which learns from data the optimal convex combination of kernels, to the full Generalised Linear Model (GLM) setting with arbitrary non-genetic covariates. We call this extended cSKAT (ecSKAT) and show that the resulting optimization problem is a quadratic programming problem that can be solved with no additional cost compared to cSKAT. Results: We show that a modified objective is related to an upper bound for the p-value through a decreasing exponential term in the objective function, indicating that optimizing this objective function is a principled way of learning the combination of kernels. We evaluate the performance of the proposed method on continuous and binary traits using simulation studies and illustrate its application using UK Biobank Whole Exome Sequencing data on hand grip strength and systemic lupus erythematosus rare variant association analysis. Discussion: Our proposed ecSKAT method enables correcting for important confounders in association studies such as age, sex or population structure for both quantitative and binary traits. Simulation studies showed that ecSKAT can recover sensible weights and achieve higher power across different sample sizes and misspecification settings. Compared to the burden test and SKAT method, ecSKAT gives a lower p-value for the genes tested in both quantitative and binary traits in the UK Biobank cohort.
Affiliation(s)
- Isak Falk
  - Department of Computer Science, University College London, London, United Kingdom
  - Computational Statistics and Machine Learning, Italian Institute of Technology, Genoa, Italy
- Qi Guo
  - BenevolentAI, London, United Kingdom
2
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress made in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic loci) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
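To make the difference between the burden and variance-component classes concrete, the following Python sketch contrasts the two standard statistic forms computed from per-variant score statistics. It is an illustration under stated assumptions, not code from the review: the function names, the toy scores and the unit weights are all hypothetical.

```python
def burden_statistic(scores, weights):
    """Square of the weighted sum of per-variant scores (burden-type form)."""
    total = sum(w * s for w, s in zip(weights, scores))
    return total * total


def variance_component_statistic(scores, weights):
    """Sum of squared weighted per-variant scores (SKAT-type form)."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Hypothetical toy input: two variants with equal and opposite effects.
scores = [2.0, -2.0]
weights = [1.0, 1.0]

print(burden_statistic(scores, weights))              # 0.0
print(variance_component_statistic(scores, weights))  # 8.0
```

Opposite-direction effects cancel in the burden form but not in the variance-component form, which is one common motivation for omnibus tests that combine the two.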
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
3
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of result. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in Whole Exome/Genome sequencing association studies, able to handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
4
Wendel B, Heidenreich M, Budde M, Heilbronner M, Oraki Kohshour M, Papiol S, Falkai P, Schulze TG, Heilbronner U, Bickeböller H. Kalpra: A kernel approach for longitudinal pathway regression analysis integrating network information with an application to the longitudinal PsyCourse Study. Front Genet 2022; 13:1015885. [PMID: 36561312 PMCID: PMC9767414 DOI: 10.3389/fgene.2022.1015885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 11/24/2022] [Indexed: 12/12/2022] Open
Abstract
A popular approach to reduce the high dimensionality resulting from genome-wide association studies is to analyze a whole pathway in a single test for association with a phenotype. Kernel machine regression (KMR) is a highly flexible pathway analysis approach. Initially, KMR was developed to analyze a simple phenotype with just one measurement per individual. Recently, however, the investigation into the influence of genomic factors in the development of disease-related phenotypes across time (trajectories) has gained in importance. Thus, novel statistical approaches for KMR analyzing longitudinal data, i.e. several measurements at specific time points per individual, are required. For longitudinal pathway analysis, we extend KMR to long-KMR using the estimation equivalence of KMR and linear mixed models. We include additional random effects to correct for the dependence structure. Moreover, within long-KMR we created a topology-based pathway analysis by combining this approach with a kernel including network information of the pathway. Most importantly, long-KMR not only allows for the investigation of the main genetic effect adjusting for time dependencies within an individual, but also allows testing for the association of the pathway with the longitudinal course of the phenotype in the form of testing the genetic time-interaction effect. The approach is implemented as an R package, kalpra. Our simulation study demonstrates that the power of long-KMR exceeded that of another KMR method previously developed to analyze longitudinal data, while maintaining (slightly conservatively) the type I error. The network kernel improved the performance of long-KMR compared to the linear kernel. Considering different pathway densities, the power of the network kernel decreased with increasing pathway density. We applied long-KMR to cognitive data on executive function (Trail Making Test, part B) from the PsyCourse Study and 17 candidate pathways selected from Reactome. We identified seven nominally significant pathways.
Affiliation(s)
- Bernadette Wendel
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
  - *Correspondence: Bernadette Wendel
- Markus Heidenreich
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
- Monika Budde
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Maria Heilbronner
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Mojtaba Oraki Kohshour
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Sergi Papiol
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
- Peter Falkai
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
- Thomas G. Schulze
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, United States
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Urs Heilbronner
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Heike Bickeböller
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
5
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods in utilizing variant annotations or external controls to improve the statistical power, and particular challenges facing rare variant analysis such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Affiliation(s)
- Wenan Chen
  - Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- Brandon J. Coombes
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Nicholas B. Larson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson
6
Misawa K. Genotype Value Decomposition: Simple Methods for the Computation of Kernel Statistics. Advanced Genetics (Hoboken, N.J.) 2022; 3:2100066. [PMID: 36620199 PMCID: PMC9744480 DOI: 10.1002/ggn2.202100066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Indexed: 01/11/2023]
Abstract
Recent advances in sequencing technologies enable genome-wide analyses for thousands of individuals. The sequence kernel association test (SKAT) is a widely used method to test for associations between a phenotype and a set of rare variants. As the sample size of human genetics studies increases, the computational time required to calculate a kernel is becoming more and more problematic. In this study, a new method to obtain kernel statistics without calculating a kernel matrix is proposed. A simple method for the computation of two kernel statistics, namely, a kernel statistic based on a genetic relationship matrix (GRM) and one based on an identity by state (IBS) matrix, is proposed. By using this method, calculation of the kernel statistics can be conducted using vector calculation without matrix calculation. The proposed method enables one to conduct SKAT for large samples in human genetics.
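The general idea of avoiding the kernel matrix can be illustrated for the plain linear kernel K = G Gᵀ, where the quadratic form rᵀKr equals the squared norm of Gᵀr. The Python sketch below is only that illustration: it does not reproduce the paper's derivations for the GRM or IBS kernels, and the function names, genotype matrix and residual vector are hypothetical.

```python
def kernel_stat_via_matrix(genotypes, residuals):
    """Build the n x n linear kernel K = G G^T, then evaluate r^T K r."""
    n = len(genotypes)
    m = len(genotypes[0])
    kernel = [
        [sum(genotypes[i][j] * genotypes[k][j] for j in range(m)) for k in range(n)]
        for i in range(n)
    ]
    return sum(
        residuals[i] * kernel[i][k] * residuals[k]
        for i in range(n)
        for k in range(n)
    )


def kernel_stat_via_vectors(genotypes, residuals):
    """Same value as r^T (G G^T) r, computed as ||G^T r||^2 without forming K."""
    n = len(genotypes)
    m = len(genotypes[0])
    per_variant = [
        sum(genotypes[i][j] * residuals[i] for i in range(n)) for j in range(m)
    ]
    return sum(s * s for s in per_variant)


# Hypothetical data: 4 individuals, 3 variants, genotypes coded 0/1/2.
G = [[0, 1, 2], [1, 0, 0], [2, 1, 0], [0, 0, 1]]
r = [0.5, -1.0, 1.5, -1.0]

print(kernel_stat_via_matrix(G, r))   # 8.0
print(kernel_stat_via_vectors(G, r))  # 8.0
```

The matrix route needs memory and time that grow with the square of the sample size, whereas the vector route grows with the number of individuals times the number of variants.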
Affiliation(s)
- Kazuharu Misawa
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3‐9 Fukuura, Kanazawa‐ku, Yokohama 236‐0004, Japan
7
Hüls A, Robins C, Conneely KN, Edgar R, De Jager PL, Bennett DA, Wingo AP, Epstein MP, Wingo TS. Brain DNA Methylation Patterns in CLDN5 Associated With Cognitive Decline. Biol Psychiatry 2022; 91:389-398. [PMID: 33838873 PMCID: PMC8329105 DOI: 10.1016/j.biopsych.2021.01.015] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/25/2020] [Revised: 01/06/2021] [Accepted: 01/27/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Cognitive trajectory varies widely and can distinguish people who develop dementia from people who remain cognitively normal. Variation in cognitive trajectory is only partially explained by traditional neuropathologies. We sought to identify novel genes associated with cognitive trajectory using DNA methylation profiles from human postmortem brain. METHODS: We performed a brain epigenome-wide association study of cognitive trajectory in 636 participants from the ROS (Religious Orders Study) and MAP (Rush Memory and Aging Project) using DNA methylation profiles of the dorsolateral prefrontal cortex. To maximize our power to detect epigenetic associations, we used the recently developed Gene Association with Multiple Traits test to analyze the 5 measured cognitive domains simultaneously. RESULTS: We found an epigenome-wide association for differential methylation of sites in the CLDN5 locus and cognitive trajectory (p = 9.96 × 10⁻⁷) that was robust to adjustment for cell type proportions (p = 8.52 × 10⁻⁷). This association was primarily driven by association with declines in episodic (p = 4.65 × 10⁻⁶) and working (p = 2.54 × 10⁻⁷) memory. This association between methylation in CLDN5 and cognitive decline was significant even in participants with no or little signs of amyloid-β and neurofibrillary tangle pathology. CONCLUSIONS: Differential methylation of CLDN5, a gene that encodes an important protein of the blood-brain barrier, is associated with cognitive trajectory beyond traditional Alzheimer's disease pathologies. The association between CLDN5 methylation and cognitive trajectory in people with low pathology suggests an early role for CLDN5 and blood-brain barrier dysfunction in cognitive decline and Alzheimer's disease.
Affiliation(s)
- Anke Hüls
  - Department of Human Genetics, Emory University, Atlanta, Georgia; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia; Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia
- Chloe Robins
  - Department of Neurology, Emory University School of Medicine, Atlanta, Georgia
- Karen N Conneely
  - Department of Human Genetics, Emory University, Atlanta, Georgia
- Rachel Edgar
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, British Columbia, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Philip L De Jager
  - Center for Translational and Computational Neuroimmunology, Department of Neurology, Columbia University Medical Center, New York, New York
- David A Bennett
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois
- Aliza P Wingo
  - Department of Psychiatry, Emory University School of Medicine, Atlanta, Georgia; Division of Mental Health, Atlanta VA Medical Center, Decatur, Georgia
- Thomas S Wingo
  - Department of Human Genetics, Emory University, Atlanta, Georgia; Department of Neurology, Emory University School of Medicine, Atlanta, Georgia
8
Correa R, Alonso-Pupo N, Hernández Rodríguez EW. Multi-omics data integration approaches for precision oncology. Mol Omics 2022; 18:469-479. [DOI: 10.1039/d1mo00411e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Next-generation sequencing (NGS) has been pivotal to enhance the molecular characterization of human malignancies, allowing multiple omics data types to be available for cancer researchers and practitioners. In this context,...
9
Maus Esfahani N, Catchpoole D, Khan J, Kennedy PJ. MCKAT: a multi-dimensional copy number variant kernel association test. BMC Bioinformatics 2021; 22:588. [PMID: 34895138 PMCID: PMC8666084 DOI: 10.1186/s12859-021-04494-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2021] [Accepted: 11/25/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Copy number variants (CNVs) are the gain or loss of DNA segments in the genome. Studies have shown that CNVs are linked to various disorders, including autism, intellectual disability, and schizophrenia. Consequently, the interest in studying a possible association of CNVs to specific disease traits is growing. However, due to the specific multi-dimensional characteristics of the CNVs, methods for testing the association between CNVs and the disease-related traits are still underdeveloped. We propose a novel multi-dimensional CNV kernel association test (MCKAT) in this paper. We aim to find significant associations between CNVs and disease-related traits using kernel-based methods. Results: We address the multi-dimensionality in CNV characteristics. We first design a single pair CNV kernel, which contains three sub-kernels to summarize the similarity between two CNVs considering all CNV characteristics. Then, aggregate single pair CNV kernel to the whole chromosome CNV kernel, which summarizes the similarity between CNVs in two or more chromosomes. Finally, the association between the CNVs and disease-related traits is evaluated by comparing the similarity in the trait with kernel-based similarity using a score test in a random effect model. We apply MCKAT on genome-wide CNV datasets to examine the association between CNVs and disease-related traits, which demonstrates the potential usefulness the proposed method has for the CNV association tests. We compare the performance of MCKAT with CKAT, a uni-dimensional kernel method. Based on the results, MCKAT indicates stronger evidence, smaller p-value, in detecting significant associations between CNVs and disease-related traits in both rare and common CNV datasets. Conclusion: A multi-dimensional copy number variant kernel association test can detect statistically significant associated CNV regions with any disease-related trait. MCKAT can provide biologists with CNV hot spots at the cytogenetic band level that CNVs on them may have a significant association with disease-related traits. Using MCKAT, biologists can narrow their investigation from the whole genome, including many genes and CNVs, to more specific cytogenetic bands that MCKAT identifies. Furthermore, MCKAT can help biologists detect significantly associated CNVs with disease-related traits across a patient group instead of examining each subject’s CNVs case by case.
Affiliation(s)
- Nastaran Maus Esfahani
  - Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia
- Daniel Catchpoole
  - Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia
  - The Tumour Bank, The Children's Hospital at Westmead, Sydney, Australia
- Javed Khan
  - Center for Cancer Research, National Cancer Institute, Bethesda, USA
- Paul J Kennedy
  - Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia
10
Carpenter CM, Zhang W, Gillenwater L, Severn C, Ghosh T, Bowler R, Kechris K, Ghosh D. PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes. PLoS Comput Biol 2021; 17:e1008986. [PMID: 34679079 PMCID: PMC8565741 DOI: 10.1371/journal.pcbi.1008986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Revised: 11/03/2021] [Accepted: 10/13/2021] [Indexed: 02/02/2023] Open
Abstract
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the "-omics" family. For this work, we focus on subsets that interact with one another and represent these "pathways" as graphs. Observed pathways often have disjoint components, i.e., nodes or sets of nodes (metabolites, etc.) not connected to any other within the pathway, which notably lessens testing power. In this paper we propose the Pathway Integrated Regression-based Kernel Association Test (PaIRKAT), a new kernel machine regression method for incorporating known pathway information into the semi-parametric kernel regression framework. This work extends previous kernel machine approaches. This paper also contributes an application of a graph kernel regularization method for overcoming disconnected pathways. By incorporating a regularized or "smoothed" graph into a score test, PaIRKAT can provide more powerful tests for associations between biological pathways and phenotypes of interest and will be helpful in identifying novel pathways for targeted clinical research. We evaluate this method through several simulation studies and an application to real metabolomics data from the COPDGene study. Our simulation studies illustrate the robustness of this method to incorrect and incomplete pathway knowledge, and the real data analysis shows meaningful improvements of testing power in pathways. PaIRKAT was developed for application to metabolomic pathway data, but the techniques are easily generalizable to other data sources with a graph-like structure.
Affiliation(s)
- Charlie M. Carpenter
  - Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Weiming Zhang
  - Syneos Health, Morrisville, North Carolina, United States of America
- Lucas Gillenwater
  - Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Cameron Severn
  - Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Tusharkanti Ghosh
  - Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Russell Bowler
  - Department of Medicine, National Jewish Health, Denver; University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Katerina Kechris
  - Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Debashis Ghosh
  - Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
11
Pluta D, Shen T, Xue G, Chen C, Ombao H, Yu Z. Ridge-penalized adaptive Mantel test and its application in imaging genetics. Stat Med 2021; 40:5313-5332. [PMID: 34216035 DOI: 10.1002/sim.9127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/20/2020] [Revised: 06/01/2021] [Accepted: 06/16/2021] [Indexed: 01/23/2023]
Abstract
We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association of two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances and their corresponding linear models from the perspective of association measurement and testing. This result is not only theoretically interesting but also has important implications in penalized hypothesis testing, especially in high-dimensional settings such as imaging genetics. Applying the proposed method to an imaging genetic study of visual working memory in healthy adults, we identified interesting associations of brain connectivity (measured by electroencephalogram coherence) with selected genetic features.
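One way to see how a ridge penalty can interpolate between the two distances is the following worked limit. It is a sketch under assumptions rather than the paper's own formulation: the symbols $\hat\Sigma$ (a sample covariance matrix, taken to be invertible for the first limit), $\lambda$ (the ridge parameter) and $d_\lambda$ are introduced here purely for illustration.

```latex
% Ridge-regularized squared distance between feature vectors x and y
d_\lambda^2(x, y) = (x - y)^\top \bigl(\hat\Sigma + \lambda I\bigr)^{-1} (x - y),
\qquad \lambda \ge 0 .

% Small-penalty limit: the Mahalanobis distance
\lim_{\lambda \to 0} d_\lambda^2(x, y) = (x - y)^\top \hat\Sigma^{-1} (x - y) .

% Large-penalty limit: since (\hat\Sigma + \lambda I)^{-1}
% = \lambda^{-1} (I + \hat\Sigma / \lambda)^{-1}, rescaling by \lambda gives
\lim_{\lambda \to \infty} \lambda \, d_\lambda^2(x, y) = \lVert x - y \rVert_2^2 .
```

Under these assumptions, intermediate values of $\lambda$ give a family of metrics between the Mahalanobis and (rescaled) Euclidean extremes, which is the sense in which a single penalized test can scan many metrics.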
Affiliation(s)
- Dustin Pluta
  - Department of Statistics, University of California, Irvine, Irvine, California, USA
- Tong Shen
  - Department of Statistics, University of California, Irvine, Irvine, California, USA
- Gui Xue
  - Center for Brain and Learning Science, Beijing Normal University, Beijing, China
- Chuansheng Chen
  - Department of Psychology and Social Behavior, University of California, Irvine, Irvine, California, USA
- Hernando Ombao
  - Statistics Program, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Zhaoxia Yu
  - Department of Statistics, University of California, Irvine, Irvine, California, USA
12
Zhan X, Banerjee K, Chen J. Variant-set association test for generalized linear mixed model. Genet Epidemiol 2021; 45:402-412. [PMID: 33604919 DOI: 10.1002/gepi.22378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 12/22/2022]
Abstract
Advances in high-throughput biotechnologies have culminated in a wide range of omics (such as genomics, epigenomics, transcriptomics, metabolomics, and metagenomics) studies, and increasing evidence in these studies indicates that the biological architecture of complex traits involves a large number of omics variants each with minor effects but collectively accounting for the full phenotypic variability. Thus, a major challenge in many "ome-wide" association analyses is to achieve adequate statistical power to identify multiple variants of small effect sizes, which is notoriously difficult for studies with relatively small-sample sizes. A small-sample adjustment incorporated in the kernel machine regression framework was proposed to solve this for association studies under various settings. However, such an adjustment in the generalized linear mixed model (GLMM) framework, which accounts for both sample relatedness and non-Gaussian outcomes, has not yet been attempted. In this study, we fill this gap by extending small-sample adjustment in kernel machine association test to GLMM. We propose a new Variant-Set Association Test (VSAT), a powerful and efficient analysis tool in GLMM, to examine the association between a set of omics variants and correlated phenotypes. The usefulness of VSAT is demonstrated using both numerical simulation studies and applications to data collected from multiple association studies. The software for implementing the proposed method in R is available at https://www.github.com/jchen1981/SSKAT.
Affiliation(s)
- Xiang Zhan
  - Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Kalins Banerjee
  - Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Jun Chen
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| |
Collapse
|
13
|
Schaid DJ, Sinnwell JP, Larson NB, Chen J. Penalized variance components for association of multiple genes with traits. Genet Epidemiol 2021; 44:665-675. [PMID: 33463755 DOI: 10.1002/gepi.22340] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/03/2020] [Revised: 06/26/2020] [Accepted: 07/05/2020] [Indexed: 12/19/2022]
Abstract
Variance component models have gained popularity for genetic analyses, driven by their flexibility to simultaneously analyze multiple genetic variants in a gene by kernel statistics, and their ability to account for population stratification via genomic relationship matrices. For exploratory analyses with modest sample sizes and a potentially large number of variance components, it can be challenging to use standard maximum-likelihood or restricted maximum-likelihood methods to estimate variance components, because these iterative methods often fail to converge when likelihood surfaces are fairly flat, and standard likelihood-ratio statistical tests are not adequate. To overcome these limitations, we developed a penalized-likelihood model, whereby the penalty function follows the popular elastic-net approach, applying both L1 and L2 penalties to the variance components. By simulations, we demonstrate the potential gain in power by using both L1 and L2 penalties, and results from our simulations suggest that assigning 80% of the penalty parameter to the L1 penalty and 20% to the L2 penalty provides a reasonable balance between false-positive and false-negative results. Larger sample size improves the properties of our methods, at the cost of longer computation time. Application of our methods to a study of the influence of DNA methylation on levels of cortisol in reaction to stress testing shows how our method can be used to prioritize findings for further functional studies.
Affiliation(s)
- Daniel J Schaid
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Jason P Sinnwell
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Nicholas B Larson
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Jun Chen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
|
14
|
Partanen J, Hyvärinen K, Bickeböller H, Bogunia-Kubik K, Crossland RE, Ivanova M, Perutelli F, Dressel R. Review of Genetic Variation as a Predictive Biomarker for Chronic Graft-Versus-Host-Disease After Allogeneic Stem Cell Transplantation. Front Immunol 2020; 11:575492. [PMID: 33193367 PMCID: PMC7604383 DOI: 10.3389/fimmu.2020.575492] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/23/2020] [Accepted: 09/28/2020] [Indexed: 12/11/2022] Open
Abstract
Chronic graft-versus-host disease (cGvHD) is one of the major complications of allogeneic hematopoietic stem cell transplantation (HSCT). cGvHD is an autoimmune-like disorder affecting multiple organs and involves a dermatological rash, tissue inflammation and fibrosis. The incidence of cGvHD has been reported to be as high as 30% to 60% and there are currently no reliable tools for predicting the occurrence of cGvHD. There is therefore an important unmet clinical need for predictive biomarkers. The present review summarizes the state of the art for genetic variation as a predictive biomarker for cGvHD. We discuss three different modes of action for genetic variation in transplantation: genetic associations, genetic matching, and pharmacogenetics. The results indicate that currently, there are no genetic polymorphisms or genetic tools that can be reliably used as validated biomarkers for predicting cGvHD. A number of recommendations for future studies can be drawn. The majority of studies to date have been under-powered and included too few patients and genetic markers. As in all complex multifactorial diseases, large collaborative genome-level studies are now needed to achieve reliable and unbiased results. Some of the candidate genes, in particular, CTLA4, HSPE, IL1R1, CCR6, FGFR1OP, and IL10, and some non-HLA variants in the HLA gene region have been replicated to be associated with cGvHD risk in independent studies. These associations should now be confirmed in large well-characterized cohorts with fine mapping. Some patients develop cGvHD despite very extensive immunosuppression and other treatments, indicating that the current therapeutic regimens may not always be effective enough. Hence, more studies on pharmacogenetics are also required. Moreover, all of these studies should be adjusted for diagnostic and clinical features of cGvHD.
We conclude that future studies should focus on modern genome-level tools, such as machine learning, polygenic risk scores and genome-wide association study-transcription meta-analyses, instead of focusing on just single variants. The risk of cGvHD may be related to the summary level of immunogenetic differences, or whole genome histocompatibility between each donor-recipient pair. As the number of genome-wide analyses in HSCT is increasing, we are approaching an era where there will be sufficient data to incorporate these approaches in the near future.
Affiliation(s)
- Jukka Partanen
  - Finnish Red Cross Blood Service, Research and Development, Helsinki, Finland
- Kati Hyvärinen
  - Finnish Red Cross Blood Service, Research and Development, Helsinki, Finland
- Heike Bickeböller
  - Department of Genetic Epidemiology, University Medical Center Göttingen, Göttingen, Germany
- Katarzyna Bogunia-Kubik
  - Hirszfeld Institute of Immunology and Experimental Therapy, Polish Academy of Sciences, Wroclaw, Poland
- Rachel E Crossland
  - Haematological Sciences, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Milena Ivanova
  - Medical University, University Hospital Alexandrovska, Sofia, Bulgaria
- Francesca Perutelli
  - Haematological Sciences, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom; Section of Hematology, Department of Clinical and Experimental Medicine, University of Pisa, Pisa, Italy
- Ralf Dressel
  - Institute of Cellular and Molecular Immunology, University Medical Center Göttingen, Göttingen, Germany
|
15
|
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States
|
16
|
A robust test for X-chromosome genetic association accounting for X-chromosome inactivation and imprinting. Genet Res (Camb) 2020; 102:e2. [PMID: 32234109 PMCID: PMC7132553 DOI: 10.1017/s0016672320000026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022] Open
Abstract
The X chromosome is known to play an important role in many sex-specific diseases. However, only a few single-nucleotide polymorphisms on the X chromosome have been found to be associated with diseases. Compared to the autosomes, conducting association tests on the X chromosome is more difficult because females and males carry different numbers of X chromosomes. In addition, X-chromosome inactivation takes place in female mammals: the expression of one of the two X chromosomes in females is silenced in order to achieve the same gene expression level as that in males. Imprinting effects may also be related to certain diseases. Some existing approaches take X-chromosome inactivation into account when testing for associations on the X chromosome. However, none of them allows for imprinting effects. Therefore, in this paper, we propose a robust test, ZXCII, which accounts for both X-chromosome inactivation and imprinting effects without requiring the genetic model to be specified in advance. Simulation studies are conducted in order to investigate the validity and performance of ZXCII under various scenarios of different parameter values. The simulation results show that ZXCII controls the type I error rate well when there is no association. Furthermore, with regard to power, ZXCII is robust in all of the situations considered and generally outperforms most of the existing methods in the presence of imprinting effects, especially under complete imprinting effects.
|