1
Mbatchou J, McPeek MS. JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. Am J Hum Genet 2024; 111:1750-1769. [PMID: 39025064 PMCID: PMC11339629 DOI: 10.1016/j.ajhg.2024.06.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/19/2024] [Accepted: 06/20/2024] [Indexed: 07/20/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA; Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
2
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. bioRxiv 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
3
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023]
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ± 3 centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
  - Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Ardalan Naseri
  - Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Degui Zhi
  - Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
4
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
5
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of result. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in Whole Exome/Genome sequencing association studies, able to handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
6
Lecomte M, Tomassi D, Rizzoli R, Tenon M, Berton T, Harney S, Fança-Berthon P. Effect of a Hop Extract Standardized in 8-Prenylnaringenin on Bone Health and Gut Microbiome in Postmenopausal Women with Osteopenia: A One-Year Randomized, Double-Blind, Placebo-Controlled Trial. Nutrients 2023; 15:2688. [PMID: 37375599 DOI: 10.3390/nu15122688] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/03/2023] [Revised: 05/21/2023] [Accepted: 05/25/2023] [Indexed: 06/29/2023]
Abstract
Estrogen deficiency increases the risk of osteoporosis and fracture. The aim of this study was to investigate whether a hop extract standardized in 8-prenylnaringenin (8-PN), a potent phytoestrogen, could improve bone status of osteopenic women and to explore the role of the gut microbiome in this effect. In this double-blind, placebo-controlled, randomized trial, 100 postmenopausal, osteopenic women were supplemented with calcium and vitamin D3 (CaD) tablets and either a hop extract (HE) standardized in 8-PN (n = 50) or a placebo (n = 50) for 48 weeks. Bone mineral density (BMD) and bone metabolism were assessed by DXA measurements and plasma bone biomarkers, respectively. Participants' quality of life (SF-36), gut microbiome composition, and short-chain fatty acid (SCFA) levels were also investigated. In addition to the CaD supplements, 48 weeks of HE supplementation increased total body BMD (1.8 ± 0.4% vs. baseline, p < 0.0001; 1.0 ± 0.6% vs. placebo, p = 0.08), with a higher proportion of women experiencing an increase ≥1% compared to placebo (odds ratio: 2.41 ± 1.07, p < 0.05). An increase in the SF-36 physical functioning score was observed with HE versus placebo (p = 0.05). Gut microbiome α-diversity and SCFA levels did not differ between groups. However, a higher abundance of genera Turicibacter and Shigella was observed in the HE group; both genera have been previously identified as associated with total body BMD. These results suggest that an 8-PN standardized hop extract could beneficially impact bone health of postmenopausal women with osteopenia.
Affiliation(s)
- René Rizzoli
  - Service of Bone Disease, Geneva University Hospitals and Faculty of Medicine, 1211 Geneva, Switzerland
- Sinead Harney
  - Rheumatology Department, Cork University Hospital, T12 DFK4 Cork, Ireland
7
Knutson KA, Pan W. MATS: a novel multi-ancestry transcriptome-wide association study to account for heterogeneity in the effects of cis-regulated gene expression on complex traits. Hum Mol Genet 2023; 32:1237-1251. [PMID: 36179104 PMCID: PMC10077507 DOI: 10.1093/hmg/ddac247] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/21/2022] [Revised: 09/16/2022] [Accepted: 09/28/2022] [Indexed: 01/16/2023]
Abstract
The Transcriptome-Wide Association Study (TWAS) is a widely used approach which integrates gene expression and Genome Wide Association Study (GWAS) data to study the role of cis-regulated gene expression (GEx) in complex traits. However, the genetic architecture of GEx varies across populations, and recent findings point to possible ancestral heterogeneity in the effects of GEx on complex traits, which may be amplified in TWAS by modeling GEx as a function of cis-eQTLs. Here, we present a novel extension to TWAS to account for heterogeneity in the effects of cis-regulated GEx which are correlated with ancestry. Our proposed Multi-Ancestry TwaS (MATS) framework jointly analyzes samples from multiple populations and distinguishes between shared, ancestry-specific and/or subject-specific expression-trait associations. As such, MATS amplifies power to detect shared GEx associations over ancestry-stratified TWAS through increased sample sizes, and facilitates the detection of genes with subgroup-specific associations which may be masked by standard TWAS. Our simulations highlight the improved Type-I error conservation and power of MATS compared with competing approaches. Our real data applications to Alzheimer's disease (AD) case-control genotypes from the Alzheimer's Disease Sequencing Project (ADSP) and continuous phenotypes from the UK Biobank (UKBB) identify a number of unique gene-trait associations which were not discovered through standard and/or ancestry-stratified TWAS. Ultimately, these findings promote MATS as a powerful method for detecting and estimating significant gene expression effects on complex traits within multi-ancestry cohorts and corroborate the mounting evidence for inter-population heterogeneity in gene-trait associations.
Affiliation(s)
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
8
Wang Z, Yin L, Qi Y, Zhang J, Zhu H, Tang J. Intestinal Flora-Derived Kynurenic Acid Protects Against Intestinal Damage Caused by Candida albicans Infection via Activation of Aryl Hydrocarbon Receptor. Front Microbiol 2022; 13:934786. [PMID: 35923391 PMCID: PMC9339982 DOI: 10.3389/fmicb.2022.934786] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/03/2022] [Accepted: 06/20/2022] [Indexed: 12/25/2022]
Abstract
Colonization of the intestinal tract by Candida albicans (C. albicans) can lead to invasive candidiasis. Therefore, a functional intestinal epithelial barrier is critical for protecting against invasive C. albicans infections. We collected fecal samples from patients with Candida albicans bloodstream infection and healthy people. Through intestinal flora 16S rRNA sequencing and intestinal metabolomic analysis, we found that C. albicans infection resulted in a significant decrease in the expression of the metabolite kynurenic acid (KynA). We used a repeated C. albicans intestinal infection mouse model, established following intake of 3% dextran sulfate sodium salt (DSS) for 9 days, and found that KynA, a tryptophan metabolite, inhibited inflammation, promoted expression of intestinal tight junction proteins, and protected from intestinal barrier damage caused by invasive Candida infections. We also demonstrated that KynA activated aryl hydrocarbon receptor (AHR) repressor in vivo and in vitro. Using Caco-2 cells co-cultured with C. albicans, we showed that KynA activated AHR, inhibited the myosin light chain kinase-phospho-myosin light chain (MLCK-pMLC) signaling pathway, and promoted tristetraprolin (TTP) expression to alleviate intestinal inflammation. Our findings suggest that the metabolite KynA, which is differentially expressed in patients with C. albicans infection and has a protective effect on the intestinal epithelium via activating AHR, could be explored to provide new potential therapeutic strategies for invasive C. albicans infections.
Affiliation(s)
- Zetian Wang
  - Department of Trauma-Emergency & Critical Care Medicine, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, China
- Liping Yin
  - Department of Trauma-Emergency & Critical Care Medicine, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, China
- Yue Qi
  - Department of Trauma-Emergency & Critical Care Medicine, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, China
- Jiali Zhang
  - Department of Central Laboratory, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, China
- Haiyan Zhu
  - Department of Biological Medicines & Shanghai Engineering Research Center of Immunotherapeutics, Fudan University School of Pharmacy, Shanghai, China
- Jianguo Tang
  - Department of Trauma-Emergency & Critical Care Medicine, Shanghai Fifth People’s Hospital, Fudan University, Shanghai, China
  - *Correspondence: Jianguo Tang
9
Huang C, Callahan BJ, Wu MC, Holloway ST, Brochu H, Lu W, Peng X, Tzeng JY. Phylogeny-guided microbiome OTU-specific association test (POST). Microbiome 2022; 10:86. [PMID: 35668471 PMCID: PMC9171974 DOI: 10.1186/s40168-022-01266-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Accepted: 04/01/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND: The relationship between host conditions and microbiome profiles, typically characterized by operational taxonomic units (OTUs), contains important information about the microbial role in human health. Traditional association testing frameworks are challenged by the high dimensionality and sparsity of typical microbiome profiles. Phylogenetic information is often incorporated to address these challenges with the assumption that evolutionarily similar taxa tend to behave similarly. However, this assumption may not always be valid due to the complex effects of microbes, and phylogenetic information should be incorporated in a data-supervised fashion. RESULTS: In this work, we propose a local collapsing test called phylogeny-guided microbiome OTU-specific association test (POST). In POST, whether or not to borrow information and how much information to borrow from the neighboring OTUs in the phylogenetic tree are supervised by phylogenetic distance and the outcome-OTU association. POST is constructed under the kernel machine framework to accommodate complex OTU effects and extends kernel machine microbiome tests from community level to OTU level. Using simulation studies, we show that when the phylogenetic tree is informative, POST has better performance than existing OTU-level association tests. When the phylogenetic tree is not informative, POST achieves similar performance as existing methods. Finally, in real data applications on bacterial vaginosis and on preterm birth, we find that POST can identify similar or more outcome-associated OTUs that are of biological relevance compared to existing methods. CONCLUSIONS: Using POST, we show that adaptively leveraging the phylogenetic information can enhance the selection performance of associated microbiome features by improving the overall true-positive and false-positive detection. We developed a user-friendly R package POSTm which is freely available on CRAN (https://CRAN.R-project.org/package=POSTm).
Affiliation(s)
- Caizhi Huang
  - Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
- Benjamin J Callahan
  - Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
  - Department of Population Health and Pathobiology, North Carolina State University, Raleigh, 27607, USA
- Michael C Wu
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, 98109, USA
- Shannon T Holloway
  - Department of Statistics, North Carolina State University, Raleigh, 27606, USA
- Hayden Brochu
  - Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
  - Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, 27607, USA
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, 27606, USA
- Xinxia Peng
  - Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
  - Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, 27607, USA
- Jung-Ying Tzeng
  - Bioinformatics Research Center, North Carolina State University, Raleigh, 27606, USA
  - Department of Statistics, North Carolina State University, Raleigh, 27606, USA
10
Jiang Z, He M, Chen J, Zhao N, Zhan X. MiRKAT-MC: A Distance-Based Microbiome Kernel Association Test With Multi-Categorical Outcomes. Front Genet 2022; 13:841764. [PMID: 35432465 PMCID: PMC9010828 DOI: 10.3389/fgene.2022.841764] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/22/2021] [Accepted: 03/10/2022] [Indexed: 12/14/2022]
Abstract
Increasing evidence has shown that the microbiome plays a critical role in many human diseases. Apart from continuous and binary traits that measure the extent or presence of a disease, multi-categorical outcomes including variations/subtypes of a disease or ordinal levels of disease severity are commonly seen in clinical studies. On top of that, studies with clustered design (i.e., family-based and longitudinal studies) are popular alternatives to population-based ones as they are able to identify characteristics on both individual and population levels and to investigate the trajectory of traits of interest over time. However, existing methods for microbiome association analysis are inadequate to handle multi-categorical outcomes, for either independent or clustered data. We propose a microbiome kernel association test with multi-categorical outcomes (MiRKAT-MC). Our method can handle both nominal and ordinal outcomes for independent and clustered data. In addition, it incorporates multiple ecological distances to allow for different association patterns between outcomes and microbiome compositions. A computationally efficient pseudo-permutation strategy is used to evaluate the statistical significance. Comprehensive simulations show that MiRKAT-MC preserves the nominal type I error and increases statistical power under various scenarios and data types. We also apply MiRKAT-MC to real data sets with nominal and ordinal outcomes to gain biological insights. MiRKAT-MC is easy to implement, and freely available via an R package at https://github.com/Zhiwen-Owen-Jiang/MiRKATMC, with a Graphical User Interface through R Shiny also available.
Affiliation(s)
- Zhiwen Jiang
  - Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, United States
- Mengyu He
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Jun Chen
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Ni Zhao
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
  - *Correspondence: Ni Zhao; Xiang Zhan
- Xiang Zhan
  - Department of Biostatistics, School of Public Health and Beijing International Center for Mathematical Research, Peking University, Beijing, China
  - *Correspondence: Ni Zhao; Xiang Zhan
11
Wang T, Ling W, Plantinga AM, Wu MC, Zhan X. Testing microbiome association using integrated quantile regression models. Bioinformatics 2022; 38:419-425. [PMID: 34554223 PMCID: PMC10060731 DOI: 10.1093/bioinformatics/btab668] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2021] [Revised: 08/24/2021] [Accepted: 09/18/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Most existing microbiome association analyses focus on the association between microbiome and conditional mean of health or disease-related outcomes, and within this vein, a vast number of computational tools and methods have been devised for standard binary or continuous outcomes. However, these methods tend to be limited either when the underlying microbiome-outcome association occurs somewhere other than the mean level, or when distribution of the outcome variable is irregular (e.g. zero-inflated or mixtures) such that conditional outcome mean is less meaningful. We address this gap by investigating association analysis between microbiome compositions and conditional outcome quantiles. RESULTS: We introduce a new association analysis tool named MiRKAT-IQ within the Microbiome Regression-based Kernel Association Test framework using Integrated Quantile regression models to examine the association between microbiome and the distribution of outcome. For an individual quantile, we utilize the existing kernel machine regression framework to examine the association between that conditional outcome quantile and a group of microbial features (e.g. microbiome community compositions). Then, the goal of examining microbiome association with the whole outcome distribution is achieved by integrating all outcome conditional quantiles over a process, and thus our new MiRKAT-IQ test is robust to both the location of association signals (e.g. mean, variance, median) and the heterogeneous distribution of the outcome. Extensive numerical simulation studies have been conducted to show the validity of the new MiRKAT-IQ test. We demonstrate the potential usefulness of MiRKAT-IQ with applications to actual biological data collected from a previous microbiome study. AVAILABILITY AND IMPLEMENTATION: R code to implement the proposed methodology is provided in the MiRKAT package, which is available on CRAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tianying Wang
  - Center for Statistical Science, Tsinghua University, Beijing 100084, China
  - Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
- Wodan Ling
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Anna M Plantinga
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA 01267, USA
- Michael C Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Xiang Zhan
  - Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
  - Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China
12
Tomassi D, Forzani L, Duarte S, Pfeiffer RM. Sufficient dimension reduction for compositional data. Biostatistics 2021; 22:687-705. [PMID: 31886477 DOI: 10.1093/biostatistics/kxz060] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/05/2019] [Revised: 12/03/2019] [Accepted: 12/04/2019] [Indexed: 11/12/2022]
Abstract
Recent efforts to characterize the human microbiome and its relation to chronic diseases have led to a surge in statistical development for compositional data. We develop likelihood-based sufficient dimension reduction methods (SDR) to find linear combinations that contain all the information in the compositional data on an outcome variable, i.e., are sufficient for modeling and prediction of the outcome. We consider several models for the inverse regression of the compositional vector or transformations of it, as a function of outcome. They include normal, multinomial, and Poisson graphical models that allow for complex dependencies among observed counts. These methods yield efficient estimators of the reduction and can be applied to continuous or categorical outcomes. We incorporate variable selection into the estimation via penalties and address important invariance issues arising from the compositional nature of the data. We illustrate and compare our methods and some established methods for analyzing microbiome data in simulations and using data from the Human Microbiome Project. Displaying the data in the coordinate system of the SDR linear combinations allows visual inspection and facilitates comparisons across studies.
Affiliation(s)
- Diego Tomassi
- CONICET and Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santiago del Estero 2829, 3000 Santa Fe, Argentina and Institut Charles Delaunay/ROSAS Department, Systems Modelling and Dependability Team, Université de Technologie de Troyes, 12 rue Marie Curie, 10004 Troyes Cedex, France
- Liliana Forzani
- CONICET and Facultad de Ingeniería Química, Universidad Nacional del Litoral, Santiago del Estero 2829, 3000 Santa Fe, Argentina
- Sabrina Duarte
- Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Bethesda, MD 20892, USA
|
13
|
Carpenter CM, Zhang W, Gillenwater L, Severn C, Ghosh T, Bowler R, Kechris K, Ghosh D. PaIRKAT: A pathway integrated regression-based kernel association test with applications to metabolomics and COPD phenotypes. PLoS Comput Biol 2021; 17:e1008986. [PMID: 34679079 PMCID: PMC8565741 DOI: 10.1371/journal.pcbi.1008986] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Revised: 11/03/2021] [Accepted: 10/13/2021] [Indexed: 02/02/2023] Open
Abstract
High-throughput data such as metabolomics, genomics, transcriptomics, and proteomics have become familiar data types within the "-omics" family. For this work, we focus on subsets that interact with one another and represent these "pathways" as graphs. Observed pathways often have disjoint components, i.e., nodes or sets of nodes (metabolites, etc.) not connected to any other within the pathway, which notably lessens testing power. In this paper we propose the Pathway Integrated Regression-based Kernel Association Test (PaIRKAT), a new kernel machine regression method for incorporating known pathway information into the semi-parametric kernel regression framework. This work extends previous kernel machine approaches. This paper also contributes an application of a graph kernel regularization method for overcoming disconnected pathways. By incorporating a regularized or "smoothed" graph into a score test, PaIRKAT can provide more powerful tests for associations between biological pathways and phenotypes of interest and will be helpful in identifying novel pathways for targeted clinical research. We evaluate this method through several simulation studies and an application to real metabolomics data from the COPDGene study. Our simulation studies illustrate the robustness of this method to incorrect and incomplete pathway knowledge, and the real data analysis shows meaningful improvements of testing power in pathways. PaIRKAT was developed for application to metabolomic pathway data, but the techniques are easily generalizable to other data sources with a graph-like structure.
Affiliation(s)
- Charlie M. Carpenter
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Weiming Zhang
- Syneos Health, Morrisville, North Carolina, United States of America
- Lucas Gillenwater
- Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Cameron Severn
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Tusharkanti Ghosh
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Russell Bowler
- Department of Medicine, National Jewish Health, Denver; University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, United States of America
|
14
|
Zhan X, Banerjee K, Chen J. Variant-set association test for generalized linear mixed model. Genet Epidemiol 2021; 45:402-412. [PMID: 33604919 DOI: 10.1002/gepi.22378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 12/22/2022]
Abstract
Advances in high-throughput biotechnologies have culminated in a wide range of omics (such as genomics, epigenomics, transcriptomics, metabolomics, and metagenomics) studies, and increasing evidence in these studies indicates that the biological architecture of complex traits involves a large number of omics variants, each with minor effects but collectively accounting for the full phenotypic variability. Thus, a major challenge in many "ome-wide" association analyses is to achieve adequate statistical power to identify multiple variants of small effect sizes, which is notoriously difficult for studies with relatively small sample sizes. A small-sample adjustment incorporated in the kernel machine regression framework was proposed to solve this for association studies under various settings. However, such an adjustment in the generalized linear mixed model (GLMM) framework, which accounts for both sample relatedness and non-Gaussian outcomes, has not yet been attempted. In this study, we fill this gap by extending the small-sample adjustment in the kernel machine association test to GLMM. We propose a new Variant-Set Association Test (VSAT), a powerful and efficient analysis tool in GLMM, to examine the association between a set of omics variants and correlated phenotypes. The usefulness of VSAT is demonstrated using both numerical simulation studies and applications to data collected from multiple association studies. The software for implementing the proposed method in R is available at https://www.github.com/jchen1981/SSKAT.
Affiliation(s)
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Kalins Banerjee
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
|
15
|
Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat Commun 2020; 11:3861. [PMID: 32737316 PMCID: PMC7395774 DOI: 10.1038/s41467-020-17668-6] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Accepted: 07/10/2020] [Indexed: 02/06/2023] Open
Abstract
Integrating results from genome-wide association studies (GWASs) and gene expression studies through transcriptome-wide association study (TWAS) has the potential to shed light on the causal molecular mechanisms underlying disease etiology. Here, we present a probabilistic Mendelian randomization (MR) method, PMR-Egger, for TWAS applications. PMR-Egger relies on a MR likelihood framework that unifies many existing TWAS and MR methods, accommodates multiple correlated instruments, tests the causal effect of gene on trait in the presence of horizontal pleiotropy, and is scalable to hundreds of thousands of individuals. In simulations, PMR-Egger provides calibrated type I error control for causal effect testing in the presence of horizontal pleiotropic effects, is reasonably robust under various types of model misspecifications, is more powerful than existing TWAS/MR approaches, and can directly test for horizontal pleiotropy. We illustrate the benefits of PMR-Egger in applications to 39 diseases and complex traits obtained from three GWASs including the UK Biobank. Transcriptome-wide association studies integrate GWAS and transcriptome data to examine the molecular mechanisms underlying disease etiology. Here the authors present PMR-Egger, a powerful TWAS method based on probabilistic Mendelian Randomization.
|
16
|
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that concepts of correlation and association analyses have been shifted by introducing network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future directions of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.
|
17
|
Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis. PLoS Comput Biol 2020; 16:e1007797. [PMID: 32365089 PMCID: PMC7224564 DOI: 10.1371/journal.pcbi.1007797] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2019] [Revised: 05/14/2020] [Accepted: 03/18/2020] [Indexed: 12/12/2022] Open
Abstract
Copy number variants (CNVs) are the gain or loss of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling the aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that is free from a definition of locus and evaluates CNV-phenotype associations by comparing individuals' copy number profiles across the genomic regions. CONCUR is built on the proposed concepts of "copy number profile curves" to describe the CNV profile of an individual, and the "common area under the curve (cAUC) kernel" to model the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.
|
18
|
Banerjee K, Zhao N, Srinivasan A, Xue L, Hicks SD, Middleton FA, Wu R, Zhan X. An Adaptive Multivariate Two-Sample Test With Application to Microbiome Differential Abundance Analysis. Front Genet 2019; 10:350. [PMID: 31068967 PMCID: PMC6491633 DOI: 10.3389/fgene.2019.00350] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Accepted: 04/01/2019] [Indexed: 01/21/2023] Open
Abstract
Differential abundance analysis is a crucial task in many microbiome studies, where the central goal is to identify microbiome taxa associated with certain biological or clinical conditions. There are two different modes of microbiome differential abundance analysis: the individual-based univariate differential abundance analysis and the group-based multivariate differential abundance analysis. The univariate analysis identifies differentially abundant microbiome taxa subject to multiple-testing correction under certain statistical error measurements such as false discovery rate, which is typically complicated by the high-dimensionality of taxa and complex correlation structure among taxa. The multivariate analysis evaluates the overall shift in the abundance of microbiome composition between two conditions, which provides useful preliminary differential information for the necessity of follow-up validation studies. In this paper, we present a novel Adaptive multivariate two-sample test for Microbiome Differential Analysis (AMDA) to examine whether the composition of a taxa-set is different between two conditions. Our simulation studies and real data applications demonstrated that the AMDA test was often more powerful than several competing methods while preserving the correct type I error rate. A free implementation of our AMDA method in R software is available at https://github.com/xyz5074/AMDA.
Affiliation(s)
- Kalins Banerjee
- Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, United States
- Arun Srinivasan
- Department of Statistics, Pennsylvania State University, University Park, PA, United States
- Lingzhou Xue
- Department of Statistics, Pennsylvania State University, University Park, PA, United States
- Steven D. Hicks
- Department of Pediatrics, Pennsylvania State University, Hershey, PA, United States
- Frank A. Middleton
- Department of Neuroscience, State University of New York Upstate Medical University, Syracuse, NY, United States
- Rongling Wu
- Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States (Correspondence: Xiang Zhan)
|
19
|
Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, Gogarten SM, Sofer T, Bielak LF, Bis JC, Blangero J, Bowler RP, Cade BE, Cho MH, Correa A, Curran JE, de Vries PS, Glahn DC, Guo X, Johnson AD, Kardia S, Kooperberg C, Lewis JP, Liu X, Mathias RA, Mitchell BD, O’Connell JR, Peyser PA, Post WS, Reiner AP, Rich SS, Rotter JI, Silverman EK, Smith JA, Vasan RS, Wilson JG, Yanek LR, Redline S, Smith NL, Boerwinkle E, Borecki IB, Cupples LA, Laurie CC, Morrison AC, Rice KM, Lin X. Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am J Hum Genet 2019; 104:260-274. [PMID: 30639324 DOI: 10.1016/j.ajhg.2018.12.012] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2018] [Accepted: 12/17/2018] [Indexed: 12/12/2022] Open
Abstract
With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and sequence kernel association test (SKAT) are two widely used variant-set tests, which were originally developed for samples of unrelated individuals and later have been extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example of analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763), using the Analysis Commons, a cloud-based computing platform.
Affiliation(s)
- Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
|
20
|
Zhai J, Knox K, Twigg HL, Zhou H, Zhou JJ. Exact variance component tests for longitudinal microbiome studies. Genet Epidemiol 2019; 43:250-262. [PMID: 30623484 DOI: 10.1002/gepi.22185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2018] [Revised: 10/28/2018] [Accepted: 11/26/2018] [Indexed: 01/12/2023]
Abstract
In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data by using variance component models with more than two variance components. Current testing strategies only apply to models with exactly two variance components and when sample sizes are large. Therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects from related clusters. Our approach combines the exact tests for null hypothesis with a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared to existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV-infected patients and reveal two interesting genera Prevotella and Veillonella associated with forced vital capacity. Our findings shed light on the impact of the lung microbiome on HIV complexities. The method is implemented in the open-source, high-performance computing language Julia and is freely available at https://github.com/JingZhai63/VCmicrobiome.
Affiliation(s)
- Jing Zhai
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
- Kenneth Knox
- Division of Pulmonary, Allergy, Critical Care, Sleep Medicine, Department of Medicine, University of Arizona, Tucson, Arizona
- Homer L Twigg
- Division of Pulmonary, Critical Care, Sleep, and Occupational Medicine, Indiana University Medical Center, Indianapolis, Indiana
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles, California
- Jin J Zhou
- Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
|
21
|
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
|
22
|
Schweiger R, Fisher E, Weissbrod O, Rahmani E, Müller-Nurasyid M, Kunze S, Gieger C, Waldenberger M, Rosset S, Halperin E. Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests. Nat Commun 2018; 9:4919. [PMID: 30464216 PMCID: PMC6249264 DOI: 10.1038/s41467-018-07276-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2017] [Accepted: 10/26/2018] [Indexed: 01/08/2023] Open
Abstract
Testing for association between a set of genetic markers and a phenotype is a fundamental task in genetic studies. Standard approaches for heritability and set testing strongly rely on parametric models that make specific assumptions regarding phenotypic variability. Here, we show that resulting p-values may be inflated by up to 15 orders of magnitude, in a heritability study of methylation measurements, and in a heritability and expression quantitative trait loci analysis of gene expression profiles. We propose FEATHER, a method for fast permutation-based testing of marker sets and of heritability, which properly controls for false-positive results. FEATHER eliminated 47% of methylation sites found to be heritable by the parametric test, suggesting a substantial inflation of false-positive findings by alternative methods. Our approach can rapidly identify heritable phenotypes out of millions of phenotypes acquired via high-throughput technologies, does not suffer from model misspecification and is highly efficient. Standard approaches for heritability and set testing in statistical genetics rely on parametric models that might not hold in reality and give inflated p-values. Here, the authors develop a fast method for permutation-based testing of marker sets and of heritability that does not suffer from model misspecification.
Affiliation(s)
- Regev Schweiger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- Eyal Fisher
- School of Mathematical Sciences, Department of Statistics, Tel Aviv University, Tel Aviv, 69978, Israel
- Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA
- Elior Rahmani
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 6997801, Israel
- Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, 85764, Germany; Department of Medicine I, Ludwig-Maximilians-Universität, Munich, 80539, Germany; DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, 80636, Germany
- Sonja Kunze
- Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Christian Gieger
- Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Melanie Waldenberger
- DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, 80636, Germany; Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany; Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Saharon Rosset
- School of Mathematical Sciences, Department of Statistics, Tel Aviv University, Tel Aviv, 69978, Israel
- Eran Halperin
- Los Angeles, University of California Los Angeles, Los Angeles, 90095, CA, USA; Department of Anesthesiology and Perioperative Medicine, University of California, Los Angeles, 90095, CA, USA
|
23
|
Zhan X, Xue L, Zheng H, Plantinga A, Wu MC, Schaid DJ, Zhao N, Chen J. A small‐sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol 2018; 42:772-782. [DOI: 10.1002/gepi.22160] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2018] [Revised: 06/27/2018] [Accepted: 07/15/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania
- Lingzhou Xue
- Department of Statistics, Pennsylvania State University, University Park, Pennsylvania
- Haotian Zheng
- Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Anna Plantinga
- Department of Biostatistics, University of Washington, Seattle, Washington
- Michael C. Wu
- Department of Biostatistics, University of Washington, Seattle, Washington
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Daniel J. Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland
- Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota
|
24
|
Wang K. Conditional asymptotic inference for the kernel association test. Bioinformatics 2018; 33:3733-3739. [PMID: 28961861 DOI: 10.1093/bioinformatics/btx511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2017] [Accepted: 08/08/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation: The kernel association test (KAT) is popular in biological studies for its ability to combine weak effects potentially of opposite direction. Its P-value is typically assessed via its (unconditional) asymptotic distribution. However, such an asymptotic distribution is known only for continuous traits and for dichotomous traits. Furthermore, the derived P-values are known to be conservative when sample size is small, especially for the important case of dichotomous traits. One alternative is the permutation test, a widely accepted approximation to the exact finite sample conditional inference. But it is time-consuming to use in practice due to stringent significance criteria commonly seen in these analyses. Results: Based on a previous theoretical result, a conditional asymptotic distribution for the KAT is introduced. This distribution provides an alternative approximation to the exact distribution of the KAT. An explicit expression of this distribution is provided from which P-values can be easily computed. This method applies to any type of trait. The usefulness of this approach is demonstrated via extensive simulation studies using real genotype data and an analysis of genetic data from the Ocular Hypertension Treatment Study. Numerical results showed that the new method can control the type I error rate and is a bit conservative when compared to the permutation method. Nevertheless, the proposed method may be used as a fast screening method. A time-consuming permutation procedure may be conducted at locations that show signals of association. Availability and implementation: An implementation of the proposed method is provided in the R package iGasso. Contact: kai-wang@uiowa.edu.
Affiliation(s)
- Kai Wang
  - Department of Biostatistics, University of Iowa, Iowa City, IA 52246, USA
|
25
|
Davenport CA, Maity A, Sullivan PF, Tzeng JY. A Powerful Test for SNP Effects on Multivariate Binary Outcomes using Kernel Machine Regression. Statistics in Biosciences 2018; 10:117-138. [PMID: 30420901 PMCID: PMC6226013 DOI: 10.1007/s12561-017-9189-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/05/2016] [Revised: 12/20/2016] [Accepted: 03/15/2017] [Indexed: 10/19/2022]
Abstract
Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a SNP set on multiple, possibly correlated, binary responses. We develop a score-based test using a nonparametric modeling framework that jointly models the global effect of the marker set. We account for nonlinear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure only requires estimation under the null hypothesis, and we use multivariate generalized estimating equations (GEEs) to estimate the model components to account for the correlation among the outcomes. We evaluate the finite-sample performance of our test via a simulation study and demonstrate our methods using the CATIE antibody study data and the CoLaus Study data.
Affiliation(s)
- Clemontina A Davenport
  - Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC 27707, USA
- Arnab Maity
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Patrick F Sullivan
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jung-Ying Tzeng
  - Department of Statistics, Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695, USA
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
|
26
|
RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests. Genetics 2017; 207:1275-1283. [PMID: 29025915 DOI: 10.1534/genetics.117.300395] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/07/2017] [Accepted: 09/24/2017] [Indexed: 11/18/2022] Open
Abstract
Testing for the existence of variance components in linear mixed models is a fundamental task in many applicative fields. In statistical genetics, the score test has recently become instrumental in the task of testing an association between a set of genetic markers and a phenotype. With few markers, this amounts to set-based variance component tests, which attempt to increase power in association studies by aggregating weak individual effects. When the entire genome is considered, it allows testing for the heritability of a phenotype, defined as the proportion of phenotypic variance explained by genetics. In the popular score-based Sequence Kernel Association Test (SKAT) method, the assumed distribution of the score test statistic is uncalibrated in small samples, with a correction being computationally expensive. This may cause severe inflation or deflation of P-values, even when the null hypothesis is true. Here, we characterize the conditions under which this discrepancy holds, and show it may occur also in large real datasets, such as a dataset from the Wellcome Trust Case Control Consortium 2 (n = 13,950) study, and, in particular, when the individuals in the sample are unrelated. In these cases, the SKAT approximation tends to be highly overconservative and therefore underpowered. To address this limitation, we suggest an efficient method to calculate exact P-values for the score test in the case of a single variance component and a continuous response vector, which can speed up the analysis by orders of magnitude. Our results enable fast and accurate application of the score test in heritability and in set-based association tests. Our method is available in http://github.com/cozygene/RL-SKAT.
|
27
|
A Powerful Framework for Integrating eQTL and GWAS Summary Data. Genetics 2017; 207:893-902. [PMID: 28893853 DOI: 10.1534/genetics.117.300270] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 08/02/2017] [Accepted: 09/05/2017] [Indexed: 01/26/2023] Open
Abstract
Two new gene-based association analysis methods, called PrediXcan and TWAS for GWAS individual-level and summary data, respectively, were recently proposed to integrate GWAS with eQTL data, alleviating two common problems in GWAS by boosting statistical power and facilitating biological interpretation of GWAS discoveries. Based on a novel reformulation of PrediXcan and TWAS, we propose a more powerful gene-based association test to integrate single set or multiple sets of eQTL data with GWAS individual-level data or summary statistics. The proposed test was applied to several GWAS datasets, including two lipid summary association datasets based on [Formula: see text] and [Formula: see text] samples, respectively, and uncovered more known or novel trait-associated genes, showcasing much improved performance of our proposed method. The software implementing the proposed method is freely available as an R package.
|
28
|
Powerful Genetic Association Analysis for Common or Rare Variants with High-Dimensional Structured Traits. Genetics 2017. [PMID: 28642271 DOI: 10.1534/genetics.116.199646] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/03/2023] Open
Abstract
Many genetic association studies collect a wide range of complex traits. As these traits may be correlated and share a common genetic mechanism, joint analysis can be statistically more powerful and biologically more meaningful. However, most existing tests for multiple traits cannot be used for high-dimensional and possibly structured traits, such as network-structured transcriptomic pathway expressions. To overcome potential limitations, in this article we propose the dual kernel-based association test (DKAT) for testing the association between multiple traits and multiple genetic variants, both common and rare. In DKAT, two individual kernels are used to describe the phenotypic and genotypic similarity, respectively, between pairwise subjects. Using kernels allows for capturing structure while accommodating dimensionality. Then, the association between traits and genetic variants is summarized by a coefficient which measures the association between two kernel matrices. Finally, DKAT evaluates the hypothesis of nonassociation with an analytical P-value calculation without any computationally expensive resampling procedures. By collapsing information in both traits and genetic variants using kernels, the proposed DKAT is shown to have a correct type-I error rate and higher power than other existing methods in both simulation studies and application to a study of genetic regulation of pathway gene expressions.
|
29
|
Gallego V, Luz Calle M, Oller R. Kernel-Based Measure of Variable Importance for Genetic Association Studies. Int J Biostat 2017. [PMID: 28628480 DOI: 10.1515/ijb-2016-0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The identification of genetic variants that are associated with disease risk is an important goal of genetic association studies. Standard approaches perform univariate analysis in which each genetic variant, usually a single nucleotide polymorphism (SNP), is tested for association with disease status. Though many genetic variants have been identified and validated so far using this univariate approach, for most complex diseases a large part of the genetic component is still unknown, the so-called missing heritability. We propose a kernel-based measure of variable importance (KVI) that provides the contribution of a SNP, or a group of SNPs, to the joint genetic effect of a set of genetic variants. KVI can be used for ranking genetic markers individually, sets of markers that form blocks of linkage disequilibrium, or sets of genetic variants that lie in a gene or a genetic pathway. We prove that, unlike the univariate analysis, KVI captures the relationship with other genetic variants in the analysis, even when measured at the individual level for each genetic variable separately. This is especially relevant and powerful for detecting genetic interactions. We illustrate the results with data from an Alzheimer's disease study and show through simulations that rankings based on KVI improve on rankings based on two measures of importance provided by Random Forest. We also demonstrate with a simulation study that KVI is very powerful for detecting genetic interactions.
|
30
|
Zhan X, Plantinga A, Zhao N, Wu MC. A fast small-sample kernel independence test for microbiome community-level association analysis. Biometrics 2017; 73:1453-1463. [PMID: 28295177 DOI: 10.1111/biom.12684] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/01/2016] [Revised: 02/01/2017] [Accepted: 02/01/2017] [Indexed: 12/13/2022]
Abstract
To fully understand the role of the microbiome in human health and disease, researchers are increasingly interested in assessing the relationship between microbiome composition and host genomic data. The dimensionality of the data, as well as complex relationships between microbiota and host genomics, poses considerable challenges for analysis. In this article, we apply a kernel RV coefficient (KRV) test to evaluate the overall association between host gene expression and microbiome composition. The KRV statistic can capture nonlinear correlations and complex relationships among the individual data types and between gene expression and microbiome composition by measuring general dependency. Testing proceeds via a similar route as existing tests of the generalized RV coefficients and allows for rapid p-value calculation. Strategies to allow adjustment for confounding effects, which is crucial for avoiding misleading results, and to alleviate the problem of selecting the most favorable kernel are considered. Simulation studies show that KRV is useful in testing statistical independence with finite samples provided the kernels are appropriately chosen, and can powerfully identify existing associations between microbiome composition and host genomic data while protecting type I error. We apply the KRV to a microbiome study examining the relationship between host transcriptome and microbiome composition within the context of inflammatory bowel disease and are able to derive new biological insights and provide formal inference on prior qualitative observations.
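To make the statistic in the abstract above concrete, the following is a minimal sketch of a kernel RV coefficient between two kernel matrices, written in the usual normalized-trace form of the RV coefficient applied to centered kernels. A plain permutation p-value is swapped in here for the rapid analytical p-value calculation the abstract mentions. The function names (`center_kernel`, `krv_coefficient`, `permutation_pvalue`), the linear kernels, and the simulated data are assumptions made for illustration and are not taken from the authors' software.

```python
import numpy as np


def center_kernel(K):
    """Double-center a kernel matrix: H K H with H = I - 11'/n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def krv_coefficient(K, L):
    """RV-type coefficient between two symmetric kernel matrices.

    Computes tr(Kc Lc) / sqrt(tr(Kc Kc) * tr(Lc Lc)) on centered kernels.
    For symmetric matrices, tr(A B) equals the sum of elementwise products.
    """
    Kc, Lc = center_kernel(K), center_kernel(L)
    numerator = np.sum(Kc * Lc)
    denominator = np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))
    return numerator / denominator


def permutation_pvalue(K, L, n_perm=999, seed=0):
    """Permutation p-value (an assumption of this sketch, not the
    analytical approximation referred to in the abstract)."""
    rng = np.random.default_rng(seed)
    observed = krv_coefficient(K, L)
    n = K.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute the subjects of L (rows and columns together).
        if krv_coefficient(K, L[np.ix_(p, p)]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 30 subjects, two feature blocks sharing some signal.
rng = np.random.default_rng(1)
expression = rng.normal(size=(30, 5))
microbiome = expression[:, :2] + rng.normal(size=(30, 2))
K_expr = expression @ expression.T      # linear kernel (assumed choice)
K_micro = microbiome @ microbiome.T     # linear kernel (assumed choice)
print(krv_coefficient(K_expr, K_micro))
print(permutation_pvalue(K_expr, K_micro, n_perm=199))
```

Permuting rows and columns of one kernel together keeps each kernel internally intact while breaking the pairing between the two data types, which is what the null hypothesis of independence describes.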
Affiliation(s)
- Xiang Zhan
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Anna Plantinga
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195, USA
- Ni Zhao
  - Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205, USA
- Michael C Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
|
31
|
Plantinga A, Zhan X, Zhao N, Chen J, Jenq RR, Wu MC. MiRKAT-S: a community-level test of association between the microbiota and survival times. Microbiome 2017; 5:17. [PMID: 28179014 PMCID: PMC5299808 DOI: 10.1186/s40168-017-0239-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 09/06/2016] [Accepted: 01/31/2017] [Indexed: 05/17/2023]
Abstract
Background: Community-level analysis of the human microbiota has culminated in the discovery of relationships between overall shifts in the microbiota and a wide range of diseases and conditions. However, existing work has primarily focused on analysis of relatively simple dichotomous or quantitative outcomes, for example, disease status or biomarker levels. Recently, there is also considerable interest in the relationship between the microbiota and censored survival outcomes, such as in clinical trials. How to conduct community-level analysis with censored survival outcomes is unclear, since standard dissimilarity-based tests cannot accommodate censored survival times and no alternative methods exist. Methods: We develop a new approach, MiRKAT-S, for community-level analysis of microbiome data with censored survival times. MiRKAT-S uses ecologically informative distance metrics, such as the UniFrac distances, to generate matrices of pairwise distances between individuals' taxonomic profiles. The distance matrices are transformed into kernel (similarity) matrices, which are used to compare similarity in the microbiota to similarity in survival times between individuals. Results: Simulation studies using synthetic microbial communities demonstrate correct control of type I error and adequate power. We also apply MiRKAT-S to examine the relationship between the gut microbiota and survival after allogeneic blood or bone marrow transplant. Conclusions: We present MiRKAT-S, a method that facilitates community-level analysis of the association between the microbiota and survival outcomes and therefore provides a new approach to analysis of microbiome data arising from clinical trials.
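The step in which distance matrices are transformed into kernel (similarity) matrices can be sketched as follows. This is a minimal illustration that assumes the double-centering transformation familiar from classical multidimensional scaling, K = -(1/2) H D² H, followed by truncation of negative eigenvalues so that the result is positive semidefinite. Whether MiRKAT-S uses exactly this form is not stated in the abstract. The function name `distance_to_kernel` and the Euclidean example data are hypothetical, and the distances here stand in for ecological distances such as UniFrac, which are not computed.

```python
import numpy as np


def distance_to_kernel(D):
    """Turn a pairwise distance matrix into a PSD kernel matrix.

    Assumed form: K = -0.5 * H D^2 H with H = I - 11'/n (D^2 is the
    elementwise square), then negative eigenvalues are set to zero.
    """
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (D ** 2) @ H
    K = (K + K.T) / 2                        # remove rounding asymmetry
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T   # V diag(eigvals) V'


# Hypothetical profiles for 6 individuals; Euclidean distance is used
# only as a placeholder for an ecological distance.
rng = np.random.default_rng(1)
profiles = rng.normal(size=(6, 3))
diffs = profiles[:, None, :] - profiles[None, :, :]
D = np.sqrt((diffs ** 2).sum(axis=-1))
K = distance_to_kernel(D)
print(K.shape)
```

For Euclidean distances the result coincides with the centered Gram matrix of the original profiles, which gives a convenient sanity check. For non-Euclidean distances the eigenvalue truncation is what guarantees a valid kernel.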
Affiliation(s)
- Anna Plantinga
  - Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington USA
- Xiang Zhan
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, Washington USA
- Ni Zhao
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe St, Baltimore, Maryland USA
- Jun Chen
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, Minnesota USA
- Robert R. Jenq
  - Departments of Genomic Medicine and Stem Cell Transplantation, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, 1515 Holcombe Blvd, Unit 1954, Houston, TX USA
- Michael C. Wu
  - Department of Biostatistics, University of Washington, 1705 NE Pacific Street, Seattle, Washington USA
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, Washington USA
|
32
|
Sheng X, Xie W, Zhou Y. Analysis of genotype effects for the immunosuppression via two-step method. BIO Web of Conferences 2017. [DOI: 10.1051/bioconf/20170802010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
|
33
|
Zhan X, Tong X, Zhao N, Maity A, Wu MC, Chen J. A small-sample multivariate kernel machine test for microbiome association studies. Genet Epidemiol 2016; 41:210-220. [DOI: 10.1002/gepi.22030] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 05/18/2016] [Revised: 10/04/2016] [Accepted: 10/22/2016] [Indexed: 12/28/2022]
Affiliation(s)
- Xiang Zhan
  - Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Xingwei Tong
  - School of Mathematical Sciences, Beijing Normal University, Beijing, China
- Ni Zhao
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Arnab Maity
  - Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Michael C. Wu
  - Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Jun Chen
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
|
34
|
Zhan X, Girirajan S, Zhao N, Wu MC, Ghosh D. A novel copy number variants kernel association test with application to autism spectrum disorders studies. Bioinformatics 2016; 32:3603-3610. [PMID: 27497442 DOI: 10.1093/bioinformatics/btw500] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 05/20/2016] [Revised: 06/28/2016] [Accepted: 07/22/2016] [Indexed: 11/14/2022] Open
Abstract
Motivation: Copy number variants (CNVs) have been implicated in a variety of neurodevelopmental disorders, including autism spectrum disorders, intellectual disability and schizophrenia. Recent advances in high-throughput genomic technologies have enabled rapid discovery of many genetic variants including CNVs. As a result, there is increasing interest in studying the role of CNVs in the etiology of many complex diseases. Despite the availability of an unprecedented wealth of CNV data, methods for testing association between CNVs and disease-related traits are still under-developed due to the low prevalence and complicated multi-scale features of CNVs. Results: We propose a novel CNV kernel association test (CKAT) in this paper. To address the low prevalence, CNVs are first grouped into CNV regions (CNVRs). Then, taking into account the multi-scale features of CNVs, we first design a single-CNV kernel which summarizes the similarity between two CNVs, and next aggregate the single-CNV kernels into a CNVR kernel which summarizes the similarity between two CNVRs. Finally, association between CNVR and disease-related traits is assessed by comparing the kernel-based similarity with the similarity in the trait using a score test for variance components in a random effect model. We illustrate the proposed CKAT using simulations and show that CKAT is more powerful than existing methods, while always being able to control the type I error. We also apply CKAT to a real dataset examining the association between CNV and autism spectrum disorders, which demonstrates the potential usefulness of the proposed method. Availability and implementation: An R package to implement the proposed CKAT method is available at http://works.bepress.com/debashis_ghosh/ Contacts: xzhan@fhcrc.org or debashis.ghosh@ucdenver.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiang Zhan
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Santhosh Girirajan
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Anthropology, Pennsylvania State University, University Park, PA 16802, USA
- Ni Zhao
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Michael C Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Debashis Ghosh
  - Department of Biostatistics and Informatics, University of Colorado, Aurora, CO 80045, USA
|
35
|
Wang K. Boosting the Power of the Sequence Kernel Association Test by Properly Estimating Its Null Distribution. Am J Hum Genet 2016; 99:104-14. [PMID: 27292111 DOI: 10.1016/j.ajhg.2016.05.011] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 01/21/2016] [Accepted: 05/02/2016] [Indexed: 01/22/2023] Open
Abstract
The sequence kernel association test (SKAT) is probably the most popular statistical test used in rare-variant association studies. Its null distribution involves unknown parameters that need to be estimated. The current estimation method has a valid type I error rate, but the power is compromised given that all subjects are used for estimation. I have developed an estimation method that uses only control subjects. Named SKAT+, this method uses the same test statistic as SKAT but differs in the way the null distribution is estimated. Extensive simulation studies and applications to data from the Genetic Analysis Workshop 17 and the Ocular Hypertension Treatment Study demonstrated that SKAT+ has superior power over SKAT while maintaining control over the type I error rate. This method is applicable to extensions of SKAT in the literature.
Affiliation(s)
- Kai Wang
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA 52242, USA
|
36
|
Yang H, Li S, Cao H, Zhang C, Cui Y. Predicting disease trait with genomic data: a composite kernel approach. Brief Bioinform 2016; 18:591-601. [DOI: 10.1093/bib/bbw043] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/11/2016] [Indexed: 01/17/2023] Open
|