1
Ghazi AR, Thompson KN, Bhosle A, Mei Z, Yan Y, Wang F, Wang K, Franzosa EA, Huttenhower C. Quantifying Metagenomic Strain Associations from Microbiomes with Anpan. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.06.631550. [PMID: 39829854 PMCID: PMC11741421 DOI: 10.1101/2025.01.06.631550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Genetic and genomic variation among microbial strains can dramatically influence their phenotypes and environmental impact, including on human health. However, inferential methods for quantifying these differences have been lacking. Strain-level metagenomic profiling data has several features that make traditional statistical methods challenging to use, including high dimensionality, extreme variation among samples, and complex phylogenetic relatedness. We present Anpan, a set of quantitative methods addressing three key challenges in microbiome strain epidemiology. First, adaptive filtering designed to interrogate microbial strain gene carriage is combined with linear models to identify strain-specific genetic elements associated with host health outcomes and other phenotypes. Second, phylogenetic generalized linear mixed models are used to characterize the association of sub-species lineages with such phenotypes. Finally, random effects models are used to identify pathways more likely to be retained or lost by outcome-associated strains. We validated our methods by simulation, showing that we achieve more accurate effect size estimation and a lower false positive rate compared to alternative methodologies. We then applied our methods to a dataset of 1,262 colorectal cancer patients, identifying functionally adaptive genes and strong phylogenetic effects associated with CRC status, sometimes complementing and sometimes extending known species-level microbiome CRC biomarkers. Anpan's methods have been implemented as a publicly available R library to support microbial community strain and genetic epidemiology in a variety of contexts, environments, and phenotypes.
Affiliation(s)
- Andrew R Ghazi
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kelsey N Thompson
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Harvard Chan Microbiome in Public Health Center, Boston, MA, USA
- Amrisha Bhosle
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Harvard Chan Microbiome in Public Health Center, Boston, MA, USA
- Zhendong Mei
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Yan Yan
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Fenglei Wang
  - Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Kai Wang
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Eric A Franzosa
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Harvard Chan Microbiome in Public Health Center, Boston, MA, USA
- Curtis Huttenhower
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - The Harvard Chan Microbiome in Public Health Center, Boston, MA, USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2
Little P, Hsu L, Sun W. Associating somatic mutation with clinical outcomes through kernel regression and optimal transport. Biometrics 2023; 79:2705-2718. [PMID: 36217816 PMCID: PMC10455040 DOI: 10.1111/biom.13769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 09/16/2022] [Indexed: 11/30/2022]
Abstract
Somatic mutations in cancer patients are inherently sparse and potentially high dimensional. Cancer patients may share the same set of deregulated biological processes perturbed by different sets of somatically mutated genes. Therefore, when assessing the associations between somatic mutations and clinical outcomes, gene-by-gene analysis is often under-powered because it does not capture the complex disease mechanisms shared across cancer patients. Rather than testing genes one by one, an intuitive approach is to aggregate somatic mutation data of multiple genes to assess their joint association with clinical outcomes. The challenge is how to aggregate such information. Building on the optimal transport method, we propose a principled approach to estimate the similarity of somatic mutation profiles of multiple genes between tumor samples, while accounting for gene-gene similarities defined by gene annotations or empirical mutational patterns. Using such similarities, we can assess the associations between somatic mutations and clinical outcomes by kernel regression. We have applied our method to analyze somatic mutation data of 17 cancer types and identified at least five cancer types, where somatic mutations are associated with overall survival, progression-free interval, or cytolytic activity.
Affiliation(s)
- Paul Little
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, U.S.A
- Li Hsu
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, U.S.A
  - Department of Biostatistics, University of Washington, Seattle, Washington, U.S.A
- Wei Sun
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, U.S.A
  - Department of Biostatistics, University of Washington, Seattle, Washington, U.S.A
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
3
Wang C, Ahn J, Tarpey T, Yi SS, Hayes RB, Li H. A microbial causal mediation analytic tool for health disparity and applications in body mass index. MICROBIOME 2023; 11:164. [PMID: 37496080 PMCID: PMC10373330 DOI: 10.1186/s40168-023-01608-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 06/22/2023] [Indexed: 07/28/2023]
Abstract
BACKGROUND Emerging evidence suggests the potential mediating role of microbiome in health disparities. However, no analytic framework can be directly used to analyze microbiome as a mediator between health disparity and clinical outcome, due to the non-manipulable nature of the exposure and the unique structure of microbiome data, including high dimensionality, sparsity, and compositionality. METHODS Considering the modifiable and quantitative features of the microbiome, we propose a microbial causal mediation model framework, SparseMCMM_HD, to uncover the mediating role of microbiome in health disparities, by depicting a plausible path from a non-manipulable exposure (e.g., ethnicity or region) to the outcome through the microbiome. The proposed SparseMCMM_HD rigorously defines and quantifies the manipulable disparity measure that would be eliminated by equalizing microbiome profiles between comparison and reference groups and innovatively and successfully extends the existing microbial mediation methods, which are originally proposed under potential outcome or counterfactual outcome study design, to address health disparities. RESULTS Through three body mass index (BMI) studies selected from the curatedMetagenomicData 3.4.2 package and the American gut project: China vs. USA, China vs. UK, and Asian or Pacific Islander (API) vs. Caucasian, we exhibit the utility of the proposed SparseMCMM_HD framework for investigating the microbiome's contributions in health disparities. Specifically, BMI exhibits disparities and microbial community diversities are significantly distinctive between reference and comparison groups in all three applications. By employing SparseMCMM_HD, we illustrate that microbiome plays a crucial role in explaining the disparities in BMI between ethnicities or regions. 
20.63%, 33.09%, and 25.71% of the overall disparity in BMI in the China-USA, China-UK, and API-Caucasian comparisons, respectively, would be eliminated if the between-group microbiome profiles were equalized; and 15, 18, and 16 species, respectively, are identified as playing a mediating role. CONCLUSIONS The proposed SparseMCMM_HD is an effective and validated tool to elucidate the mediating role of microbiome in health disparity. Three BMI applications shed light on the utility of microbiome in reducing BMI disparity by manipulating microbial profiles.
Affiliation(s)
- Chan Wang
  - Department of Population Health, Division of Biostatistics, New York University Grossman School of Medicine, New York, NY, 10016, USA
- Jiyoung Ahn
  - Department of Population Health, Division of Epidemiology, New York University Grossman School of Medicine, New York, NY, 10016, USA
- Thaddeus Tarpey
  - Department of Population Health, Division of Biostatistics, New York University Grossman School of Medicine, New York, NY, 10016, USA
- Stella S Yi
  - Department of Population Health Section for Health Equity, New York University Grossman School of Medicine, New York, 10016, USA
- Richard B Hayes
  - Department of Population Health, Division of Epidemiology, New York University Grossman School of Medicine, New York, NY, 10016, USA
- Huilin Li
  - Department of Population Health, Division of Biostatistics, New York University Grossman School of Medicine, New York, NY, 10016, USA
4
Gu W, Koh H, Jang H, Lee B, Kang B. MiSurv: an Integrative Web Cloud Platform for User-Friendly Microbiome Data Analysis with Survival Responses. Microbiol Spectr 2023; 11:e0505922. [PMID: 37039671 PMCID: PMC10269532 DOI: 10.1128/spectrum.05059-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 03/12/2023] [Indexed: 04/12/2023]
Abstract
Investigators have studied the treatment effects on human health or disease, the treatment effects on human microbiome, and the roles of the microbiome on human health or disease. Especially, in a clinical trial, investigators commonly trace disease status over a lengthy period to survey the sequential disease progression for different treatment groups (e.g., treatment versus placebo, new treatment versus old treatment). Hence, disease responses are often available in the form of survival (i.e., time-to-event) responses stratified by treatment groups. While the recent web cloud platforms have enabled user-friendly microbiome data processing and analytics, there is currently no web cloud platform to analyze microbiome data with survival responses. Therefore, we introduce here an integrative web cloud platform, called MiSurv, for comprehensive microbiome data analysis with survival responses. IMPORTANCE MiSurv consists of a data processing module and its following four data analytic modules: (i) Module 1: Comparative survival analysis between treatment groups, (ii) Module 2: Comparative analysis in microbial composition between treatment groups, (iii) Module 3: Association testing between microbial composition and survival responses, (iv) Module 4: Prediction modeling using microbial taxa on survival responses. We demonstrate its use through an example trial on the effects of antibiotic use on the survival rate against type 1 diabetes (T1D) onset and gut microbiome composition, respectively, and the effects of the gut microbiome on the survival rate against T1D onset. MiSurv is freely available on our web server (http://misurv.micloud.kr) or can alternatively run on the user's local computer (https://github.com/wg99526/MiSurvGit).
Affiliation(s)
- Won Gu
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Hyunwook Koh
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Hyojung Jang
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Byungho Lee
  - Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
- Byungkon Kang
  - Department of Computer Science, The State University of New York, Korea, Incheon, South Korea
5
Sun H, Wang Y, Xiao Z, Huang X, Wang H, He T, Jiang X. multiMiAT: an optimal microbiome-based association test for multicategory phenotypes. Brief Bioinform 2023; 24:7005163. [PMID: 36702753 DOI: 10.1093/bib/bbad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 12/31/2022] [Accepted: 01/03/2023] [Indexed: 01/28/2023]
Abstract
Microbes can affect the metabolism and immunity of human body incessantly, and the dysbiosis of human microbiome drives not only the occurrence but also the progression of disease (i.e. multiple statuses of disease). Recently, microbiome-based association tests have been widely developed to detect the association between the microbiome and host phenotype. However, the existing methods have not achieved satisfactory performance in testing the association between the microbiome and ordinal/nominal multicategory phenotypes (e.g. disease severity and tumor subtype). In this paper, we propose an optimal microbiome-based association test for multicategory phenotypes, namely, multiMiAT. Specifically, under the multinomial logit model framework, we first introduce a microbiome regression-based kernel association test for multicategory phenotypes (multiMiRKAT). As a data-driven optimal test, multiMiAT then integrates multiMiRKAT, score test and MiRKAT-MC to maintain excellent performance in diverse association patterns. Massive simulation experiments prove the success of our method. Furthermore, multiMiAT is also applied to real microbiome data experiments to detect the association between the gut microbiome and clinical statuses of colorectal cancer as well as for diverse statuses of Clostridium difficile infections.
Affiliation(s)
- Han Sun
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Yue Wang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
- Zhen Xiao
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Xiaoyun Huang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - Collaborative & Innovative Center for Educational Technology, Central China Normal University, Wuhan 430079, China
- Haodong Wang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
- Tingting He
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
- Xingpeng Jiang
  - Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
  - School of Computer Science, Central China Normal University, Wuhan 430079, China
  - National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
6
Wojciechowski S, Majchrzak-Górecka M, Biernat P, Odrzywołek K, Pruss Ł, Zych K, Majta J, Milanowska-Zabel K. Machine learning on the road to unlocking microbiota's potential for boosting immune checkpoint therapy. Int J Med Microbiol 2022; 312:151560. [PMID: 36113358 DOI: 10.1016/j.ijmm.2022.151560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2021] [Revised: 07/15/2022] [Accepted: 08/31/2022] [Indexed: 10/14/2022]
Abstract
The intestinal microbiota is a complex and diverse ecological community that fulfills multiple functions and substantially impacts human health. Despite its plasticity, unfavorable conditions can cause perturbations leading to so-called dysbiosis, which have been connected to multiple diseases. Unfortunately, understanding the mechanisms underlying the crosstalk between those microorganisms and their host is proving to be difficult. Traditionally used bioinformatic tools have difficulties to fully exploit big data generated for this purpose by modern high throughput screens. Machine Learning (ML) may be a potential means of solving such problems, but it requires diligent application to allow for drawing valid conclusions. This is especially crucial as gaining insight into the mechanistic basis of microbial impact on human health is highly anticipated in numerous fields of study. This includes oncology, where growing amounts of studies implicate the gut ecosystems in both cancerogenesis and antineoplastic treatment outcomes. Based on these reports and first signs of clinical benefits related to microbiota modulation in human trials, hopes are rising for the development of microbiome-derived diagnostics and therapeutics. In this mini-review, we're inspecting analytical approaches used to uncover the role of gut microbiome in immune checkpoint therapy (ICT) with the use of shotgun metagenomic sequencing (SMS) data.
Affiliation(s)
- Krzysztof Odrzywołek
  - Ardigen, Podole 76, 30-394 Kraków, Poland; Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków, Poland
- Łukasz Pruss
  - Ardigen, Podole 76, 30-394 Kraków, Poland; Department of Biochemistry, Molecular Biology and Biotechnology, Faculty of Chemistry, Wroclaw University of Science and Technology, 50-373 Wroclaw, Poland
- Jan Majta
  - Ardigen, Podole 76, 30-394 Kraków, Poland; Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Krakow, Poland
7
Hu Y, Li Y, Satten GA, Hu YJ. Testing microbiome associations with survival times at both the community and individual taxon levels. PLoS Comput Biol 2022; 18:e1010509. [PMID: 36103548 PMCID: PMC9512219 DOI: 10.1371/journal.pcbi.1010509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 09/26/2022] [Accepted: 08/23/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND Finding microbiome associations with possibly censored survival times is an important problem, especially as specific taxa could serve as biomarkers for disease prognosis or as targets for therapeutic interventions. The two existing methods for survival outcomes, MiRKAT-S and OMiSA, are restricted to testing associations at the community level and do not provide results at the individual taxon level. An ad hoc approach testing each taxon with a survival outcome using the Cox proportional hazard model may not perform well in the microbiome setting with sparse count data and small sample sizes. METHODS We have previously developed the linear decomposition model (LDM) for testing continuous or discrete outcomes that unifies community-level and taxon-level tests into one framework. Here we extend the LDM to test survival outcomes. We propose to use the Martingale residuals or the deviance residuals obtained from the Cox model as continuous covariates in the LDM. We further construct tests that combine the results of analyzing each set of residuals separately. Finally, we extend PERMANOVA, the most commonly used distance-based method for testing community-level hypotheses, to handle survival outcomes in a similar manner. RESULTS Using simulated data, we showed that the LDM-based tests preserved the false discovery rate for testing individual taxa and had good sensitivity. The LDM-based community-level tests and PERMANOVA-based tests had comparable or better power than MiRKAT-S and OMiSA. An analysis of data on the association of the gut microbiome and the time to acute graft-versus-host disease revealed several dozen associated taxa that would not have been achievable by any community-level test, as well as improved community-level tests by the LDM and PERMANOVA over those obtained using MiRKAT-S and OMiSA. 
CONCLUSIONS Unlike existing methods, our new methods are capable of discovering individual taxa that are associated with survival times, which could be of important use in clinical settings.
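The Martingale residuals mentioned in this abstract can be illustrated with a minimal sketch. It assumes a covariate-free null model, in which the cumulative hazard reduces to the Nelson-Aalen estimate and each residual is the event indicator minus the estimated cumulative hazard at the subject's follow-up time. The function name and the toy data are hypothetical and are not taken from the authors' LDM software, which fits a Cox model with covariates.

```python
def martingale_residuals(times, events):
    """Null-model Martingale residuals: event indicator minus Nelson-Aalen cumulative hazard."""
    # Distinct times at which at least one event occurred, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative = 0.0
    hazard_at = {}
    for t in event_times:
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        at_risk = sum(1 for ti in times if ti >= t)
        cumulative += deaths / at_risk  # Nelson-Aalen increment d_j / n_j
        hazard_at[t] = cumulative
    residuals = []
    for ti, ei in zip(times, events):
        # Cumulative hazard accumulated up to this subject's follow-up time.
        h = 0.0
        for t in event_times:
            if t > ti:
                break
            h = hazard_at[t]
        residuals.append((1 if ei else 0) - h)
    return residuals


# Toy follow-up times and event indicators (1 = event, 0 = censored); made-up values.
times = [2.0, 3.0, 3.0, 5.0, 8.0]
events = [1, 1, 0, 1, 0]
res = martingale_residuals(times, events)
```

In this sketch the residuals sum to zero across subjects, and they could then serve as a continuous trait in a downstream association test, which is the role the abstract describes for them.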
Affiliation(s)
- Yingtian Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Yunxiao Li
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Glen A. Satten
  - Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
8
Wang C, Segal LN, Hu J, Zhou B, Hayes RB, Ahn J, Li H. Microbial risk score for capturing microbial characteristics, integrating multi-omics data, and predicting disease risk. MICROBIOME 2022; 10:121. [PMID: 35932029 PMCID: PMC9354433 DOI: 10.1186/s40168-022-01310-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/14/2021] [Accepted: 06/20/2022] [Indexed: 05/27/2023]
Abstract
BACKGROUND With the rapid accumulation of microbiome-wide association studies, a great amount of microbiome data are available to study the microbiome's role in human disease and advance the microbiome's potential use for disease prediction. However, the unique features of microbiome data hinder its utility for disease prediction. METHODS Motivated from the polygenic risk score framework, we propose a microbial risk score (MRS) framework to aggregate the complicated microbial profile into a summarized risk score that can be used to measure and predict disease susceptibility. Specifically, the MRS algorithm involves two steps: (1) identifying a sub-community consisting of the signature microbial taxa associated with disease and (2) integrating the identified microbial taxa into a continuous score. The first step is carried out using the existing sophisticated microbial association tests and pruning and thresholding method in the discovery samples. The second step constructs a community-based MRS by calculating alpha diversity on the identified sub-community in the validation samples. Moreover, we propose a multi-omics data integration method by jointly modeling the proposed MRS and other risk scores constructed from other omics data in disease prediction. RESULTS Through three comprehensive real-data analyses using the NYU Langone Health COVID-19 cohort, the gut microbiome health index (GMHI) multi-study cohort, and a large type 1 diabetes cohort separately, we exhibit and evaluate the utility of the proposed MRS framework for disease prediction and multi-omics data integration. In addition, the disease-specific MRSs for colorectal adenoma, colorectal cancer, Crohn's disease, and rheumatoid arthritis based on the relative abundances of 5, 6, 12, and 6 microbial taxa, respectively, are created and validated using the GMHI multi-study cohort. 
In particular, the Crohn's disease MRS achieves AUCs of 0.88 (0.85-0.91) and 0.86 (0.78-0.95) in the discovery and validation cohorts, respectively. CONCLUSIONS The proposed MRS framework sheds light on the utility of microbiome data for disease prediction and multi-omics integration and offers great potential for understanding the microbiome's role in disease diagnosis and prognosis.
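The second MRS step described in this abstract (an alpha diversity computed on the identified sub-community) can be sketched as follows. The abstract does not say which alpha diversity index is used, so the Shannon index here is an assumption, as are the renormalisation within the sub-community, the function name, and the toy abundances. Step 1 (choosing the signature taxa) is taken as already done.

```python
import math


def shannon_mrs(abundances, signature_taxa):
    """Shannon index of one sample restricted to the signature sub-community."""
    # Keep only the signature taxa; taxa absent from the sample count as zero.
    sub = [abundances.get(taxon, 0.0) for taxon in signature_taxa]
    total = sum(sub)
    if total == 0:
        return 0.0  # none of the signature taxa is present
    # Renormalise within the sub-community (an assumption), then apply -sum(p * ln p).
    return -sum((x / total) * math.log(x / total) for x in sub if x > 0)


# Toy relative abundances for two samples (made-up numbers).
signature = ["taxon_a", "taxon_b"]
sample_even = {"taxon_a": 0.10, "taxon_b": 0.10, "taxon_c": 0.80}
sample_single = {"taxon_a": 0.30, "taxon_c": 0.70}

score_even = shannon_mrs(sample_even, signature)      # two equally abundant signature taxa
score_single = shannon_mrs(sample_single, signature)  # only one signature taxon present
```

A sample carrying both signature taxa in equal proportion scores ln 2, while a sample carrying only one of them scores zero; the resulting per-sample score is the continuous quantity that would then enter a prediction model.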
Affiliation(s)
- Chan Wang
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016 USA
- Leopoldo N. Segal
  - Division of Pulmonary and Critical Care Medicine, New York University Grossman School of Medicine, New York, NY 10017 USA
- Jiyuan Hu
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016 USA
- Boyan Zhou
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016 USA
- Richard B. Hayes
  - Division of Epidemiology, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016 USA
- Jiyoung Ahn
  - Division of Epidemiology, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016 USA
- Huilin Li
  - Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, NY 10016 USA
9
Dong TS, Jacobs JP, Agopian V, Pisegna JR, Ayoub W, Durazo F, Enayati P, Sundaram V, Benhammou JN, Noureddin M, Choi G, Lagishetty V, Fiehn O, Goodman MT, Elashoff D, Hussain SK. Duodenal Microbiome and Serum Metabolites Predict Hepatocellular Carcinoma in a Multicenter Cohort of Patients with Cirrhosis. Dig Dis Sci 2022; 67:3831-3841. [PMID: 34799768 PMCID: PMC9287237 DOI: 10.1007/s10620-021-07299-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Accepted: 10/18/2021] [Indexed: 12/09/2022]
Abstract
BACKGROUND Hepatocellular carcinoma (HCC) is rapidly increasing in the U.S. and is a leading cause of mortality for patients with cirrhosis. Discovering novel biomarkers for risk stratification of HCC is paramount. We examined biomarkers of the gut-liver axis in a prospective multicenter cohort. METHODS Patients with cirrhosis without a history of HCC were recruited between May 2015 and March 2020 and prospectively followed at 3 tertiary care hospitals in Los Angeles. Microbiome analysis was performed on duodenal biopsies and metabolomic analysis was performed on serum samples, collected at the time of enrollment. Optimal microbiome-based survival analysis and Cox proportional hazards regression analysis were used to determine microbiota and metabolite associations with HCC development, respectively. RESULTS A total of 227 participants with liver cirrhosis contributed a total of 459.58 person-years of follow-up, with 14 incident HCC diagnoses. Male sex (HR = 7.06, 95% CI = 1.02-54.86) and baseline hepatic encephalopathy (HE, HR = 4.65, 95% CI = 1.60-13.52) were associated with developing HCC over follow-up. Adjusting for age, sex, baseline HE, and alkaline phosphatase, an increased risk of HCC was observed for participants in the highest quartile versus the lowest three quartiles for duodenal Alloprevotella (HR = 3.22, 95% CI = 1.06-9.73) and serum taurocholic acid (HR = 6.87, 95% CI = 2.32-20.27), methionine (HR = 9.97, 95% CI = 3.02-32.94), and methioninesulfoxide (HR = 5.60, 95% CI = 1.84-17.10). Being in the highest quartile for Alloprevotella or methionine had a sensitivity and specificity for developing HCC of 85.71% and 60.56%, respectively, with an odds ratio of 10.92 (95% CI = 2.23-53.48). CONCLUSION Alloprevotella and methionine, methioninesulfoxide, and taurocholic acid predicted future HCC development in a high-risk population of participants with liver cirrhosis.
Collapse
Affiliation(s)
- Tien S Dong
- The Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- UCLA Microbiome Center, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Jonathan P Jacobs
- The Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- UCLA Microbiome Center, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Division of Gastroenterology, Hepatology and Parenteral Nutrition, VA Greater Los Angeles Healthcare System, Los Angeles, CA, USA
| | - Vatche Agopian
- Department of Surgery, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Joseph R Pisegna
- Division of Gastroenterology, Hepatology and Parenteral Nutrition, VA Greater Los Angeles Healthcare System, Los Angeles, CA, USA
- Department of Medicine and Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Walid Ayoub
- Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Francisco Durazo
- Froedtert Hospital Transplant Center, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Pedram Enayati
- Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Vinay Sundaram
- Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Jihane N Benhammou
- The Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Mazen Noureddin
- Division of Digestive and Liver Diseases, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Gina Choi
- The Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Surgery, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Venu Lagishetty
- The Vatche and Tamar Manoukian Division of Digestive Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- UCLA Microbiome Center, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Oliver Fiehn
- West Coast Metabolomics Center, University of California, Davis, CA, USA
| | - Marc T Goodman
- Cedars-Sinai Cancer and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - David Elashoff
- Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, CA, USA
| | - Shehnaz K Hussain
- Department of Public Health Sciences, School of Medicine and Comprehensive Cancer Center, University of California, Davis, Medical Sciences 1C, One Shields Avenue, Davis, CA, 95616, USA.
|
10
|
Wang C, Segal LN, Hu J, Zhou B, Hayes R, Ahn J, Li H. Microbial Risk Score for Capturing Microbial Characteristics, Integrating Multi-omics Data, and Predicting Disease Risk. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.06.07.495127. [PMID: 35702150 PMCID: PMC9196107 DOI: 10.1101/2022.06.07.495127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Background: With the rapid accumulation of microbiome-wide association studies, a great amount of microbiome data are available to study the microbiome's role in human disease and advance the microbiome's potential use for disease prediction. However, the unique features of microbiome data hinder its utility for disease prediction. Methods: Motivated by the polygenic risk score framework, we propose a microbial risk score (MRS) framework to aggregate the complicated microbial profile into a summarized risk score that can be used to measure and predict disease susceptibility. Specifically, the MRS algorithm involves two steps: 1) identifying a sub-community consisting of the signature microbial taxa associated with disease, and 2) integrating the identified microbial taxa into a continuous score. The first step is carried out using existing microbial association tests and a pruning and thresholding method in the discovery samples. The second step constructs a community-based MRS by calculating alpha diversity on the identified sub-community in the validation samples. Moreover, we propose a multi-omics data integration method that jointly models the proposed MRS and risk scores constructed from other omics data in disease prediction. Results: Through three comprehensive real data analyses using the NYU Langone Health COVID-19 cohort, the gut microbiome health index (GMHI) multi-study cohort, and a large type 1 diabetes cohort separately, we exhibit and evaluate the utility of the proposed MRS framework for disease prediction and multi-omics data integration. In addition, the disease-specific MRSs for colorectal adenoma, colorectal cancer, Crohn's disease, and rheumatoid arthritis based on the relative abundances of 5, 6, 12, and 6 microbial taxa respectively are created and validated using the GMHI multi-study cohort. In particular, the Crohn's disease MRS achieves AUCs of 0.88 ([0.85-0.91]) and 0.86 ([0.78-0.95]) in the discovery and validation cohorts, respectively. Conclusions: The proposed MRS framework sheds light on the utility of microbiome data for disease prediction and multi-omics integration, and provides great potential in understanding the microbiome's role in disease diagnosis and prognosis.
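The second MRS step described above (alpha diversity computed on the identified sub-community) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which alpha diversity index is used, so the Shannon index is assumed here, and the function names, taxon names, and sample layout are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity (natural log) of a non-negative abundance vector."""
    total = sum(abundances)
    if total == 0:
        return 0.0  # no signature taxa present in this sample
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # renormalize within the sub-community
            h -= p * math.log(p)
    return h


def microbial_risk_score(sample, signature_taxa):
    """Assumed MRS: alpha diversity restricted to the signature taxa.

    `sample` maps taxon name -> relative abundance (hypothetical layout);
    `signature_taxa` is the sub-community found in the discovery step.
    """
    sub_community = [sample.get(taxon, 0.0) for taxon in signature_taxa]
    return shannon_index(sub_community)


# Hypothetical sample: only taxa "a" and "b" belong to the signature set.
sample = {"a": 0.25, "b": 0.25, "c": 0.5}
score = microbial_risk_score(sample, ["a", "b"])
```

The discovery step (association testing with pruning and thresholding) is not shown; the sketch takes the signature taxa as given.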
Affiliation(s)
- Chan Wang
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, 10016, NY, USA
| | - Leopoldo N. Segal
- Division of Pulmonary and Critical Care Medicine, New York University Grossman School of Medicine, New York, 10017, NY, USA
| | - Jiyuan Hu
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, 10016, NY, USA
| | - Boyan Zhou
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, 10016, NY, USA
| | - Richard Hayes
- Division of Epidemiology, Department of Population Health, New York University Grossman School of Medicine, New York, 10016, NY, USA
| | - Jiyoung Ahn
- Division of Epidemiology, Department of Population Health, New York University Grossman School of Medicine, New York, 10016, NY, USA
| | - Huilin Li
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, New York, 10016, NY, USA
|
11
|
Sun H, Huang X, Huo B, Tan Y, He T, Jiang X. Detecting sparse microbial association signals adaptively from longitudinal microbiome data based on generalized estimating equations. Brief Bioinform 2022; 23:6585623. [PMID: 35561307 DOI: 10.1093/bib/bbac149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/15/2022] [Revised: 03/11/2022] [Accepted: 04/02/2022] [Indexed: 12/18/2022] Open
Abstract
The association between the compositions of microbial communities and various host phenotypes is an important research topic. Microbiome association research addresses multiple domains, such as human disease and diet. Statistical methods for testing microbiome-phenotype associations have been studied recently to determine their ability to assess longitudinal microbiome data. However, existing methods fail to detect sparse association signals in longitudinal microbiome data. In this paper, we developed a novel method, namely aGEEMiHC, which is a data-driven adaptive microbiome higher criticism analysis based on generalized estimating equations to detect sparse microbial association signals from longitudinal microbiome data. aGEEMiHC adopts a generalized estimating equations framework that fully considers the correlation among different observations from the same subject in longitudinal data. To be robust to diverse correlation structures for longitudinal data, aGEEMiHC integrates multiple microbiome higher criticism analyses based on generalized estimating equations with different working correlation structures. Extensive simulation experiments demonstrate that aGEEMiHC can control the type I error correctly and achieve superior performance according to a statistical power comparison. We also applied it to longitudinal microbiome data with various types of host phenotypes to demonstrate the stability of our method. aGEEMiHC is also utilized for real longitudinal microbiome data, and we found a significant association between the gut microbiome and Crohn's disease. In addition, our method ranks the significant factors associated with the host phenotype to provide potential biomarkers.
Affiliation(s)
- Han Sun
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China; Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
| | - Xiaoyun Huang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; Collaborative & Innovative Center for Educational Technology, Central China Normal University, Wuhan 430079, China
| | - Ban Huo
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China
| | - Yuting Tan
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China; Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
| | - Tingting He
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
| | - Xingpeng Jiang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
|
12
|
Novel application of survival models for predicting microbial community transitions with variable selection for eDNA. Appl Environ Microbiol 2022; 88:e0214621. [PMID: 35138931 DOI: 10.1128/aem.02146-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Survival analysis is a prolific statistical tool in medicine for inferring risk and time to disease-related events. However, it is under-utilized in microbiome research to predict microbial community mediated events, partly due to the sparsity and high dimensional nature of the data. We advance the application of Cox proportional hazards (Cox PH) survival models to environmental DNA (eDNA) data with feature selection suitable for filtering irrelevant and redundant taxonomic variables. Selection methods are compared in terms of false positives, sensitivity, and survival estimation accuracy in simulation and in a real data setting to forecast harmful cyanobacterial blooms. A novel extension of a method for selecting microbial biomarkers with survival data (SuRFCox) reliably outperforms other methods. We determine Cox PH models with SuRFCox selected predictors are more robust to varied signal, noise, and data correlation structure. SuRFCox also yields the most accurate and consistent prediction of blooms according to cross-validated testing by year over eight different bloom seasons. Identification of common biomarkers among validated survival forecasts over changing conditions has clear biological significance. Survival models with such biomarkers inform risk assessment and provide insight into the causes of critical community transitions. Importance: In this paper, we report on a novel approach of selecting microorganisms for model-based prediction of the time to critical microbially-modulated events (e.g., harmful algal blooms, clinical outcomes, community shifts). Our novel method for identifying biomarkers from large, dynamic communities of microbes has broad utility to environmental and ecological impact risk assessment and public health. Results will also promote theoretical and practical advancements relevant to the biology of specific organisms. To address the unique challenge posed by diverse environmental conditions and sparse microbes, we developed a novel method of selecting predictors for modelling time-to-event data. Competing methods for selecting predictors are rigorously compared to determine which is the most accurate and generalizable. Model forecasts are applied to show suitable predictors can precisely quantify the risk over time of biological events like harmful cyanobacterial blooms.
|
13
|
Mohamed N, Litlekalsøy J, Ahmed IA, Martinsen EMH, Furriol J, Javier-Lopez R, Elsheikh M, Gaafar NM, Morgado L, Mundra S, Johannessen AC, Osman TAH, Nginamau ES, Suleiman A, Costea DE. Analysis of Salivary Mycobiome in a Cohort of Oral Squamous Cell Carcinoma Patients From Sudan Identifies Higher Salivary Carriage of Malassezia as an Independent and Favorable Predictor of Overall Survival. Front Cell Infect Microbiol 2021; 11:673465. [PMID: 34712619 PMCID: PMC8547610 DOI: 10.3389/fcimb.2021.673465] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 02/27/2021] [Accepted: 08/27/2021] [Indexed: 12/20/2022] Open
Abstract
Background: Microbial dysbiosis and microbiome-induced inflammation have emerged as important factors in oral squamous cell carcinoma (OSCC) tumorigenesis during the last two decades. However, the “rare biosphere” of the oral microbiome, including fungi, has been sparsely investigated. This study aimed to characterize the salivary mycobiome in a prospective Sudanese cohort of OSCC patients and to explore patterns of diversities associated with overall survival (OS). Materials and Methods: Unstimulated saliva samples (n = 72) were collected from patients diagnosed with OSCC (n = 59) and from non-OSCC control volunteers (n = 13). DNA was extracted using a combined enzymatic–mechanical extraction protocol. The salivary mycobiome was assessed using a next-generation sequencing (NGS)-based methodology by amplifying the ITS2 region. The impact of the abundance of different fungal genera on the survival of OSCC patients was analyzed using Kaplan–Meier and Cox regression survival analyses (SPSS). Results: Sixteen genera were identified exclusively in the saliva of OSCC patients. Candida, Malassezia, Saccharomyces, Aspergillus, and Cyberlindnera were the most relatively abundant fungal genera in both groups and showed higher abundance in OSCC patients. Kaplan–Meier survival analysis showed higher salivary carriage of the Candida genus significantly associated with poor OS of OSCC patients (Breslow test: p = 0.043). In contrast, the higher salivary carriage of Malassezia showed a significant association with favorable OS in OSCC patients (Breslow test: p = 0.039). The Cox proportional hazards multiple regression model was applied to adjust the salivary carriage of both Candida and Malassezia according to age (p = 0.029) and identified the genus Malassezia as an independent predictor of OS (hazard ratio = 0.383, 95% CI = 0.16–0.93, p = 0.03). Conclusion: The fungal compositional patterns in saliva from OSCC patients were different from those of individuals without OSCC. The fungal genus Malassezia was identified as a putative prognostic biomarker and therapeutic target for OSCC.
Affiliation(s)
- Nazar Mohamed
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway; Department of Oral and Maxillofacial Surgery/Department of Basic Sciences, University of Khartoum, Khartoum, Sudan
| | - Jorunn Litlekalsøy
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway
| | - Israa Abdulrahman Ahmed
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway; Department of Operative Dentistry, University of Science & Technology, Omdurman, Sudan
| | | | - Jessica Furriol
- Department of Nephrology, Haukeland University Hospital, Bergen, Norway
| | - Ruben Javier-Lopez
- Department of Biological Sciences, The Faculty of Mathematics and Natural Sciences, University of Bergen, Bergen, Norway
| | - Mariam Elsheikh
- Department of Oral and Maxillofacial Surgery/Department of Basic Sciences, University of Khartoum, Khartoum, Sudan; Department of Oral & Maxillofacial Surgery, Khartoum Dental Teaching Hospital, Khartoum, Sudan
| | - Nuha Mohamed Gaafar
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway; Department of Oral and Maxillofacial Surgery/Department of Basic Sciences, University of Khartoum, Khartoum, Sudan
| | - Luis Morgado
- Section for Genetics and Evolutionary Biology (EvoGene), Department of Biosciences, The Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
| | - Sunil Mundra
- Section for Genetics and Evolutionary Biology (EvoGene), Department of Biosciences, The Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway; Department of Biology, College of Science, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates
| | - Anne Christine Johannessen
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway; Department of Pathology, Laboratory Clinic, Haukeland University Hospital, Bergen, Norway
| | - Tarig Al-Hadi Osman
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway
| | - Elisabeth Sivy Nginamau
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway; Department of Pathology, Laboratory Clinic, Haukeland University Hospital, Bergen, Norway
| | - Ahmed Suleiman
- Department of Oral and Maxillofacial Surgery/Department of Basic Sciences, University of Khartoum, Khartoum, Sudan; Department of Oral & Maxillofacial Surgery, Khartoum Dental Teaching Hospital, Khartoum, Sudan
| | - Daniela Elena Costea
- Gade Laboratory for Pathology, Department of Clinical Medicine, and Center for Cancer Biomarkers CCBIO, University of Bergen, Bergen, Norway; Department of Pathology, Laboratory Clinic, Haukeland University Hospital, Bergen, Norway
|
14
|
Sun H, Huang X, Fu L, Huo B, He T, Jiang X. A powerful adaptive microbiome-based association test for microbial association signals with diverse sparsity levels. J Genet Genomics 2021; 48:851-859. [PMID: 34411712 DOI: 10.1016/j.jgg.2021.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/30/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 01/12/2023]
Abstract
Dysbiosis of the microbiome may have negative effects on a host phenotype. The microbes related to the host phenotype are regarded as microbial association signals. Recently, statistical methods based on microbiome-phenotype association tests have been extensively developed to detect these association signals. However, the currently available methods do not perform well at detecting microbial association signals across diverse sparsity levels (i.e., sparse, low sparse, non-sparse), and the real association patterns related to different host phenotypes are not unique. Here, we propose a powerful and adaptive microbiome-based association test to detect microbial association signals with diverse sparsity levels, designated as MiATDS. In particular, we define probability degree to measure the associations between microbes and the host phenotype and introduce the adaptive weighted sum of powered score tests by considering both probability degree and phylogenetic information. We design numerous simulation experiments for the task of detecting association signals with diverse sparsity levels to evaluate the performance of the method. We find that type I error rates are well controlled and that MiATDS shows superior power. In real data analysis, MiATDS also proves practical. The R package is available at https://github.com/XiaoyunHuang33/MiATDS.
Affiliation(s)
- Han Sun
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
| | - Xiaoyun Huang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; Collaborative & Innovative Center for Educational Technology, Central China Normal University, Wuhan 430079, China
| | - Lingling Fu
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
| | - Ban Huo
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China
| | - Tingting He
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
| | - Xingpeng Jiang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China; School of Computer, Central China Normal University, Wuhan 430079, China; National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China.
|
15
|
Wang C, Hu J, Blaser MJ, Li H. Microbial trend analysis for common dynamic trend, group comparison, and classification in longitudinal microbiome study. BMC Genomics 2021; 22:667. [PMID: 34525957 PMCID: PMC8442444 DOI: 10.1186/s12864-021-07948-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/01/2020] [Accepted: 08/25/2021] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND: The human microbiome is inherently dynamic and its dynamic nature plays a critical role in maintaining health and driving disease. With an increasing number of longitudinal microbiome studies, scientists are eager to characterize microbial dynamics comprehensively and to understand their implications for health and disease-related phenotypes. However, due to the challenging structure of longitudinal microbiome data, few analytic methods are available to characterize the microbial dynamics over time. RESULTS: We propose a microbial trend analysis (MTA) framework for the high-dimensional and phylogenetically-based longitudinal microbiome data. In particular, MTA can perform three tasks: 1) capture the common microbial dynamic trends for a group of subjects at the community level and identify the dominant taxa; 2) examine whether or not the microbial overall dynamic trends are significantly different between groups; 3) classify an individual subject based on its longitudinal microbial profiling. Our extensive simulations demonstrate that the proposed MTA framework is robust and powerful in hypothesis testing, taxon identification, and subject classification. Our real data analyses further illustrate the utility of MTA through a longitudinal study in mice. CONCLUSIONS: The proposed MTA framework is an attractive and effective tool for investigating dynamic microbial patterns in longitudinal microbiome studies.
Affiliation(s)
- Chan Wang
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, 10016 NY USA
| | - Jiyuan Hu
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, 10016 NY USA
| | - Martin J. Blaser
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, 08854-8021 NJ USA
| | - Huilin Li
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, 10016 NY USA
|
16
|
Koh H, Tuddenham S, Sears CL, Zhao N. Meta-analysis methods for multiple related markers: Applications to microbiome studies with the results on multiple α-diversity indices. Stat Med 2021; 40:2859-2876. [PMID: 33768631 DOI: 10.1002/sim.8940] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/18/2020] [Revised: 12/18/2020] [Accepted: 02/10/2021] [Indexed: 11/10/2022]
Abstract
Meta-analysis is a practical and powerful analytic tool that enables a unified statistical inference across the results from multiple studies. Notably, researchers often report the results on multiple related markers in each study (eg, various α-diversity indices in microbiome studies). However, univariate meta-analyses are limited to combining the results on a single common marker at a time, whereas existing multivariate meta-analyses are limited to the situations where marker-by-marker correlations are given in each study. Thus, here we introduce two meta-analysis methods, multi-marker meta-analysis (mMeta) and adaptive multi-marker meta-analysis (aMeta), to combine multiple studies across multiple related markers with no a priori results on marker-by-marker correlations. mMeta is a statistical estimator for a pooled estimate and its SE across all the studies and markers, whereas aMeta is a statistical test based on the test statistic of the minimum P-value among marker-specific meta-analyses. mMeta conducts both effect estimation and hypothesis testing based on a weighted average of marker-specific pooled estimates while estimating marker-by-marker correlations non-parametrically via permutations, yet its power is only moderate. In contrast, aMeta closely approaches the highest power among marker-specific meta-analyses, yet it is limited to hypothesis testing. While their applications can be broader, we illustrate the use of mMeta and aMeta to combine microbiome studies across multiple α-diversity indices. We evaluate mMeta and aMeta in silico and apply them to real microbiome studies on the disparity in α-diversity by the status of human immunodeficiency virus (HIV) infection. The R package for mMeta and aMeta is freely available at https://github.com/hk1785/mMeta.
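Both mMeta and aMeta start from marker-specific meta-analyses. The abstract does not state which pooling model those use, so the sketch below assumes the standard fixed-effect inverse-variance estimator for a single marker. The function name is hypothetical and this is not the mMeta package API; the permutation-based handling of marker-by-marker correlations is not shown.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance pooled estimate and its SE for one marker.

    Assumed model: each study i reports an estimate with standard error
    se_i, and studies are weighted by w_i = 1 / se_i**2.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per study estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Two hypothetical studies reporting the same marker with equal precision.
pooled, pooled_se = pool_fixed_effect([1.0, 3.0], [1.0, 1.0])
```

With equal standard errors the pooled estimate reduces to the plain mean of the study estimates, which makes the weighting easy to check by hand.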
Affiliation(s)
- Hyunwook Koh
- Department of Applied Mathematics and Statistics, The State University of New York, Korea, Incheon, South Korea
| | - Susan Tuddenham
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
| | - Cynthia L Sears
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
|
17
|
Luna PN, Mansbach JM, Shaw CA. A joint modeling approach for longitudinal microbiome data improves ability to detect microbiome associations with disease. PLoS Comput Biol 2020; 16:e1008473. [PMID: 33315858 PMCID: PMC7769610 DOI: 10.1371/journal.pcbi.1008473] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/20/2019] [Revised: 12/28/2020] [Accepted: 10/27/2020] [Indexed: 02/02/2023] Open
Abstract
Changes in the composition of the microbiome over time are associated with myriad human illnesses. Unfortunately, the lack of analytic techniques has hindered researchers' ability to quantify the association between longitudinal microbial composition and time-to-event outcomes. Prior methodological work developed the joint model for longitudinal and time-to-event data to incorporate time-dependent biomarker covariates into the hazard regression approach to disease outcomes. The original implementation of this joint modeling approach employed a linear mixed effects model to represent the time-dependent covariates. However, when the distribution of the time-dependent covariate is non-Gaussian, as is the case with microbial abundances, researchers require different statistical methodology. We present a joint modeling framework that uses a negative binomial mixed effects model to determine longitudinal taxon abundances. We incorporate these modeled microbial abundances into a hazard function with a parameterization that not only accounts for the proportional nature of microbiome data, but also generates biologically interpretable results. Herein we demonstrate the performance improvements of our approach over existing alternatives via simulation as well as a previously published longitudinal dataset studying the microbiome during pregnancy. The results demonstrate that our joint modeling framework for longitudinal microbiome count data provides a powerful methodology to uncover associations between changes in microbial abundances over time and the onset of disease. This method offers the potential to equip researchers with a deeper understanding of the associations between longitudinal microbial composition changes and disease outcomes. This new approach could potentially lead to new diagnostic biomarkers or inform clinical interventions to help prevent or treat disease.
Affiliation(s)
- Pamela N. Luna
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Statistics, Rice University, Houston, Texas, United States of America
| | - Jonathan M. Mansbach
- Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Chad A. Shaw
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Statistics, Rice University, Houston, Texas, United States of America
|
18
|
Wilson N, Zhao N, Zhan X, Koh H, Fu W, Chen J, Li H, Wu MC, Plantinga AM. MiRKAT: kernel machine regression-based global association tests for the microbiome. Bioinformatics 2020; 37:1595-1597. [PMID: 33225342 PMCID: PMC8495888 DOI: 10.1093/bioinformatics/btaa951] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/01/2020] [Revised: 10/13/2020] [Accepted: 10/28/2020] [Indexed: 11/14/2022] Open
Abstract
SUMMARY: Distance-based tests of microbiome beta diversity are an integral part of many microbiome analyses. MiRKAT enables distance-based association testing with a wide variety of outcome types, including continuous, binary, censored time-to-event, multivariate, correlated and high-dimensional outcomes. Omnibus tests allow simultaneous consideration of multiple distance and dissimilarity measures, providing higher power across a range of simulation scenarios. Two measures of effect size, a modified R-squared coefficient and a kernel RV coefficient, are incorporated to allow comparison of effect sizes across multiple kernels. AVAILABILITY AND IMPLEMENTATION: MiRKAT is available on CRAN as an R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
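Kernel-based tests of this kind need a kernel (similarity) matrix, while beta diversity is usually reported as a distance matrix. A common way to bridge the two is Gower double centering, K = -1/2 J D² J with J = I - 11ᵀ/n, where D² is the element-wise squared distance matrix. The sketch below is written in plain Python under that assumption; it is not the MiRKAT package API, the function name is hypothetical, and the correction for negative eigenvalues that non-Euclidean distances may require is not shown.

```python
def distance_to_kernel(dist):
    """Gower double centering of a square distance matrix.

    Returns K with K[i][j] = -0.5 * (d2[i][j] - rowmean_i - colmean_j
    + grandmean), where d2 holds the squared distances.
    """
    n = len(dist)
    if n == 0 or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square and non-empty")
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(d2[i]) / n for i in range(n)]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: three samples at positions 0, 1 and 3 on a line,
# so the distances are Euclidean and K equals the centered inner products.
dist = [[0.0, 1.0, 3.0],
        [1.0, 0.0, 2.0],
        [3.0, 2.0, 0.0]]
kernel = distance_to_kernel(dist)
```

For Euclidean distances the result is the Gram matrix of the mean-centered coordinates, which is why every row of the kernel sums to zero.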
Affiliation(s)
- Nehemiah Wilson
- Department of Mathematics and Statistics, Williams College, Williamstown, MA 01267, USA
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| | - Xiang Zhan
- Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA 17033, USA
| | - Hyunwook Koh
- Department of Applied Mathematics and Statistics, The State University of New York, Korea (SUNY Korea), Incheon 21985, South Korea
| | - Weijia Fu
- Institute for Health Metrics and Evaluation, University of Washington, Seattle, WA 98121, USA
| | - Jun Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Michael C Wu
- Public Health Sciences Division, Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Anna M Plantinga
- Department of Mathematics and Statistics, Williams College, Williamstown, MA 01267, USA. To whom correspondence should be addressed.
| |
Collapse
|
19
|
Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Progress in Molecular Biology and Translational Science 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have been shifted by the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversities-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and future directions for association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States
|
20
|
Koh H, Zhao N. A powerful microbial group association test based on the higher criticism analysis for sparse microbial association signals. Microbiome 2020; 8:63. [PMID: 32393397 PMCID: PMC7216722 DOI: 10.1186/s40168-020-00834-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/18/2019] [Accepted: 03/23/2020] [Indexed: 05/05/2023]
Abstract
BACKGROUND: In human microbiome studies, it is crucial to evaluate the association between microbial group (e.g., community or clade) composition and a host phenotype of interest. In response, a number of microbial group association tests have been proposed, which account for the unique features of microbiome data (e.g., high dimensionality, compositionality, phylogenetic relationship). These tests generally fall in the class of aggregation tests, which amplify the overall group association by combining all the underlying microbial association signals; they are therefore powerful when many microbial species are associated with a given host phenotype (i.e., low sparsity). In practice, however, the microbial association signals can be highly sparse, and this is precisely the situation in which the microbial group association is difficult to discover. METHODS: Here, we introduce a powerful microbial group association test for sparse microbial association signals, namely, microbiome higher criticism analysis (MiHC). MiHC is a data-driven omnibus test taken in a search space spanned by tailoring the higher criticism test to incorporate phylogenetic information and/or modulate sparsity levels and including the Simes test for excessively high sparsity levels. Therefore, MiHC robustly adapts to diverse phylogenetic relevance and sparsity levels. RESULTS: Our simulations show that MiHC maintains high power at different phylogenetic relevance and sparsity levels with correct type I error control. We also apply MiHC to four real microbiome datasets to test the association between the respiratory tract microbiome and smoking status, the association between the infant's gut microbiome and delivery mode, the association between the gut microbiome and type 1 diabetes status, and the association between the gut microbiome and human immunodeficiency virus status. CONCLUSIONS: In practice, the true underlying association pattern, in terms of phylogenetic relevance and sparsity, is usually unknown. Therefore, MiHC can be a useful analytic tool because of its high adaptivity to diverse phylogenetic relevance and sparsity levels. MiHC can be implemented in the R computing environment using our software package, freely available at https://github.com/hk1785/MiHC.
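To make the sparsity argument concrete, here is a minimal Python sketch of the two classical building blocks this abstract names: the higher criticism statistic and the Simes combined p-value, both computed from a vector of per-taxon p-values. It is not the MiHC implementation. The phylogenetic weighting and sparsity modulation are omitted, the function names are illustrative, and maximizing over all ranks (rather than a restricted range) is an assumption made here for simplicity.

```python
import numpy as np

def higher_criticism(pvals):
    """Classical higher criticism statistic, maximized over all ranks (assumption)."""
    # Clip away exact 0/1 so the standardization below never divides by zero.
    p = np.sort(np.clip(np.asarray(pvals, dtype=float), 1e-12, 1 - 1e-12))
    m = p.size
    ranks = np.arange(1, m + 1)
    # Standardized gap between the empirical fraction i/m and the i-th smallest p-value.
    hc = np.sqrt(m) * (ranks / m - p) / np.sqrt(p * (1 - p))
    return float(hc.max())

def simes_pvalue(pvals):
    """Simes combined p-value: min over ranks of m * p_(i) / i, capped at 1."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    ranks = np.arange(1, m + 1)
    return float(min(1.0, np.min(m * p / ranks)))
```

A single very small p-value among many null-like ones inflates the higher criticism statistic sharply, which is the sparse-signal behavior the abstract contrasts with aggregation tests.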
Affiliation(s)
- Hyunwook Koh
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 North Wolfe Street, Office E3622, Baltimore, MD, 21205, USA
- Ni Zhao
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 North Wolfe Street, Office E3622, Baltimore, MD, 21205, USA
|
21
|
Peters BA, Wilson M, Moran U, Pavlick A, Izsak A, Wechter T, Weber JS, Osman I, Ahn J. Relating the gut metagenome and metatranscriptome to immunotherapy responses in melanoma patients. Genome Med 2019; 11:61. [PMID: 31597568 PMCID: PMC6785875 DOI: 10.1186/s13073-019-0672-4] [Citation(s) in RCA: 148] [Impact Index Per Article: 24.7] [Received: 05/23/2019] [Accepted: 09/12/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Recent evidence suggests that immunotherapy efficacy in melanoma is modulated by gut microbiota. Few studies have examined this phenomenon in humans, and none have incorporated metatranscriptomics, important for determining expression of metagenomic functions in the microbial community. METHODS: In melanoma patients undergoing immunotherapy, the gut microbiome was characterized in pre-treatment stool using 16S rRNA gene and shotgun metagenome sequencing (n = 27). Transcriptional expression of metagenomic pathways was confirmed with metatranscriptome sequencing in a subset of 17. We examined associations of taxa and metagenomic pathways with progression-free survival (PFS) using 500 × 10-fold cross-validated elastic-net penalized Cox regression. RESULTS: Higher microbial community richness was associated with longer PFS in 16S and shotgun data (p < 0.05). Clustering based on overall microbiome composition divided patients into three groups with differing PFS; the low-risk group had 99% lower risk of progression than the high-risk group at any time during follow-up (p = 0.002). Among the species selected in regression, abundances of Bacteroides ovatus, Bacteroides dorei, Bacteroides massiliensis, Ruminococcus gnavus, and Blautia producta were related to shorter PFS, and those of Faecalibacterium prausnitzii, Coprococcus eutactus, Prevotella stercorea, Streptococcus sanguinis, Streptococcus anginosus, and Lachnospiraceae bacterium 3_1_46FAA to longer PFS. Metagenomic functions related to PFS that had correlated metatranscriptomic expression included risk-associated pathways of L-rhamnose degradation, guanosine nucleotide biosynthesis, and B vitamin biosynthesis. CONCLUSIONS: This work adds to the growing evidence that gut microbiota are related to immunotherapy outcomes, and identifies, for the first time, transcriptionally expressed metagenomic pathways related to PFS. Further research is warranted on microbial therapeutic targets to improve immunotherapy outcomes.
Affiliation(s)
- Brandilyn A Peters
  - Department of Population Health, NYU School of Medicine, New York, NY, 10016, USA
- Melissa Wilson
  - Department of Medicine, NYU School of Medicine, New York, NY, USA
  - NYU Perlmutter Cancer Center, New York, NY, USA
  - Present Address: Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, USA
- Una Moran
  - NYU Perlmutter Cancer Center, New York, NY, USA
  - The Ronald O. Perelman Department of Dermatology, NYU School of Medicine, New York, NY, USA
- Anna Pavlick
  - Department of Medicine, NYU School of Medicine, New York, NY, USA
  - NYU Perlmutter Cancer Center, New York, NY, USA
- Allison Izsak
  - The Ronald O. Perelman Department of Dermatology, NYU School of Medicine, New York, NY, USA
- Todd Wechter
  - The Ronald O. Perelman Department of Dermatology, NYU School of Medicine, New York, NY, USA
- Jeffrey S Weber
  - Department of Medicine, NYU School of Medicine, New York, NY, USA
  - NYU Perlmutter Cancer Center, New York, NY, USA
- Iman Osman
  - Department of Medicine, NYU School of Medicine, New York, NY, USA
  - NYU Perlmutter Cancer Center, New York, NY, USA
  - The Ronald O. Perelman Department of Dermatology, NYU School of Medicine, New York, NY, USA
- Jiyoung Ahn
  - Department of Population Health, NYU School of Medicine, New York, NY, 10016, USA
  - NYU Perlmutter Cancer Center, New York, NY, USA
|
22
|
Plantinga AM, Chen J, Jenq RR, Wu MC. pldist: ecological dissimilarities for paired and longitudinal microbiome association analysis. Bioinformatics 2019; 35:3567-3575. [PMID: 30863868 PMCID: PMC6761933 DOI: 10.1093/bioinformatics/btz120] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/29/2018] [Revised: 01/27/2019] [Accepted: 02/13/2019] [Indexed: 01/12/2023]
Abstract
MOTIVATION: The human microbiome is notoriously variable across individuals, with a wide range of 'healthy' microbiomes. Paired and longitudinal studies of the microbiome have become increasingly popular as a way to reduce unmeasured confounding and to increase statistical power by reducing large inter-subject variability. Statistical methods for analyzing such datasets are scarce. RESULTS: We introduce a paired UniFrac dissimilarity that summarizes within-individual (or within-pair) shifts in microbiome composition and then compares these compositional shifts across individuals (or pairs). This dissimilarity depends on a novel transformation of relative abundances, which we then extend to more than two time points and incorporate into several phylogenetic and non-phylogenetic dissimilarities. The data transformation and resulting dissimilarities may be used in a wide variety of downstream analyses, including ordination analysis and distance-based hypothesis testing. Simulations demonstrate that tests based on these dissimilarities retain appropriate type 1 error and high power. We apply the method in two real datasets. AVAILABILITY AND IMPLEMENTATION: The R package pldist is available on GitHub at https://github.com/aplantin/pldist. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anna M Plantinga
  - Department of Mathematics and Statistics, Williams College, Williamstown, MA, USA. To whom correspondence should be addressed.
- Jun Chen
  - Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
  - Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Robert R Jenq
  - Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Stem Cell Transplantation, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michael C Wu
  - Department of Biostatistics and Biomathematics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Department of Biostatistics, University of Washington, Seattle, WA, USA. To whom correspondence should be addressed.
|
23
|
Koh H, Li Y, Zhan X, Chen J, Zhao N. A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies. Front Genet 2019; 10:458. [PMID: 31156711 PMCID: PMC6532659 DOI: 10.3389/fgene.2019.00458] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/08/2019] [Accepted: 04/30/2019] [Indexed: 12/12/2022]
Abstract
Researchers have increasingly employed family-based or longitudinal study designs to survey the roles of the human microbiota on diverse host traits of interest (e.g., health/disease status, medical intervention, behavioral/environmental factor). Such study designs are useful to properly control for potential confounders or the sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because the measurements within clusters (e.g., families, subjects including repeated measures) tend to be correlated, so statistical methods based on the independence assumption cannot be used. For correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, namely, the correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance), and then tests its association with a host trait. As with prior distance-based kernel association tests (e.g., the microbiome regression-based kernel association test), the use of ecological distances gives cSKAT high power. However, cSKAT is limited to handling Gaussian traits [e.g., body mass index (BMI)] and a single chosen distance measure at a time. The power of cSKAT varies substantially with the distance measure used, yet choosing an optimal distance measure is challenging because the nature of the true association is unknown. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive test of GLMM-MiRKAT, namely, aGLMM-MiRKAT, so as to avoid the need to choose the optimal distance measure. Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool with its broad applicability to diverse types of traits, robust power and valid statistical inference.
Affiliation(s)
- Hyunwook Koh
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Yutong Li
  - School of Physics, Peking University, Beijing, China
- Xiang Zhan
  - Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
- Jun Chen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
- Ni Zhao
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
|
24
|
Peters BA, Hayes RB, Goparaju C, Reid C, Pass HI, Ahn J. The Microbiome in Lung Cancer Tissue and Recurrence-Free Survival. Cancer Epidemiol Biomarkers Prev 2019; 28:731-740. [PMID: 30733306 DOI: 10.1158/1055-9965.epi-18-0966] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 08/30/2018] [Revised: 11/05/2018] [Accepted: 01/28/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Human microbiota have many functions that could contribute to cancer initiation and/or progression at local sites, yet the relation of the lung microbiota to lung cancer prognosis has not been studied. METHODS: In a pilot study, 16S rRNA gene sequencing was performed on paired lung tumor and remote normal samples from the same lobe/segment in 19 patients with non-small cell lung cancer (NSCLC). We explored associations of tumor or normal tissue microbiome diversity and composition with recurrence-free (RFS) and disease-free survival (DFS), and compared microbiome diversity and composition between paired tumor and normal samples. RESULTS: Higher richness and diversity in normal tissue were associated with reduced RFS (richness P = 0.08, Shannon index P = 0.03) and DFS (richness P = 0.03, Shannon index P = 0.02), as was normal tissue overall microbiome composition (Bray-Curtis P = 0.09 for RFS and P = 0.02 for DFS). In normal tissue, greater abundance of family Koribacteraceae was associated with increased RFS and DFS, whereas greater abundance of families Bacteroidaceae, Lachnospiraceae, and Ruminococcaceae was associated with reduced RFS or DFS (P < 0.05). Tumor tissue diversity and overall composition were not associated with RFS or DFS. Tumor tissue had lower richness and diversity (P ≤ 0.0001) than paired normal tissue, though overall microbiome composition did not differ between the paired samples. CONCLUSIONS: We demonstrate, for the first time, a potential relationship between the normal lung microbiota and lung cancer prognosis, which requires confirmation in a larger study. IMPACT: Definition of bacterial biomarkers of prognosis may lead to improved survival outcomes for patients with lung cancer.
Affiliation(s)
- Brandilyn A Peters
  - Department of Population Health, NYU School of Medicine, New York, New York
- Richard B Hayes
  - Department of Population Health, NYU School of Medicine, New York, New York
  - NYU Perlmutter Cancer Center, New York, New York
- Chandra Goparaju
  - Department of Cardiothoracic Surgery, NYU School of Medicine, New York, New York
- Christopher Reid
  - Department of Cardiothoracic Surgery, NYU School of Medicine, New York, New York
- Harvey I Pass
  - NYU Perlmutter Cancer Center, New York, New York
  - Department of Cardiothoracic Surgery, NYU School of Medicine, New York, New York
- Jiyoung Ahn
  - Department of Population Health, NYU School of Medicine, New York, New York
  - NYU Perlmutter Cancer Center, New York, New York
|
25
|
Relationship Between MiRKAT and Coefficient of Determination in Similarity Matrix Regression. Processes (Basel) 2019. [DOI: 10.3390/pr7020079] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/27/2023]
Abstract
The Microbiome Regression-based Kernel Association Test (MiRKAT) is widely used in testing for the association between microbiome compositions and an outcome of interest. The MiRKAT statistic is derived as a variance-component score test in a kernel machine regression-based generalized linear mixed model. In this brief report, we show that the MiRKAT statistic is proportional to the R² (coefficient of determination) statistic in a similarity matrix regression, which characterizes the fraction of variability in outcome similarity explained by microbiome similarity (up to a constant).
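As an illustration of the kind of statistic this abstract refers to, the following Python sketch converts a Bray-Curtis distance matrix into a kernel by Gower centering and computes a quadratic-form score statistic from null-model residuals. It is a sketch under stated assumptions, not the MiRKAT package: the null model is intercept-only, the p-value is obtained by permutation rather than by an analytic null distribution, scaling constants are dropped, and all function names are hypothetical.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity on relative abundances."""
    x = counts / counts.sum(axis=1, keepdims=True)
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    total = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return diff / total

def distance_to_kernel(dist):
    """Gower-center squared distances, then zero out negative eigenvalues."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    kernel = -0.5 * centering @ (dist ** 2) @ centering
    kernel = (kernel + kernel.T) / 2  # remove floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(kernel)
    eigvals = np.clip(eigvals, 0.0, None)  # positive semidefinite correction
    return (eigvecs * eigvals) @ eigvecs.T

def kernel_score_test(y, kernel, n_perm=999, seed=0):
    """Score-type statistic Q = r' K r with a permutation p-value (assumption)."""
    rng = np.random.default_rng(seed)
    resid = y - y.mean()  # residuals under an intercept-only null model
    observed = resid @ kernel @ resid
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(resid)
        if perm @ kernel @ perm >= observed:
            exceed += 1
    return float(observed), (1 + exceed) / (1 + n_perm)
```

Given a samples-by-taxa count matrix `counts` and a continuous outcome `y`, `kernel_score_test(y, distance_to_kernel(bray_curtis(counts)))` returns the statistic and its permutation p-value.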
|
26
|
An adaptive microbiome α-diversity-based association analysis method. Sci Rep 2018; 8:18026. [PMID: 30575793 PMCID: PMC6303306 DOI: 10.1038/s41598-018-36355-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/11/2018] [Accepted: 11/19/2018] [Indexed: 12/12/2022]
Abstract
Relating microbial diversity to various host traits of interest (e.g., phenotypes, clinical interventions, environmental factors) is a critical step for generic assessments of the disparity in human microbiota among different populations. The performance of the current item-by-item α-diversity-based association tests is sensitive to the choice of α-diversity metric and unpredictable because the nature of the true association is unknown. Cherry-picking the test with the smallest p-value or the largest effect size among multiple item-by-item analyses is not statistically valid because of the inherent multiplicity issue. Investigators have recently introduced microbial community-level association tests while touting the increased statistical power of their proposed methods. However, these are purely tests of significance that provide no estimate of the effect direction and size of a microbial community; hence, they are not in practical use. Here, I introduce a novel microbial diversity association test, namely, adaptive microbiome α-diversity-based association analysis (aMiAD). aMiAD simultaneously tests the significance and estimates the effect score of the microbial diversity on a host trait, while robustly maintaining high statistical power and accurate estimation with no issues in validity.
|
27
|
Hu J, Koh H, He L, Liu M, Blaser MJ, Li H. A two-stage microbial association mapping framework with advanced FDR control. Microbiome 2018; 6:131. [PMID: 30045760 PMCID: PMC6060480 DOI: 10.1186/s40168-018-0517-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/23/2018] [Accepted: 07/11/2018] [Indexed: 05/31/2023]
Abstract
BACKGROUND: In microbiome studies, it is important to detect taxa which are associated with pathological outcomes at the lowest definable taxonomic rank, such as genus or species. Traditionally, taxa at the target rank are tested for individual association, followed by the Benjamini-Hochberg (BH) procedure to control the false discovery rate (FDR). However, this approach neglects the dependence structure among taxa and may lead to conservative results. The taxonomic tree of microbiome data represents alignment from phylum to species rank and characterizes evolutionary relationships across microbial taxa. Taxa that are closer on the tree usually have similar responses to the exposure (environment). The statistical power in microbial association tests can be enhanced by efficiently employing the prior evolutionary information via the taxonomic tree. METHODS: We propose a two-stage microbial association mapping framework (massMap) which uses grouping information from the taxonomic tree to strengthen statistical power in association tests at the target rank. massMap first screens the association of taxonomic groups at a pre-selected higher taxonomic rank using a powerful microbial group test, OMiAT. The method then proceeds to test the association for each candidate taxon at the target rank within the significant taxonomic groups identified in the first stage. Hierarchical BH (HBH) and selected subset testing (SST) procedures are evaluated to control the FDR for the two-stage structured tests. RESULTS: Our simulations show that massMap incorporating OMiAT and the advanced FDR controlling methodologies largely alleviates the multiplicity issue. It is statistically more powerful than the traditional association mapping directly at the target rank while controlling the FDR at desired levels under most scenarios. In our real data analyses, massMap detects as many or more associated species with smaller adjusted p values compared to the traditional method, which further illustrates the efficiency of the proposed framework. The R package of massMap is publicly available at https://sites.google.com/site/huilinli09/software and https://github.com/JiyuanHu/. CONCLUSIONS: massMap is a novel microbial association mapping framework and achieves additional efficiency by utilizing the intrinsic taxonomic structure of microbiome data.
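For reference, the traditional single-stage baseline this abstract describes, per-taxon p-values followed by the Benjamini-Hochberg procedure, can be sketched in Python as below. This is only the baseline, not massMap: the group screening stage, OMiAT, and the HBH and SST procedures are not reproduced, and the function name is illustrative.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order of the input."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum taken from the largest rank down.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

Taxa whose adjusted p-value falls at or below the chosen FDR level (for example 0.05) are declared associated; a two-stage scheme would apply such a correction only within groups that pass a first-stage screen.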
Affiliation(s)
- Jiyuan Hu
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
  - Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, 200433 China
- Hyunwook Koh
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
- Linchen He
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
- Menghan Liu
  - Department of Medicine, New York University School of Medicine, New York, NY 10016 USA
- Martin J. Blaser
  - Department of Medicine, New York University School of Medicine, New York, NY 10016 USA
- Huilin Li
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016 USA
|