1
Shi K, Liu Q, Ji Q, He Q, Zhao XM. MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework. Brief Bioinform 2024; 25:bbae530. [PMID: 39446191 PMCID: PMC11500453 DOI: 10.1093/bib/bbae530] [Received: 03/13/2024] [Revised: 09/25/2024] [Accepted: 10/07/2024] [Indexed: 10/25/2024] Open
Abstract
The gut microbiota plays a vital role in human health, and significant effort has been made to predict human phenotypes, especially diseases, using machine learning (ML) methods with the microbiota as a promising indicator or predictor. However, prediction accuracy with metagenomic data is affected by many factors, e.g. small sample size, class imbalance and high-dimensional features. To address these challenges, we propose MicroHDF, an interpretable deep learning framework to predict host phenotypes, in which cascaded layers of deep forest units are designed to handle sample class imbalance and high-dimensional features. The experimental results show that the performance of MicroHDF is competitive with that of existing state-of-the-art methods on 13 publicly available datasets of six different diseases. In particular, it performs best with an area under the receiver operating characteristic curve of 0.9182 ± 0.0098 and 0.9469 ± 0.0076 for inflammatory bowel disease (IBD) and liver cirrhosis, respectively. MicroHDF also shows better performance and robustness in cross-study validation. Furthermore, MicroHDF is applied to two high-risk diseases, IBD and autism spectrum disorder, as case studies to identify potential biomarkers. In conclusion, our method provides an effective and reliable prediction of the host phenotype and discovers informative features with biological insights.
Affiliation(s)
- Kai Shi
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent Systems, Guilin University of Technology, Guilin, Guangxi 541004, China
- Qiaohui Liu
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Qingrong Ji
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Qisheng He
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
- Xing-Ming Zhao
- Huzhou Central Hospital, Affiliated Central Hospital Huzhou University, Huzhou, Zhejiang 313000, China
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
2
Nagpal S, Mande SS, Hooda H, Dutta U, Taneja B. EnsembleSeq: a workflow towards real-time, rapid, and simultaneous multi-kingdom-amplicon sequencing for holistic and resource-effective microbiome research at scale. Microbiol Spectr 2024; 12:e0415023. [PMID: 38687072 PMCID: PMC11237516 DOI: 10.1128/spectrum.04150-23] [Received: 12/11/2023] [Accepted: 03/30/2024] [Indexed: 05/02/2024] Open
Abstract
Bacterial communities are often concomitantly present with numerous microorganisms in the human body and other natural environments. Amplicon-based microbiome studies have generally focused disproportionately, and at a rather shallow genus-level resolution, on the highly abundant bacteriome, with interest now branching toward the other microorganisms, particularly fungi. Given the generally sparse abundance of other microbes in the total microbiome, simultaneous sequencing of amplicons targeting multiple microbial kingdoms could be possible even with full multiplexing. Guiding studies are currently needed for performing and monitoring multi-kingdom-amplicon sequencing and data capture at scale. Aiming to address these gaps, amplification of the full-length bacterial 16S rRNA gene and the entire fungal internal-transcribed spacer (ITS) region was performed for human saliva samples (n = 96, including negative and positive controls). Combined amplicon DNA libraries were prepared for nanopore sequencing using a major fraction of 16S molecules and a minor fraction of ITS amplicons. Sequencing was performed in a single run of an R10.4.1 flow cell employing the latest V14 chemistry. An approach for real-time monitoring of the species saturation using dynamic rarefaction was designed as a guiding determinant of optimal run time. Real-time saturation monitoring for both bacterial and fungal species enabled the completion of sequencing within 30 hours, utilizing less than 60% of the total nanopores. Approximately 5 million high-quality (HQ) taxonomically assigned reads were generated (~4.2 million bacterial and 0.7 million fungal), providing a wider (beyond bacteriome) snapshot of human oral microbiota at species-level resolution. Among the more than 400 bacterial and 240 fungal species identified in the studied samples, the species of Streptococcus (e.g., Streptococcus mitis and Streptococcus oralis) and Candida (e.g., Candida albicans and Candida tropicalis) were observed to be the dominating microbes in the oral cavity, respectively. This agreed well with previous reports of the human oral microbiota. EnsembleSeq provides a proof-of-concept toward the identification of both fungal and bacterial species simultaneously in a single fully multiplexed nanopore sequencing run in a time- and resource-effective manner. Details of this workflow, along with the associated codebase, are provided to enable large-scale application for a holistic species-level microbiome study.
IMPORTANCE The human microbiome is the sum total of a variety of microbial genomes (including bacteria, fungi, protists, viruses, etc.) present in and on the human body. Yet, a majority of amplicon-based microbiome studies have largely remained skewed toward the bacteriome as an assumed proxy of the total microbiome, primarily at a shallow genus level. Cost, time, effort, data quality/management, and importantly a lack of guiding studies often limit progress in the direction of moving beyond the bacteriome. Here, EnsembleSeq presents a proof-of-concept toward concomitantly capturing multiple kingdoms of microorganisms (bacteriome and mycobiome) in a fully multiplexed (96-sample) single run of long-read amplicon sequencing. In addition, the workflow captures dynamic tracking of species-level saturation in a time- and resource-effective manner.
Affiliation(s)
- Sunil Nagpal
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- TCS Research, Tata Consultancy Services Ltd, Pune, India
- Harish Hooda
- Department of Gastroenterology, Post Graduate Institute of Medical Education and Research, Chandigarh, India
- Usha Dutta
- Department of Gastroenterology, Post Graduate Institute of Medical Education and Research, Chandigarh, India
- Bhupesh Taneja
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
3
Challa A, Maras JS, Nagpal S, Tripathi G, Taneja B, Kachhawa G, Sood S, Dhawan B, Acharya P, Upadhyay AD, Yadav M, Sharma R, Bajpai M, Gupta S. Multi-omics analysis identifies potential microbial and metabolite diagnostic biomarkers of bacterial vaginosis. J Eur Acad Dermatol Venereol 2024; 38:1152-1165. [PMID: 38284174 DOI: 10.1111/jdv.19805] [Received: 10/20/2022] [Accepted: 11/06/2023] [Indexed: 01/30/2024]
Abstract
BACKGROUND Bacterial vaginosis (BV) is a common clinical manifestation of a perturbed vaginal ecology associated with adverse sexual and reproductive health outcomes if left untreated. The existing diagnostic modalities are either cumbersome or require skilled expertise, warranting alternate tests. Application of machine-learning tools to heterogeneous and high-dimensional multi-omics datasets shows promise for data integration and may aid biomarker discovery. OBJECTIVES The present study aimed to evaluate the potential of the microbiome and metabolome-derived biomarkers in BV diagnosis. Interpretable machine-learning algorithms were used to evaluate the utility of an integrated-omics-derived classification model. METHODS Vaginal samples obtained from women of reproductive age with (n = 40) and without BV (n = 40) were subjected to 16S rRNA amplicon sequencing and LC-MS-based metabolomics. The vaginal microbiome and metabolome were characterized, and machine-learning analysis was performed to build a classification model using biomarkers with the highest diagnostic accuracy. RESULTS The microbiome-based diagnostic model exhibited a ROC-AUC (10-fold CV) of 0.84 ± 0.21 and accuracy of 0.79 ± 0.18, and important features were Aerococcus spp., Mycoplasma hominis, Sneathia spp., Lactobacillus spp., Prevotella spp., Gardnerella spp. and Fannyhessea vaginae. The metabolome-derived model displayed superior performance with a ROC-AUC of 0.97 ± 0.07 and an accuracy of 0.92 ± 0.08. Beta-leucine, methylimidazole acetaldehyde, dimethylethanolamine, L-arginine and beta cortol were among key predictive metabolites for BV. A predictive model combining both microbial and metabolite features exhibited a high ROC-AUC of 0.97 ± 0.07 and accuracy of 0.94 ± 0.08 with diagnostic performance only slightly superior to the metabolite-based model. CONCLUSION Application of machine-learning tools to multi-omics datasets aids biomarker discovery with high predictive performance. Metabolome-derived classification models were observed to outperform microbiome-based biomarkers in predicting BV.
Affiliation(s)
- A Challa
- Department of Dermatology and Venereology, All India Institute of Medical Sciences, New Delhi, India
- J S Maras
- Department of Molecular and Cellular Medicine, Institute of Liver and Biliary Sciences, New Delhi, India
- S Nagpal
- TCS Research, Tata Consultancy Services Ltd, Pune, India
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- G Tripathi
- Department of Molecular and Cellular Medicine, Institute of Liver and Biliary Sciences, New Delhi, India
- B Taneja
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- G Kachhawa
- Department of Obstetrics and Gynaecology, All India Institute of Medical Sciences, New Delhi, India
- S Sood
- Department of Microbiology, All India Institute of Medical Sciences, New Delhi, India
- B Dhawan
- Department of Microbiology, All India Institute of Medical Sciences, New Delhi, India
- P Acharya
- Department of Biochemistry, All India Institute of Medical Sciences, New Delhi, India
- A D Upadhyay
- Department of Biostatistics, All India Institute of Medical Sciences, New Delhi, India
- M Yadav
- Department of Molecular and Cellular Medicine, Institute of Liver and Biliary Sciences, New Delhi, India
- R Sharma
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- M Bajpai
- Department of Transfusion Medicine, Institute of Liver and Biliary Sciences, New Delhi, India
- S Gupta
- Department of Dermatology and Venereology, All India Institute of Medical Sciences, New Delhi, India
4
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024] Open
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
5
Computational Resources for Molecular Biology 2022. J Mol Biol 2022; 434:167625. [PMID: 35569508 DOI: 10.1016/j.jmb.2022.167625] [Indexed: 11/18/2022]