1
Pais N, Ravishanker N, Rajasekaran S, Weinstock G, Tran DB. Randomized feature selection based semi-supervised latent Dirichlet allocation for microbiome analysis. Sci Rep 2024; 14:8855. [PMID: 38632488 PMCID: PMC11024186 DOI: 10.1038/s41598-024-59682-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 04/13/2024] [Indexed: 04/19/2024]
Abstract
Health and disease are fundamentally influenced by microbial communities and their genes (the microbiome). An in-depth analysis of microbiome structure that enables the classification of individuals based on their health can be crucial in enhancing diagnostics and treatment strategies to improve the overall well-being of an individual. In this paper, we present a novel semi-supervised methodology known as Randomized Feature Selection based Latent Dirichlet Allocation (RFSLDA) to study the impact of the gut microbiome on a subject's health status. Since the data in our study include fuzzy, self-reported health labels, traditional supervised learning approaches may not be suitable. As a first step, based on the similarity between documents in text analysis and gut-microbiome data, we employ Latent Dirichlet Allocation (LDA), a topic modeling approach that uses microbiome counts as features to group subjects into relatively homogeneous clusters, without invoking any knowledge of the observed health status (labels) of subjects. We then leverage information from the observed health status of subjects to associate these clusters with the most similar health status, making it a semi-supervised approach. Finally, a feature selection technique is incorporated into the model to improve the overall classification performance. The proposed method provides a semi-supervised topic modeling approach that can help handle the high dimensionality of microbiome data in association studies. Our experiments reveal that our semi-supervised classification algorithm is effective and efficient, achieving high classification accuracy compared with popular supervised learning approaches such as SVM and the multinomial logistic model.
The RFSLDA framework is attractive because it (i) enhances clustering accuracy by identifying key bacteria types as indicators of health status, (ii) identifies key bacteria types within each group based on estimates of the proportion of bacteria types within the groups, and (iii) computes a measure of within-group similarity to identify highly similar subjects in terms of their health status.
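The cluster-to-label association step described in this abstract can be illustrated with a minimal sketch. It assumes that each subject already has a cluster index (for example, the dominant LDA topic) and that "most similar health status" is approximated by a simple majority vote over self-reported labels. That voting rule, the function names, and the toy data are all illustrative assumptions, not the authors' published implementation.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, observed_labels):
    """Give each cluster the most frequent observed label among its members.

    Assumption: majority vote stands in for the paper's similarity criterion.
    """
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, observed_labels):
        members[cluster].append(label)
    # most_common(1) returns [(label, count)]; ties go to the first-seen label.
    return {c: Counter(lbls).most_common(1)[0][0] for c, lbls in members.items()}


def agreement(cluster_ids, observed_labels, mapping):
    """Fraction of subjects whose cluster label equals their self-reported label."""
    hits = sum(mapping[c] == y for c, y in zip(cluster_ids, observed_labels))
    return hits / len(observed_labels)


# Hypothetical toy data: one cluster index and one self-reported label per subject.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["healthy", "healthy", "ibd", "ibd", "ibd", "healthy", "obese", "obese"]

mapping = map_clusters_to_labels(clusters, labels)
score = agreement(clusters, labels, mapping)
print(mapping, score)
```

Under these assumptions, a randomized feature selection loop could repeatedly re-cluster on random subsets of bacteria types and keep the subset that yields the highest agreement score.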
Affiliation(s)
- Namitha Pais
  - Department of Statistics, University of Connecticut, Storrs, CT, USA
- Dong-Binh Tran
  - Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
2
Kumar B, Lorusso E, Fosso B, Pesole G. A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions. Front Microbiol 2024; 15:1343572. [PMID: 38419630 PMCID: PMC10900530 DOI: 10.3389/fmicb.2024.1343572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/29/2024] [Indexed: 03/02/2024]
Abstract
Metagenomics, Metabolomics, and Metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standard and comprehensive metadata associated with raw data, hindering the ability to perform robust data stratifications and consider confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations in metadata collection of existing public repositories collecting metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage the potential of metagenomic data. Furthermore, we explore future directions for implementing Machine Learning (ML) in metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in ML model development.
Affiliation(s)
- Bablu Kumar
  - Università degli Studi di Milano, Milan, Italy
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Erika Lorusso
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
  - National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
- Bruno Fosso
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Graziano Pesole
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
  - National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
3
Wu Z, Guo Y, Hayakawa M, Yang W, Lu Y, Ma J, Li L, Li C, Liu Y, Niu J. Artificial intelligence-driven microbiome data analysis for estimation of postmortem interval and crime location. Front Microbiol 2024; 15:1334703. [PMID: 38314433 PMCID: PMC10834752 DOI: 10.3389/fmicb.2024.1334703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/08/2024] [Indexed: 02/06/2024]
Abstract
Microbial communities, which change dynamically in cadavers and their surroundings, provide invaluable insights for forensic investigations. Conventional methodologies for microbiome sequencing data analysis face obstacles due to subjectivity and inefficiency. Artificial Intelligence (AI) presents an efficient and accurate tool, with the ability to autonomously process and analyze high-throughput data, and assimilate multi-omics data, encompassing metagenomics, transcriptomics, and proteomics. This facilitates accurate and efficient estimation of the postmortem interval (PMI), detection of crime location, and elucidation of microbial functionalities. This review presents an overview of microorganisms from cadavers and crime scenes, emphasizes the importance of the microbiome, and summarizes the application of AI in high-throughput microbiome data processing in forensic microbiology.
Affiliation(s)
- Ze Wu
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Yaoxing Guo
  - Department of Dermatology, The First Hospital of China Medical University, Shenyang, China
  - Key Laboratory of Immunodermatology, Ministry of Education and NHC, Shenyang, China
  - National Joint Engineering Research Center for Theranostics of Immunological Skin Diseases, Shenyang, China
- Miren Hayakawa
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Wei Yang
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Yansong Lu
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Jingyi Ma
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Linghui Li
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Chuntao Li
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Yingchun Liu
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China
- Jun Niu
  - Department of Dermatology, General Hospital of Northern Theater Command, Shenyang, China