1
|
Karwowska Z, Aasmets O, Kosciolek T, Org E. Effects of data transformation and model selection on feature importance in microbiome classification data. Microbiome 2025; 13:2. [PMID: 39754220 PMCID: PMC11699698 DOI: 10.1186/s40168-024-01996-6] [Received: 01/14/2024] [Accepted: 12/04/2024] [Indexed: 01/06/2025]
Abstract
BACKGROUND Accurate classification of host phenotypes from microbiome data is crucial for advancing microbiome-based therapies, with machine learning offering effective solutions. However, the complexity of the gut microbiome, data sparsity, compositionality, and population-specificity present significant challenges. Microbiome data transformations can alleviate some of the aforementioned challenges, but their usage in machine learning tasks has largely been unexplored. RESULTS Our analysis of over 8500 samples from 24 shotgun metagenomic datasets showed that it is possible to classify healthy and diseased individuals using microbiome data with minimal dependence on the choice of algorithm or transformation. Presence-absence transformations performed comparably to abundance-based transformations, and only a small subset of predictors is necessary for accurate classification. However, while different transformations resulted in comparable classification performance, the most important features varied significantly, which highlights the need to reevaluate machine learning-based biomarker detection. CONCLUSIONS Microbiome data transformations can significantly influence feature selection but have a limited effect on classification accuracy. Our findings suggest that while classification is robust across different transformations, the variation in feature selection necessitates caution when using machine learning for biomarker identification. This research provides valuable insights for applying machine learning to microbiome data and identifies important directions for future work.
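The abstract above contrasts presence-absence with abundance-based transformations but does not list them. A minimal sketch, assuming total-sum scaling and the centered log-ratio (CLR) as two common abundance-based choices; the function names, the sample counts, and the pseudocount of 0.5 are illustrative assumptions, not the authors' pipeline:

```python
import math

def presence_absence(counts):
    """Map each count to 1 if the taxon was observed, else 0."""
    return [1 if c > 0 else 0 for c in counts]

def relative_abundance(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]

def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log value minus the mean of the log values.

    The pseudocount (an assumed value) avoids log(0) for unobserved taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical read counts for five taxa in one sample.
sample = [120, 0, 30, 0, 850]
print(presence_absence(sample))
print(relative_abundance(sample))
print(clr(sample))
```

Each transformation yields a different feature scale for the same sample, which is one plausible reason a model's feature ranking can shift even when classification accuracy stays similar.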
Affiliation(s)
- Zuzanna Karwowska
- Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Doctoral School of Exact and Natural Sciences, Jagiellonian University, Krakow, Poland
- Sano Centre for Computational Medicine, Krakow, Poland
| | - Oliver Aasmets
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Tomasz Kosciolek
- Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Sano Centre for Computational Medicine, Krakow, Poland
| | - Elin Org
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| |
|
2
|
Tsoumtsa Meda L, Lagarde J, Guillier L, Roussel S, Douarre PE. Using GWAS and Machine Learning to Identify and Predict Genetic Variants Associated with Foodborne Bacteria Phenotypic Traits. Methods Mol Biol 2025; 2852:223-253. [PMID: 39235748 DOI: 10.1007/978-1-0716-4100-2_16] [Indexed: 09/06/2024]
Abstract
One of the main challenges in food microbiology is to prevent the risk of outbreaks by avoiding the distribution of food contaminated by bacteria. This requires constant monitoring of the circulating strains throughout the food production chain. Bacterial genomes contain signatures of natural evolution and adaptive markers that can be exploited to better understand the behavior of pathogens in the food industry. The monitoring of foodborne strains can therefore be facilitated by the use of these genomic markers, which can rapidly provide essential information on isolated strains, such as the source of contamination, risk of illness, potential for biofilm formation, and tolerance or resistance to biocides. The increasing availability of large genome datasets is enhancing the understanding of the genetic basis of complex traits such as host adaptation, virulence, and persistence. Genome-wide association studies have shown very promising results in the discovery of genomic markers that can be integrated into rapid detection tools. In addition, machine learning has successfully predicted phenotypes and classified important traits. Genome-wide association and machine learning tools therefore have the potential to support decision-making processes aimed at reducing the burden of foodborne diseases. The aim of this chapter is to provide knowledge on the use of these two methods in food microbiology and to recommend their use in the field.
Affiliation(s)
- Landry Tsoumtsa Meda
- ACTALIA, La Roche-sur-Foron, France
- ANSES, Salmonella and Listeria Unit (USEL), University of Paris-Est, Maisons-Alfort Laboratory for Food Safety, Maisons-Alfort, France
| | - Jean Lagarde
- ANSES, Salmonella and Listeria Unit (USEL), University of Paris-Est, Maisons-Alfort Laboratory for Food Safety, Maisons-Alfort, France
- INRAE, Unit of Process Optimisation in Food, Agriculture and the Environment (UR OPAALE), Rennes, France
| | | | - Sophie Roussel
- ANSES, Salmonella and Listeria Unit (USEL), University of Paris-Est, Maisons-Alfort Laboratory for Food Safety, Maisons-Alfort, France
| | - Pierre-Emmanuel Douarre
- ANSES, Salmonella and Listeria Unit (USEL), University of Paris-Est, Maisons-Alfort Laboratory for Food Safety, Maisons-Alfort, France
| |
|
3
|
Patel RA, Panche AN, Harke SN. Gut microbiome-gut brain axis-depression: interconnection. World J Biol Psychiatry 2025; 26:1-36. [PMID: 39713871 DOI: 10.1080/15622975.2024.2436854] [Received: 08/07/2024] [Revised: 11/26/2024] [Accepted: 11/28/2024] [Indexed: 12/24/2024]
Abstract
OBJECTIVES The relationship between the gut microbiome and mental health, particularly depression, has gained significant attention. This review explores the connection between microbial metabolites, dysbiosis, and depression. The gut microbiome, comprising diverse microorganisms, maintains physiological balance and influences health through the gut-brain axis, a communication pathway between the gut and the central nervous system. METHODS Dysbiosis, an imbalance in the gut microbiome, disrupts this axis and worsens depressive symptoms. Factors like diet, antibiotics, and lifestyle can cause this imbalance, leading to changes in microbial composition, metabolism, and immune responses. This imbalance can induce inflammation, disrupt neurotransmitter regulation, and affect hormonal and epigenetic processes, all linked to depression. RESULTS Microbial metabolites, such as short-chain fatty acids and neurotransmitters, are key to gut-brain communication, influencing immune regulation and mood. The altered production of these metabolites is associated with depression. While progress has been made in understanding the gut-brain axis, more research is needed to clarify causative relationships and develop new treatments. The emerging field of psychobiotics and microbiome-targeted therapies shows promise for innovative depression treatments by harnessing the gut microbiome's potential. CONCLUSIONS Epigenetic mechanisms, including DNA methylation and histone modifications, are crucial in how the gut microbiota impacts mental health. Understanding these mechanisms offers new prospects for preventing and treating depression through the gut-brain axis.
Affiliation(s)
- Ruhina Afroz Patel
- Institute of Biosciences and Technology, MGM University, Aurangabad, India
| | - Archana N Panche
- Institute of Biosciences and Technology, MGM University, Aurangabad, India
| | - Sanjay N Harke
- Institute of Biosciences and Technology, MGM University, Aurangabad, India
| |
|
4
|
Mikles B, Schmidt CJ, Benbow ME, Jordan HR, Pechal JL. Potential postmortem microbial biomarkers of infant and younger children death investigation. J Forensic Sci 2024. [PMID: 39682072 DOI: 10.1111/1556-4029.15677] [Received: 06/03/2024] [Revised: 08/08/2024] [Accepted: 11/13/2024] [Indexed: 12/18/2024]
Abstract
Microbial communities associated with the human body are highly dynamic and reflect the host environment and lifestyle over time. Studies show death is no exception, with data demonstrating similar antemortem and postmortem microbiomes up to 48 h following death. These predictable microbial biomarkers can inform death investigation by helping to estimate the postmortem interval and build models to identify cause and manner of death. However, no attempts have been made to model potential microbial biomarkers in pediatric (≤2 years) deaths. This study provided a cross-sectional survey of the microbiota of 53 pediatric cases (black, white, both sexes) seen in Wayne County, Michigan. Autopsy cases represented accidents, homicides, or natural causes. Postmortem microbiome samples were collected by swabbing the eyes, ears, nose, mouth, umbilicus, brain, rectum, trabecular space, and cardiac blood. 16S rRNA sequence analyses indicated that sex, race, age, body site, and manner of death (MOD) had significant effects on microbiome composition, with significant interactions among MOD, race, and age. Amplicon sequence variants identified intra- and interhost dispersion of the postmortem microbiome depending on death circumstance. Among manners of death, non-accidental deaths were significantly distinct from all other deaths, and among body sites the rectum was distinct in its microbial composition. There is a real need for robust postmortem microbiome before it can be standardized as a practical tool for use in forensic investigation or public health. These results inform postmortem microbial variability during pediatric death investigation and contribute to a larger effort to understand the postmortem microbiome.
Affiliation(s)
- Bethany Mikles
- Department of Entomology, Michigan State University, East Lansing, Michigan, USA
| | - Carl J Schmidt
- Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
| | - M Eric Benbow
- Department of Entomology, Michigan State University, East Lansing, Michigan, USA
- Department of Osteopathic Medical Specialties, Michigan State University, East Lansing, Michigan, USA
- AgBioResearch, Michigan State University, East Lansing, Michigan, USA
- Ecology, Evolution and Behavior Program, Michigan State University, East Lansing, Michigan, USA
| | - Heather R Jordan
- Department of Biological Sciences, Mississippi State University, Starkville, Mississippi, USA
| | - Jennifer L Pechal
- Department of Entomology, Michigan State University, East Lansing, Michigan, USA
| |
|
5
|
Monshizadeh M, Hong Y, Ye Y. Multitask knowledge-primed neural network for predicting missing metadata and host phenotype based on human microbiome. Bioinformatics Advances 2024; 5:vbae203. [PMID: 39735577 PMCID: PMC11676323 DOI: 10.1093/bioadv/vbae203] [Received: 09/25/2024] [Revised: 11/27/2024] [Accepted: 12/11/2024] [Indexed: 12/31/2024]
Abstract
Motivation Microbial signatures in the human microbiome are closely associated with various human diseases, driving the development of machine learning models for microbiome-based disease prediction. Despite progress, challenges remain in enhancing prediction accuracy, generalizability, and interpretability. Confounding factors, such as host's gender, age, and body mass index, significantly influence the human microbiome, complicating microbiome-based predictions. Results To address these challenges, we developed MicroKPNN-MT, a unified model for predicting human phenotype based on microbiome data, as well as additional metadata like age and gender. This model builds on our earlier MicroKPNN framework, which incorporates prior knowledge of microbial species into neural networks to enhance prediction accuracy and interpretability. In MicroKPNN-MT, metadata, when available, serves as additional input features for prediction. Otherwise, the model predicts metadata from microbiome data using additional decoders. We applied MicroKPNN-MT to microbiome data collected in mBodyMap, covering healthy individuals and 25 different diseases, and demonstrated its potential as a predictive tool for multiple diseases, which at the same time provided predictions for the missing metadata. Our results showed that incorporating real or predicted metadata helped improve the accuracy of disease predictions, and more importantly, helped improve the generalizability of the predictive models. Availability and implementation https://github.com/mgtools/MicroKPNN-MT.
Affiliation(s)
- Mahsa Monshizadeh
- Computer Science Department, Indiana University, Bloomington, IN 47408, United States
| | - Yuhui Hong
- Computer Science Department, Indiana University, Bloomington, IN 47408, United States
| | - Yuzhen Ye
- Computer Science Department, Indiana University, Bloomington, IN 47408, United States
| |
|
6
|
Hosseiniyan Khatibi SM, Dimaano NG, Veliz E, Sundaresan V, Ali J. Exploring and exploiting the rice phytobiome to tackle climate change challenges. Plant Communications 2024; 5:101078. [PMID: 39233440 PMCID: PMC11671768 DOI: 10.1016/j.xplc.2024.101078] [Received: 04/27/2024] [Revised: 08/07/2024] [Accepted: 09/02/2024] [Indexed: 09/06/2024]
Abstract
The future of agriculture is uncertain under the current climate change scenario. Climate change directly and indirectly affects the biotic and abiotic elements that control agroecosystems, jeopardizing the safety of the world's food supply. A new area that focuses on characterizing the phytobiome is emerging. The phytobiome comprises plants and their immediate surroundings, involving numerous interdependent microscopic and macroscopic organisms that affect the health and productivity of plants. Phytobiome studies primarily focus on the microbial communities associated with plants, which are referred to as the plant microbiome. The development of high-throughput sequencing technologies over the past 10 years has dramatically advanced our understanding of the structure, functionality, and dynamics of the phytobiome; however, comprehensive methods for using this knowledge are lacking, particularly for major crops such as rice. Considering the impact of rice production on world food security, gaining fresh perspectives on the interdependent and interrelated components of the rice phytobiome could enhance rice production and crop health, sustain rice ecosystem function, and combat the effects of climate change. Our review re-conceptualizes the complex dynamics of the microscopic and macroscopic components in the rice phytobiome as influenced by human interventions and changing environmental conditions driven by climate change. We also discuss interdisciplinary and systematic approaches to decipher and reprogram the sophisticated interactions in the rice phytobiome using novel strategies and cutting-edge technology. Merging the gigantic datasets and complex information on the rice phytobiome and their application in the context of regenerative agriculture could lead to sustainable rice farming practices that are resilient to the impacts of climate change.
Affiliation(s)
| | - Niña Gracel Dimaano
- International Rice Research Institute, Los Baños, Laguna, Philippines; College of Agriculture and Food Science, University of the Philippines Los Baños, Los Baños, Laguna, Philippines
| | - Esteban Veliz
- College of Biological Sciences, University of California, Davis, Davis, CA, USA
| | - Venkatesan Sundaresan
- College of Biological Sciences, University of California, Davis, Davis, CA, USA; College of Agricultural and Environmental Sciences, University of California, Davis, Davis, CA, USA
| | - Jauhar Ali
- International Rice Research Institute, Los Baños, Laguna, Philippines
| |
|
7
|
Chang CC, Liu TC, Lu CJ, Chiu HC, Lin WN. Explainable machine learning model for identifying key gut microbes and metabolites biomarkers associated with myasthenia gravis. Comput Struct Biotechnol J 2024; 23:1572-1583. [PMID: 38650589 PMCID: PMC11035017 DOI: 10.1016/j.csbj.2024.04.025] [Received: 11/19/2023] [Revised: 03/14/2024] [Accepted: 04/07/2024] [Indexed: 04/25/2024]
Abstract
Diagnostic markers for myasthenia gravis (MG) are limited; thus, innovative approaches are required for supportive diagnosis and personalized care. Gut microbes are associated with MG pathogenesis; however, few studies have adopted machine learning (ML) to identify the associations among MG, gut microbiota, and metabolites. In this study, we developed an explainable ML model to predict biomarkers for MG diagnosis. We enrolled 19 MG patients and 10 non-MG individuals. Stool samples were collected and microbiome assessment was performed using 16S rRNA sequencing. Untargeted metabolic profiling was conducted to identify fecal amplicon sequence variants (ASVs) and metabolites. We developed an explainable ML model in which the top ASVs and metabolites are combined to identify the best predictive performance. This model uses the SHapley Additive exPlanations method to generate both global and personalized explanations. Fecal microbe-metabolite composition differed significantly between groups. The key bacterial families were Lachnospiraceae and Ruminococcaceae, and the top three features were Lachnospiraceae, inosine, and methylhistidine. An ML model trained with the top 1% of ASVs and top 15% of metabolites combined outperformed all other models. Personalized explanations revealed different patterns of microbe-metabolite contributions in patients with MG. The integration of the microbiota-metabolite features and the development of an explainable ML framework can accurately identify MG and provide personalized explanations, revealing the associations between gut microbiota, metabolites, and MG. An online calculator employing this algorithm was developed that provides a streamlined interface for MG diagnosis screening and conducting personalized evaluations.
Affiliation(s)
- Che-Cheng Chang
- PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
- Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
| | - Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
| | - Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
| | - Hou-Chang Chiu
- School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Neurology, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
| | - Wei-Ning Lin
- PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
| |
|
8
|
Oh VKS, Li RW. Wise Roles and Future Visionary Endeavors of Current Emperor: Advancing Dynamic Methods for Longitudinal Microbiome Meta-Omics Data in Personalized and Precision Medicine. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2400458. [PMID: 39535493 DOI: 10.1002/advs.202400458] [Received: 01/12/2024] [Revised: 09/16/2024] [Indexed: 11/16/2024]
Abstract
Understanding the etiological complexity of diseases requires identifying biomarkers longitudinally associated with specific phenotypes. Advanced sequencing tools generate dynamic microbiome data, providing insights into microbial community functions and their impact on health. This review aims to explore the current roles and future visionary endeavors of dynamic methods for integrating longitudinal microbiome multi-omics data in personalized and precision medicine. This work seeks to synthesize existing research, propose best practices, and highlight innovative techniques. The development and application of advanced dynamic methods, including the unified analytical frameworks and deep learning tools in artificial intelligence, are critically examined. Aggregating data on microbes, metabolites, genes, and other entities offers profound insights into the interactions among microorganisms, host physiology, and external stimuli. Despite progress, the absence of gold standards for validating analytical protocols and data resources of various longitudinal multi-omics studies remains a significant challenge. The interdependence of workflow steps critically affects overall outcomes. This work provides a comprehensive roadmap for best practices, addressing current challenges with advanced dynamic methods. The review underscores the biological effects of clinical, experimental, and analytical protocol settings on outcomes. Establishing consensus on dynamic microbiome inter-studies and advancing reliable analytical protocols are pivotal for the future of personalized and precision medicine.
Affiliation(s)
- Vera-Khlara S Oh
- Big Biomedical Data Integration and Statistical Analysis (DIANA) Research Center, Department of Data Science, College of Natural Sciences, Jeju National University, Jeju City, Jeju Do, 63243, South Korea
| | - Robert W Li
- United States Department of Agriculture, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD, 20705, USA
| |
|
9
|
Tian H, Tang R. Prediction of Crohn's disease based on deep feature recognition. Comput Biol Chem 2024; 113:108231. [PMID: 39362115 DOI: 10.1016/j.compbiolchem.2024.108231] [Received: 08/23/2024] [Revised: 09/21/2024] [Accepted: 09/28/2024] [Indexed: 10/05/2024]
Abstract
BACKGROUND Crohn's disease is a complex genetic disease that involves chronic gastrointestinal inflammation and results from a complex set of genetic, environmental, and immunological factors. By analyzing data from the human microbiome, genetic information can be used to predict Crohn's disease. Recent advances in deep learning have demonstrated its effectiveness in feature extraction and the use of deep learning to decode genetic information for disease prediction. METHODS In this paper, we present a deep learning-based model that utilizes a sequential convolutional attention network (SCAN) for feature extraction, incorporates adaptive additive interval losses to enhance these features, and employs support vector machines (SVM) for classification. To address the challenge of unbalanced Crohn's disease samples, we propose a random noise one-hot encoding data augmentation method. RESULTS Data augmentation with random noise accelerates training convergence, while SCAN-SVM effectively extracts features with adaptive additive interval loss enhancing differentiation. Our approach outperforms benchmark methods, achieving an average accuracy of 0.80 and a kappa value of 0.76, and we validate the effectiveness of feature enhancement. CONCLUSIONS In summary, we use deep feature recognition to effectively analyze the potential information in genes, which has a good application potential for gene analysis and prediction of Crohn's disease.
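The abstract above reports an average accuracy and a kappa value without stating how kappa was computed. A minimal sketch, assuming the figure is Cohen's kappa (agreement between predicted and true labels, corrected for chance agreement); the function names and the toy labels are hypothetical and do not reproduce the paper's data:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the agreement expected
    by chance from the marginal label frequencies of both label lists.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical labels: CD = Crohn's disease, HC = healthy control.
y_true = ["CD", "CD", "HC", "HC"]
y_pred = ["CD", "HC", "HC", "HC"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Kappa is a useful companion to accuracy on unbalanced samples, since a classifier that always predicts the majority class can score high accuracy while its kappa stays at zero.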
Affiliation(s)
- Hui Tian
- Anhui University of Chinese Medicine, Hefei 230038, China
| | - Ran Tang
- The First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei 230031, China
| |
|
10
|
Han H, Choi YH, Kim SY, Park JH, Chung J, Na HS. Optimizing microbiome reference databases with PacBio full-length 16S rRNA sequencing for enhanced taxonomic classification and biomarker discovery. Front Microbiol 2024; 15:1485073. [PMID: 39654676 PMCID: PMC11625778 DOI: 10.3389/fmicb.2024.1485073] [Received: 08/23/2024] [Accepted: 10/28/2024] [Indexed: 12/12/2024]
Abstract
Background The study of the human microbiome is crucial for understanding disease mechanisms, identifying biomarkers, and guiding preventive measures. Advances in sequencing platforms, particularly 16S rRNA sequencing, have revolutionized microbiome research. Despite the benefits, large microbiome reference databases (DBs) pose challenges, including computational demands and potential inaccuracies. This study aimed to determine if full-length 16S rRNA sequencing data produced by PacBio could be used to optimize reference DBs and be applied to Illumina V3-V4 targeted sequencing data for microbial study. Methods Oral and gut microbiome data (PRJNA1049979) were retrieved from NCBI. DADA2 was applied to full-length 16S rRNA PacBio data to obtain amplicon sequencing variants (ASVs). The RDP reference DB was used to assign the ASVs, which were then used as a reference DB to train the classifier. QIIME2 was used for V3-V4 targeted Illumina data analysis. BLAST was used to analyze alignment statistics. Linear discriminant analysis Effect Size (LEfSe) was employed for discriminant analysis. Results ASVs produced by PacBio showed coverage of the oral microbiome similar to the Human Oral Microbiome Database. A phylogenetic tree was trimmed at various thresholds to obtain an optimized reference DB. This established method was then applied to gut microbiome data, and the optimized gut microbiome reference DB provided improved taxa classification and biomarker discovery efficiency. Conclusion Full-length 16S rRNA sequencing data produced by PacBio can be used to construct a microbiome reference DB. Utilizing an optimized reference DB can increase the accuracy of microbiome classification and enhance biomarker discovery.
Affiliation(s)
- Hyejung Han
- Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan, Republic of Korea
| | - Yoon Hee Choi
- Department of Internal Medicine, Dongnam Institute of Radiological and Medical Sciences, Busan, Republic of Korea
| | - Si Yeong Kim
- Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan, Republic of Korea
| | - Jung Hwa Park
- Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan, Republic of Korea
| | - Jin Chung
- Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan, Republic of Korea
| | - Hee Sam Na
- Department of Oral Microbiology, School of Dentistry, Pusan National University, Yangsan, Republic of Korea
| |
|
11
|
Neiroukh D, Hajdarpasic A, Ayhan C, Sultan S, Soliman O. Gut Microbial Taxonomy and Its Role as a Biomarker in Aortic Diseases: A Systematic Review and Future Perspectives. J Clin Med 2024; 13:6938. [PMID: 39598083 PMCID: PMC11594723 DOI: 10.3390/jcm13226938] [Received: 10/22/2024] [Revised: 10/31/2024] [Accepted: 11/11/2024] [Indexed: 11/29/2024]
Abstract
Background/Objectives: Evidence of the association between the gut microbiome and cardiovascular diseases has accumulated. An imbalance or dysbiosis of this system has been shown to play a role in the pathogenesis of cardiovascular events, including aortic diseases. We aimed to elucidate the findings of the gut microbial taxonomy associated with aortic diseases and their subtypes. Furthermore, we sought to investigate whether gut microbiome dysbiosis can be used as a biomarker for aortic disease detection and to identify which species can be disease-specific. Methods: A systematic search was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines for original research papers on gut microbiome composition in patients with aortic disease, using patients without aortic disease as the control (i.e., healthy controls). The databases PubMed, Scopus, Cochrane, and Web of Science were searched using the medical subject headings (MeSH) terms "aortic diseases", "microbiome", "microbiota", and "taxa" for papers published before August 2024. We extracted the study characteristics, study population, and gut microbiome in aortic disease, including microbiota taxa diversity and abundance, regardless of taxa level. The National Institutes of Health (NIH) Quality Assessment Tool was used to assess study quality. Data were synthesized narratively to address the heterogeneity of the studies. Results: In this review, twelve studies that have identified gut microbial species and their potential impact on aortic disease pathogenesis were included. The studies showed dominance of the phyla Bacillota, Pseudomonadota, Actinomycetota, Bacteroidota, and Euryarchaeota in aortic disease patients. We also included the taxa sequencing methods and those used to extract the microorganisms. Aortic diseases were categorized into Takayasu's arteritis, giant cell arteritis, aortic aneurysm, and aortic dissection. Aortic disease patients had a higher rate of dysbiosis when compared to the healthy control groups, with significantly different microbiome composition. Conclusions: Patients with aortic disease exhibit a distinct difference between their gut microbiota composition and that of the healthy controls, which suggests a potential biomarker role of gut dysbiosis. Further exploration of the microbiome and its metagenome interface can help identify its role in aortic disease pathogenesis in depth, generating future therapeutic options. However, a unified methodology is required to identify potential microbial biomarkers in cardiovascular and cardiometabolic diseases.
Affiliation(s)
- Dina Neiroukh
- Discipline of Cardiology, School of Medicine, University of Galway, H91 TK33 Galway, Ireland
- CORRIB-CURAM-Vascular Group, University of Galway, H91 TK33 Galway, Ireland
| | - Aida Hajdarpasic
- Department of Medical Biology and Genetics, Sarajevo Medical School, University Sarajevo School of Science and Technology, 71000 Sarajevo, Bosnia and Herzegovina
| | - Cagri Ayhan
- Discipline of Cardiology, School of Medicine, University of Galway, H91 TK33 Galway, Ireland
- CORRIB-CURAM-Vascular Group, University of Galway, H91 TK33 Galway, Ireland
| | - Sherif Sultan
- CORRIB-CURAM-Vascular Group, University of Galway, H91 TK33 Galway, Ireland
- Western Vascular Institute, Department of Vascular and Endovascular Surgery, University Hospital Galway, University of Galway, H91 TK33 Galway, Ireland
- Department of Vascular Surgery and Endovascular Surgery, Galway Clinic, Royal College of Surgeons in Ireland, Galway Affiliated Hospital, H91 HHT0 Galway, Ireland
| | - Osama Soliman
- Discipline of Cardiology, School of Medicine, University of Galway, H91 TK33 Galway, Ireland; (D.N.); (C.A.)
- CORRIB-CURAM-Vascular Group, University of Galway, H91 TK33 Galway, Ireland;
- Euro Heart Foundation, 3071 Rotterdam, The Netherlands
12
Bašić-Čičak D, Hasić Telalović J, Pašić L. Utilizing Artificial Intelligence for Microbiome Decision-Making: Autism Spectrum Disorder in Children from Bosnia and Herzegovina. Diagnostics (Basel) 2024; 14:2536. [PMID: 39594202 PMCID: PMC11592508 DOI: 10.3390/diagnostics14222536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 10/17/2024] [Accepted: 10/21/2024] [Indexed: 11/28/2024] Open
Abstract
BACKGROUND/OBJECTIVES The study of microbiome composition shows positive indications for application in the diagnosis and treatment of many conditions and diseases. One such condition is autism spectrum disorder (ASD). We aimed to analyze gut microbiome samples from children in Bosnia and Herzegovina to identify microbial differences between neurotypical children and those with ASD. Additionally, we developed machine learning classifiers to differentiate between the two groups using microbial abundance and predicted functional pathways. METHODS A total of 60 gut microbiome samples (16S rRNA sequences) were analyzed, with 44 from children with ASD and 16 from neurotypical children. Four machine learning algorithms (Random Forest, Support Vector Classification, Gradient Boosting, and Extremely Randomized Tree Classifier) were applied to create eight classification models based on bacterial abundance at the genus level and KEGG pathways. Model accuracy was evaluated, and an external dataset was introduced to test model generalizability. RESULTS The highest classification accuracy (80%) was achieved with Random Forest and Extremely Randomized Tree Classifier using genus-level taxa. The Random Forest model also performed well (78%) with KEGG pathways. When tested on an independent dataset, the model maintained high accuracy (79%), confirming its generalizability. CONCLUSIONS This study identified significant microbial differences between neurotypical children and children with ASD. Machine learning classifiers, particularly Random Forest and Extremely Randomized Tree Classifier, achieved strong accuracy. Validation with external data demonstrated that the models could generalize across different datasets, highlighting their potential use.
Affiliation(s)
- Džana Bašić-Čičak
- Computer Science Department, University Sarajevo School of Science and Technology, Hrasnička cesta 3a, 71000 Sarajevo, Bosnia and Herzegovina;
- Jasminka Hasić Telalović
- Computer Science Department, University Sarajevo School of Science and Technology, Hrasnička cesta 3a, 71000 Sarajevo, Bosnia and Herzegovina;
- Lejla Pašić
- Sarajevo Medical School, University Sarajevo School of Science and Technology, Hrasnička cesta 3a, 71000 Sarajevo, Bosnia and Herzegovina;
13
Bakir-Gungor B, Temiz M, Inal Y, Cicekyurt E, Yousef M. CCPred: Global and population-specific colorectal cancer prediction and metagenomic biomarker identification at different molecular levels using machine learning techniques. Comput Biol Med 2024; 182:109098. [PMID: 39293338 DOI: 10.1016/j.compbiomed.2024.109098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 08/29/2024] [Accepted: 08/31/2024] [Indexed: 09/20/2024]
Abstract
Colorectal cancer (CRC) ranks as the third most common cancer globally and the second leading cause of cancer-related deaths. Recent research highlights the pivotal role of the gut microbiota in CRC development and progression. Understanding the complex interplay between disease development and metagenomic data is essential for CRC diagnosis and treatment. Current computational models employ machine learning to identify metagenomic biomarkers associated with CRC, yet there is a need to improve their accuracy through a holistic biological knowledge perspective. This study aims to evaluate CRC-associated metagenomic data at the species, enzyme, and pathway levels by conducting global and population-specific analyses. These analyses use relative abundance values from human gut microbiome sequencing data to build robust classification models for disease prediction and biomarker identification. For global CRC prediction and biomarker identification, the features identified by the SelectKBest (SKB), Information Gain (IG), and Extreme Gradient Boosting (XGBoost) methods are combined. The population-based analysis includes within-population, leave-one-dataset-out (LODO), and cross-population approaches. Four classification algorithms are employed for CRC classification. Globally, Random Forest achieved an AUC of 0.83 for species data, 0.78 for enzyme data, and 0.76 for pathway data. On the global scale, potential taxonomic biomarkers include Ruthenibacterium lactatiformans; enzyme biomarkers include RNA 2',3'-cyclic 3'-phosphodiesterase; and pathway biomarkers include the pyruvate fermentation to acetone pathway. This study underscores the potential of machine learning models trained on metagenomic data for improved disease prediction and biomarker discovery. The proposed model and associated files are available at https://github.com/TemizMus/CCPRED.
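The leave-one-dataset-out (LODO) scheme named in this abstract can be illustrated with a minimal sketch. It is not taken from the CCPred repository: the function name `leave_one_dataset_out` and the cohort labels below are hypothetical, and the sketch only shows how samples might be split so that each dataset is held out once while a model is trained on all the others.

```python
from collections import defaultdict


def leave_one_dataset_out(dataset_labels):
    """Yield (held_out, train_idx, test_idx) once per distinct dataset label.

    dataset_labels holds one label per sample, naming the cohort or study
    the sample came from (hypothetical input format).
    """
    by_dataset = defaultdict(list)
    for idx, label in enumerate(dataset_labels):
        by_dataset[label].append(idx)

    for held_out in sorted(by_dataset):
        # Test on the held-out dataset, train on every other dataset.
        test_idx = by_dataset[held_out]
        train_idx = sorted(
            i
            for label, indices in by_dataset.items()
            if label != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Illustrative cohort labels, one per sample (assumed, not from the study).
cohorts = ["A", "A", "B", "C", "B", "C", "C"]
splits = list(leave_one_dataset_out(cohorts))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Because no sample from the held-out cohort is seen during training, the score on each split is an estimate of how a model transfers to an unseen population, which is the question a cross-cohort analysis asks.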
Affiliation(s)
- Burcu Bakir-Gungor
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
- Mustafa Temiz
- Department of Electrical and Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey.
- Yasin Inal
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
- Emre Cicekyurt
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, 13206, Israel; Galilee Digital Health Research Center (GDH), Zefat Academic College, Israel
14
Akhmedov M, Espinoza JL. Addressing the surge of infections by multidrug-resistant Enterobacterales in hematopoietic cell transplantation. Blood Rev 2024; 68:101229. [PMID: 39217051 DOI: 10.1016/j.blre.2024.101229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 08/17/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
Patients undergoing hematopoietic cell transplantation (HCT) have an increased risk of developing severe infections. In recent years, bloodstream infections caused by Gram-negative bacteria have been increasingly reported among HCT recipients, and many of these infections are caused by bacterial strains of the Enterobacterales order. Among these pathogens, particularly concerning are the multidrug-resistant Enterobacterales (MDRE), such as Extended Spectrum β-lactamase-producing Enterobacterales and Carbapenem-resistant Enterobacterales, since infections caused by these pathogens are difficult to treat due to the limited antimicrobial options and are associated with worse transplant outcomes. We summarize the evidence from studies indexed in PubMed and Scopus on the burden of MDRE infections in HCT recipients and discuss strategies for the management and prevention of these infections, including strict adherence to recommended infection control practices, multidisciplinary antimicrobial stewardship, the use of probiotics, and fecal microbiota transplantation.
Affiliation(s)
- Mobil Akhmedov
- Department of High-dose Chemotherapy and Bone Marrow Transplantation, P. Hertsen Moscow Oncology Research Institute, Russia; Department of Oncology and Oncosurgery, Russian University of Medicine, Russia
15
Porreca A, Ibrahimi E, Maturo F, Marcos Zambrano LJ, Meto M, Lopes MB. Robust prediction of colorectal cancer via gut microbiome 16S rRNA sequencing data. J Med Microbiol 2024; 73. [PMID: 39377779 DOI: 10.1099/jmm.0.001903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2024] Open
Abstract
Introduction. The study addresses the challenge of utilizing human gut microbiome data for the early detection of colorectal cancer (CRC). The research emphasizes the potential of using machine learning techniques to analyze complex microbiome datasets, providing a non-invasive approach to identifying CRC-related microbial markers. Hypothesis/Gap Statement. The primary hypothesis is that a robust machine learning-based analysis of 16S rRNA microbiome data can identify specific microbial features that serve as effective biomarkers for CRC detection, overcoming the limitations of classical statistical models in high-dimensional settings. Aim. The primary objective of this study is to explore and validate the potential of the human microbiome, specifically in the colon, as a valuable source of biomarkers for colorectal cancer (CRC) detection and progression. The focus is on developing a classifier that effectively predicts the presence of CRC and normal samples based on the analysis of three previously published faecal 16S rRNA sequencing datasets. Methodology. To achieve the aim, various machine learning techniques are employed, including random forest (RF), recursive feature elimination (RFE) and a robust correlation-based technique known as the fuzzy forest (FF). The study utilizes these methods to analyse the three datasets, comparing their performance in predicting CRC and normal samples. The emphasis is on identifying the most relevant microbial features (taxa) associated with CRC development via partial dependence plots, i.e. a machine learning tool focused on explainability, visualizing how a feature influences the predicted outcome. Results. The analysis of the three faecal 16S rRNA sequencing datasets reveals the consistent and superior predictive performance of the FF compared to the RF and RFE. Notably, FF proves effective in addressing the correlation problem when assessing the importance of microbial taxa in explaining the development of CRC. The results highlight the potential of the human microbiome as a non-invasive means to detect CRC and underscore the significance of employing FF for improved predictive accuracy. Conclusion. In conclusion, this study underscores the limitations of classical statistical techniques in handling high-dimensional information such as human microbiome data. The research demonstrates the potential of the human microbiome, specifically in the colon, as a valuable source of biomarkers for CRC detection. Applying machine learning techniques, particularly the FF, is a promising approach for building a classifier to predict CRC and normal samples. The findings advocate for integrating FF to overcome the challenges associated with correlation when identifying crucial microbial features linked to CRC development.
Affiliation(s)
- Annamaria Porreca
- Department of Economics, Statistics and Business, Faculty of Economics and Law, Universitas Mercatorum, Rome, Italy
- Eliana Ibrahimi
- Department of Biology, University of Tirana, Tirana, Albania
- Fabrizio Maturo
- Department of Economics, Statistics and Business, Faculty of Technological and Innovation Sciences, Universitas Mercatorum, Rome, Italy
- Laura Judith Marcos Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
- Melisa Meto
- Department of Biology, University of Tirana, Tirana, Albania
- Marta B Lopes
- Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
- UNIDEMI, Research and Development Unit for Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
16
Obeagu EI, Okoroiwu GI, Ubosi NI, Obeagu GU, Onohuean H, Muhammad T, Adias TC. Revolution in malaria detection: unveiling current breakthroughs and tomorrow's possibilities in biomarker innovation. Ann Med Surg (Lond) 2024; 86:5859-5876. [PMID: 39359838 PMCID: PMC11444567 DOI: 10.1097/ms9.0000000000002383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 07/06/2024] [Indexed: 10/04/2024] Open
Abstract
The ongoing battle against malaria has seen significant advancements in diagnostic methodologies, particularly through the discovery and application of novel biomarkers. Traditional diagnostic techniques, such as microscopy and rapid diagnostic tests, have their limitations in terms of sensitivity, specificity, and the ability to detect low-level infections. Recent breakthroughs in biomarker research promise to overcome these challenges, providing more accurate, rapid, and non-invasive detection methods. These advancements are critical in enhancing early detection, guiding effective treatment, and ultimately reducing the global malaria burden. Innovative approaches in biomarker detection are leveraging cutting-edge technologies like next-generation sequencing, proteomics, and metabolomics. These techniques have led to the identification of new biomarkers that can be detected in blood, saliva, or urine, offering less invasive and more scalable options for widespread screening. For instance, the discovery of specific volatile organic compounds in the breath of infected individuals presents a revolutionary non-invasive diagnostic tool. Additionally, the integration of machine learning algorithms with biomarker data is enhancing the precision and predictive power of malaria diagnostics, making it possible to distinguish between different stages of infection and identify drug-resistant strains. Looking ahead, the future of malaria detection lies in the continued exploration of multi-biomarker panels and the development of portable, point-of-care diagnostic devices. The incorporation of smartphone-based technologies and wearable biosensors promises to bring real-time monitoring and remote diagnostics to even the most resource-limited settings.
Affiliation(s)
- G. I.A. Okoroiwu
- Department of Public Health Science, Faculty of Health Sciences, National Open University of Nigeria, Jabi, Abuja
- N. I. Ubosi
- Department of Public Health Science, Faculty of Health Sciences, National Open University of Nigeria, Jabi, Abuja
- Hope Onohuean
- Biopharmaceutics Unit, Department of Pharmacology and Toxicology, School of Pharmacy, Kampala International University, Kampala
- Biomolecules, Metagenomics, Endocrine and Tropical Disease Research Group (BMETDREG), Kampala International University, Western Campus, Ishaka-Bushenyi, Uganda
- Tukur Muhammad
- Department of Science Education & Educational Foundations, Faculty of Education, Kampala International University, Western Campus
- Teddy C. Adias
- Department of Haematology and Blood Transfusion Science, Faculty of Medical Laboratory Science, Federal University Otuoke, Bayelsa State, Nigeria
17
Kostic T, Schloter M, Arruda P, Berg G, Charles TC, Cotter PD, Kiran GS, Lange L, Maguin E, Meisner A, van Overbeek L, Sanz Y, Sarand I, Selvin J, Tsakalidou E, Smidt H, Wagner M, Sessitsch A. Concepts and criteria defining emerging microbiome applications. Microb Biotechnol 2024; 17:e14550. [PMID: 39236296 PMCID: PMC11376781 DOI: 10.1111/1751-7915.14550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 07/29/2024] [Indexed: 09/07/2024] Open
Abstract
In recent years, microbiomes and their potential applications for human, animal or plant health, food production and environmental management have come into the spotlight of major national and international policies and strategies. This has been accompanied by substantial R&D investments in both public and private sectors, with an increasing number of products entering the market. Despite widespread agreement on the potential of microbiomes and their uses across disciplines, stakeholders and countries, there is no consensus on what defines a microbiome application. This often results in non-comprehensive communication or insufficient documentation, making commercialisation and acceptance of the novel products challenging. To showcase the complexity of this issue, we discuss two selected, well-established applications and propose criteria defining a microbiome application and their conditions of use for clear communication, facilitating suitable regulatory frameworks and building trust among stakeholders.
Affiliation(s)
- Tanja Kostic
- AIT Austrian Institute of Technology GmbH, Vienna, Austria
- Paul D. Cotter
- Teagasc Food Research Centre, Moorepark, APC Microbiome Ireland and VistaMilk, Cork, Ireland
- Lene Lange
- LL‐BioEconomy, Research and Advisory, Copenhagen, Denmark
- Emmanuelle Maguin
- Université Paris‐Saclay, INRAE, AgroParisTech, MICALIS UMR1319, Jouy‐en‐Josas, France
- Annelein Meisner
- Wageningen University & Research, Wageningen Research, Wageningen, The Netherlands
- Leo van Overbeek
- Wageningen University & Research, Wageningen Research, Wageningen, The Netherlands
- Yolanda Sanz
- Institute of Agrochemistry and Food Technology – Spanish National Research Council (IATA‐CSIC), Paterna, Valencia, Spain
- Inga Sarand
- Tallinn University of Technology, Tallinn, Estonia
- Hauke Smidt
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, The Netherlands
- Martin Wagner
- FFoQSI GmbH – Austrian Competence Centre for Feed and Food Quality, Safety and Innovation, Tulln, Austria
18
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
- Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
- P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
- Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
19
Khan MW, Fung DLX, Schroth RJ, Chelikani P, Hu P. A cross-cohort analysis of dental plaque microbiome in early childhood caries. iScience 2024; 27:110447. [PMID: 39104404 PMCID: PMC11298647 DOI: 10.1016/j.isci.2024.110447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 05/05/2024] [Accepted: 07/01/2024] [Indexed: 08/07/2024] Open
Abstract
Early childhood caries (ECC) is a multifactorial disease with a microbiome playing a significant role in caries progression. Understanding changes at the microbiome level in ECC is required to develop diagnostic and preventive strategies. In our study, we combined data from small independent cohorts to compare microbiome composition using a unified pipeline and applied a batch correction to avoid the pitfalls of batch effects. Our meta-analysis identified common biomarker species between different studies. We identified the best machine learning method for the classification of ECC versus caries-free samples and compared the performance of this method using a leave-one-dataset-out approach. Our random forest model was found to be generalizable when used in combination with other studies. While our results highlight the potential microbial species involved in ECC and disease classification, we also mentioned the limitations that can serve as a guide for future researchers to design and use appropriate tools for such analyses.
Affiliation(s)
- Mohd Wasif Khan
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Children’s Hospital Research Institute of Manitoba, Winnipeg, MB, Canada
- Robert J. Schroth
- Children’s Hospital Research Institute of Manitoba, Winnipeg, MB, Canada
- Department of Preventive Dental Science, University of Manitoba, Winnipeg, MB, Canada
- Department of Pediatrics and Child Health, University of Manitoba, Winnipeg, MB, Canada
- Prashen Chelikani
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Children’s Hospital Research Institute of Manitoba, Winnipeg, MB, Canada
- Manitoba Chemosensory Biology Research Group, Department of Oral Biology, University of Manitoba, Winnipeg, MB, Canada
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Children’s Hospital Research Institute of Manitoba, Winnipeg, MB, Canada
- Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
- Department of Biochemistry, Western University, London, ON, Canada
20
Chibwe K, Sundararaju S, Zhang L, Tsui C, Tang P, Ling F. Intra-hospital microbiome variability is driven by accessibility and clinical activities. Microbiol Spectr 2024; 12:e0029624. [PMID: 38940596 PMCID: PMC11302010 DOI: 10.1128/spectrum.00296-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 05/30/2024] [Indexed: 06/29/2024] Open
Abstract
The hospital environmental microbiome, which can affect patients' and healthcare workers' health, is highly variable and the drivers of this variability are not well understood. In this study, we collected 37 surface samples from the neonatal intensive care unit (NICU) in an inpatient hospital before and after the operation began. Additionally, healthcare workers collected 160 surface samples from five additional areas of the hospital. All samples were analyzed using 16S rRNA gene amplicon sequencing, and the samples collected by healthcare workers were cultured. The NICU samples exhibited similar alpha and beta diversities before and after opening, which indicated that the microbiome there was stable over time. Conversely, the diversities of samples taken after opening varied widely by area. Principal coordinate analysis (PCoA) showed the samples clustered into two distinct groups: high alpha diversity [the pediatric intensive care unit (PICU), pathology lab, and microbiology lab] and low alpha diversity [the NICU, pediatric surgery ward, and infection prevention and control (IPAC) office]. Least absolute shrinkage and selection operator (LASSO) classification models identified 156 informative amplicon sequence variants (ASVs) for predicting the sample's area of origin. The testing accuracy ranged from 86.37% to 100%, which outperformed linear and radial support vector machine (SVM) and random forest models. ASVs of genera that contain emerging pathogens were identified in these models. Culture experiments had identified viable species among the samples, including potential antibiotic-resistant bacteria. Though area type differences were not noted in the culture data, the prevalences and relative abundances of genera detected positively correlated with 16S sequencing data. 
This study brings to light the microbial community temporal and spatial variation within the hospital and the importance of pathogenic and commensal bacteria to understanding dispersal patterns for infection control. IMPORTANCE We collected surface samples from a newly built inpatient hospital in multiple areas, including areas accessed by only healthcare workers. Our analysis of the neonatal intensive care unit (NICU) showed that the microbiome was stable before and after the operation began, possibly due to access restrictions. Of the high-touch samples taken after opening, areas with high diversity had more potential external seeds (long-term patients and clinical samples), and areas with low diversity had fewer (short-term or newborn patients). Classification models performed at high accuracy and identified biomarkers that could be used for more targeted surveillance and infection control. Though culturing data yielded viability and antibiotic-resistance information, it disproportionately detected the presence of genera relative to 16S data. This difference reinforces the utility of 16S sequencing in profiling hospital microbiomes. By examining the microbiome over time and in multiple areas, we identified potential drivers of the microbial variation within a hospital.
Affiliation(s)
- Kaseba Chibwe
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
- Lin Zhang
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
- Clement Tsui
- Department of Pathology, Sidra Medicine, Doha, Qatar
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine-Qatar, Doha, Qatar
- Faculty of Medicine, University of British Columbia, Vancouver, Canada
- Infectious Diseases Research Laboratory, National Centre for Infectious Diseases, Singapore
- Patrick Tang
- Department of Pathology, Sidra Medicine, Doha, Qatar
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine-Qatar, Doha, Qatar
- Fangqiong Ling
- Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
- Division of Biological and Biomedical Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
21
Chen J, Zhu Y, Yuan Q. Predicting potential microbe-disease associations based on dual branch graph convolutional network. J Cell Mol Med 2024; 28:e18571. [PMID: 39086148 PMCID: PMC11291560 DOI: 10.1111/jcmm.18571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 06/15/2024] [Accepted: 06/27/2024] [Indexed: 08/02/2024] Open
Abstract
Studying the association between microbes and diseases not only aids in the prevention and diagnosis of diseases, but also provides crucial theoretical support for new drug development and personalized treatment. Due to the time-consuming and costly nature of laboratory-based biological tests to confirm the relationship between microbes and diseases, there is an urgent need for innovative computational frameworks to anticipate new associations between microbes and diseases. Here, we propose a novel computational approach based on a dual branch graph convolutional network (GCN) module, abbreviated as DBGCNMDA, for identifying microbe-disease associations. First, DBGCNMDA calculates the similarity matrix of diseases and microbes by integrating functional similarity and Gaussian association spectrum kernel (GAPK) similarity. Then, semantic information from different biological networks is extracted by two GCN modules from different perspectives. Finally, the scores of microbe-disease associations are predicted based on the extracted features. The main innovation of this method lies in the use of two types of information for microbe/disease similarity assessment. Additionally, we extend the disease nodes to address the issue of insufficient features due to low data dimensionality. We optimize the connectivity between the homogeneous entities using random walk with restart (RWR), and then use the optimized similarity matrix as the initial feature matrix. In terms of network understanding, we design a dual branch GCN module, namely GlobalGCN and LocalGCN, to fine-tune node representations by introducing side information, including homologous neighbour nodes. We evaluate the accuracy of the DBGCNMDA model using five-fold cross-validation (5-fold-CV) technique. 
The results show that the area under the receiver operating characteristic curve (AUC) and area under the precision versus recall curve (AUPR) of the DBGCNMDA model in the 5-fold-CV are 0.9559 and 0.9630, respectively. The results from the case studies using published experimental data confirm a significant number of predicted associations, indicating that DBGCNMDA is an effective tool for predicting potential microbe-disease associations.
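The random walk with restart (RWR) step named in the abstract can be illustrated with a minimal sketch. This is the textbook RWR iteration, not the authors' implementation: the function name, the restart probability of 0.5, the row normalisation, and the toy similarity matrix are all assumptions made for illustration.

```python
def random_walk_with_restart(sim, restart=0.5, tol=1e-10, max_iter=1000):
    """Return one steady-state RWR distribution per start node.

    `sim` is a square similarity matrix (list of lists). Row i of the
    result is the stationary distribution of a walker that starts at
    node i and jumps back to it with probability `restart`.
    """
    n = len(sim)
    # Row-normalise similarities into transition probabilities.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total if total > 0 else 0.0 for v in row])

    result = []
    for start in range(n):
        p0 = [1.0 if j == start else 0.0 for j in range(n)]
        p = p0[:]
        for _ in range(max_iter):
            # Follow an edge with probability (1 - restart), else restart.
            nxt = [
                (1 - restart) * sum(p[k] * trans[k][j] for k in range(n))
                + restart * p0[j]
                for j in range(n)
            ]
            delta = sum(abs(a - b) for a, b in zip(nxt, p))
            p = nxt
            if delta < tol:
                break
        result.append(p)
    return result


# Hypothetical 3-node similarity matrix (illustrative values only).
toy_similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
smoothed = random_walk_with_restart(toy_similarity)
```

Each row of `smoothed` is a probability distribution over nodes, so a matrix of this kind could serve as a densified feature matrix in the spirit described in the abstract.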
Affiliation(s)
- Jing Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Yongjun Zhu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Qun Yuan
- Department of Respiratory Medicine, The Affiliated Suzhou Hospital of Nanjing University Medical School, Suzhou, China
22
Wang Z, Peng X, Hülpüsch C, Khan Mirzaei M, Reiger M, Traidl-Hoffmann C, Deng L, Schloter M. Distinct prophage gene profiles of Staphylococcus aureus strains from atopic dermatitis patients and healthy individuals. Microbiol Spectr 2024; 12:e0091524. [PMID: 39012113 DOI: 10.1128/spectrum.00915-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 06/13/2024] [Indexed: 07/17/2024] Open
Abstract
Staphylococcus aureus strains exhibit varying associations with atopic dermatitis (AD), but the genetic determinants underpinning the pathogenicity are yet to be fully characterized. To reveal the genetic differences between S. aureus strains from AD patients and healthy individuals (HE), we developed and employed a random forest classifier to identify potential marker genes responsible for their phenotypic variations. The classifier was able to effectively distinguish strains from AD and HE. We also uncovered strong links between certain marker genes and phage functionalities, with phage holin emerging as the most pivotal differentiating factor. Further examination of S. aureus gene content highlighted the genetic diversity and functional implications of prophages in driving differentiation between strains from AD and HE. The HE group exhibited greater gene content diversity, largely influenced by their prophages. While strains from both AD and HE universally housed prophages, those in the HE group were distinctively higher at the strain level. Moreover, although prophages in the HE group exhibited variously higher enrichment of differential functions, the AD group displayed a notable enrichment of virulence factors within their prophages, underscoring the important contribution of prophages to the pathogenesis of AD-associated strains. Overall, prophages significantly shape the genetic and functional profiles of S. aureus strains, shedding light on their pathogenic potential and elucidating the mechanisms behind the phenotypic variations in AD and HE environments. IMPORTANCE Through a nuanced exploration of Staphylococcus aureus strains obtained from atopic dermatitis (AD) patients and healthy controls (HE), our research unveils pivotal genetic determinants influencing their pathogenic associations. 
Utilizing a random forest classifier, we illuminate distinct marker genes, with phage holin emerging as a critical differential factor, revealing the profound impact of prophages on genetic and pathogenic profiles. HE strains exhibited a diverse gene content, notably shaped by unique, heightened prophages. Conversely, AD strains emphasized a pronounced enrichment of virulence factors within prophages, signifying their key role in AD pathogenesis. This work crucially highlights prophages as central architects of the genetic and functional attributes of S. aureus strains, providing vital insights into pathogenic mechanisms and phenotypic variations, thereby paving the way for targeted AD therapeutic approaches and management strategies by demystifying specific genetic and pathogenic mechanisms.
Affiliation(s)
- Zhongjie Wang
- Research Unit for Comparative Microbiome Analysis, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany
- Xue Peng
- Faculty of Biology, Biocenter, Ludwig Maximilian University of Munich, Munich, Germany
- Institute of Virology, Helmholtz Munich, German Research Centre for Environmental Health, Neuherberg, Germany
- Claudia Hülpüsch
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Augsburg, Germany
- Institute of Environmental Medicine, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany
- Christine Kühne Center for Allergy Research and Education, Davos, Switzerland
- Mohammadali Khan Mirzaei
- Institute of Virology, Helmholtz Munich, German Research Centre for Environmental Health, Neuherberg, Germany
- Chair of Prevention of Microbial Infectious Diseases, Central Institute of Disease Prevention and School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthias Reiger
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Augsburg, Germany
- Institute of Environmental Medicine, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany
- Claudia Traidl-Hoffmann
- Environmental Medicine, Faculty of Medicine, University of Augsburg, Augsburg, Germany
- Institute of Environmental Medicine, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany
- Christine Kühne Center for Allergy Research and Education, Davos, Switzerland
- Li Deng
- Institute of Virology, Helmholtz Munich, German Research Centre for Environmental Health, Neuherberg, Germany
- Chair of Prevention of Microbial Infectious Diseases, Central Institute of Disease Prevention and School of Life Sciences, Technical University of Munich, Freising, Germany
- Michael Schloter
- Research Unit for Comparative Microbiome Analysis, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany
- Chair of Environmental Microbiology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
23
Yehuala TZ, Agimas MC, Derseh NM, Wubante SM, Fente BM, Yismaw GA, Tesfie TK. Machine learning algorithms to predict healthcare-seeking behaviors of mothers for acute respiratory infections and their determinants among children under five in sub-Saharan Africa. Front Public Health 2024; 12:1362392. [PMID: 38962762 PMCID: PMC11220189 DOI: 10.3389/fpubh.2024.1362392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 06/03/2024] [Indexed: 07/05/2024] Open
Abstract
Background Acute respiratory infections (ARIs) are the leading cause of death in children under the age of 5 globally. Maternal healthcare-seeking behavior may help minimize mortality associated with ARIs since mothers make decisions about the kind and frequency of healthcare services for their children. Therefore, this study aimed to predict the absence of maternal healthcare-seeking behavior and identify its associated factors among children under the age of 5 in sub-Saharan Africa (SSA) using machine learning models. Methods The sub-Saharan African countries' demographic health survey was the source of the dataset. We used a weighted sample of 16,832 under-five children in this study. The data were processed using Python (version 3.9), and machine learning models such as extreme gradient boosting (XGB), random forest, decision tree, logistic regression, and Naïve Bayes were applied. In this study, we used evaluation metrics, including the AUC ROC curve, accuracy, precision, recall, and F-measure, to assess the performance of the predictive models. Result In this study, a weighted sample of 16,832 under-five children was used in the final analysis. Among the proposed machine learning models, the random forest (RF) was the best-performing model with an accuracy of 88.89%, a precision of 89.5%, an F-measure of 83%, an AUC ROC curve of 95.8%, and a recall of 77.6% in predicting the absence of mothers' healthcare-seeking behavior for ARIs. The accuracy for Naïve Bayes was the lowest (66.41%) when compared to other proposed models. No media exposure, living in rural areas, not breastfeeding, poor wealth status, home delivery, no ANC visit, no maternal education, mothers' age group of 35-49 years, and distance to health facilities were significant predictors for the absence of mothers' healthcare-seeking behaviors for ARIs.
On the other hand, undernourished children with stunting, underweight, and wasting status, diarrhea, birth size, married women, being a male or female sex child, and having a maternal occupation were significantly associated with good maternal healthcare-seeking behaviors for ARIs among under-five children. Conclusion The RF model provides greater predictive power for estimating mothers' healthcare-seeking behaviors based on ARI risk factors. Machine learning could help achieve early prediction and intervention in children with high-risk ARIs. This leads to a recommendation for policy direction to reduce child mortality due to ARIs in sub-Saharan countries.
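The random forest metrics reported in this abstract can be cross-checked with a short calculation. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for the random forest model.
rf_f1 = f_measure(0.895, 0.776)
print(f"F1 = {rf_f1:.3f}")  # prints "F1 = 0.831"
```

Under that assumption the result rounds to 83%, which agrees with the F-measure quoted in the abstract.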
Affiliation(s)
- Tirualem Zeleke Yehuala
- Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Muluken Chanie Agimas
- Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Nebiyu Mekonnen Derseh
- Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Sisay Maru Wubante
- Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Bezawit Melak Fente
- Department of General Midwifery, School of Midwifery, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Getaneh Awoke Yismaw
- Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
- Tigabu Kidie Tesfie
- Department of Epidemiology and Biostatistics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
24
Tamayo M, Olivares M, Ruas-Madiedo P, Margolles A, Espín JC, Medina I, Moreno-Arribas MV, Canals S, Mirasso CR, Ortín S, Beltrán-Sanchez H, Palloni A, Tomás-Barberán FA, Sanz Y. How Diet and Lifestyle Can Fine-Tune Gut Microbiomes for Healthy Aging. Annu Rev Food Sci Technol 2024; 15:283-305. [PMID: 38941492 DOI: 10.1146/annurev-food-072023-034458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2024]
Abstract
Many physical, social, and psychological changes occur during aging that raise the risk of developing chronic diseases, frailty, and dependency. These changes adversely affect the gut microbiota, a phenomenon known as microbe-aging. Those microbiota alterations are, in turn, associated with the development of age-related diseases. The gut microbiota is highly responsive to lifestyle and dietary changes, displaying a flexibility that also provides an actionable tool by which healthy aging can be promoted. This review covers, firstly, the main lifestyle and socioeconomic factors that modify the gut microbiota composition and function during healthy or unhealthy aging and, secondly, the advances being made in defining and promoting healthy aging, including microbiome-informed artificial intelligence tools, personalized dietary patterns, and food probiotic systems.
Affiliation(s)
- M Tamayo
- Institute of Agrochemistry and Food Technology, Spanish National Research Council (IATA-CSIC), Valencia, Spain
- Faculty of Medicine, Autonomous University of Madrid (UAM), Spain
- M Olivares
- Institute of Agrochemistry and Food Technology, Spanish National Research Council (IATA-CSIC), Valencia, Spain
- A Margolles
- Health Research Institute (ISPA), Asturias, Spain
- J C Espín
- Laboratory of Food & Health, Group of Quality, Safety, and Bioactivity of Plant Foods, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), Murcia, Spain
- I Medina
- Instituto de Investigaciones Marinas, Spanish National Research Council (IIM-CSIC), Vigo, Spain
- S Canals
- Instituto de Neurociencias, Universidad Miguel Hernández-CSIC, Sant Joan d'Alacant, Spain
- C R Mirasso
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (UIB-CSIC), Campus Universitat de les Illes Balears, Palma de Mallorca, Spain
- S Ortín
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (UIB-CSIC), Campus Universitat de les Illes Balears, Palma de Mallorca, Spain
- H Beltrán-Sanchez
- Department of Community Health Sciences, Fielding School of Public Health and California Center for Population Research, University of California, Los Angeles, California, USA
- A Palloni
- Department of Sociology, University of Wisconsin, Madison, Wisconsin, USA
- F A Tomás-Barberán
- Laboratory of Food & Health, Group of Quality, Safety, and Bioactivity of Plant Foods, Centro de Edafología y Biología Aplicada del Segura (CEBAS-CSIC), Murcia, Spain
- Y Sanz
- Institute of Agrochemistry and Food Technology, Spanish National Research Council (IATA-CSIC), Valencia, Spain
25
Ayub H, Khan MA, Shehryar Ali Naqvi S, Faseeh M, Kim J, Mehmood A, Kim YJ. Unraveling the Potential of Attentive Bi-LSTM for Accurate Obesity Prognosis: Advancing Public Health towards Sustainable Cities. Bioengineering (Basel) 2024; 11:533. [PMID: 38927769 PMCID: PMC11200407 DOI: 10.3390/bioengineering11060533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024] [Indexed: 06/28/2024] Open
Abstract
The global prevalence of obesity presents a pressing challenge to public health and healthcare systems, necessitating accurate prediction and understanding for effective prevention and management strategies. This article addresses the need for improved obesity prediction models by conducting a comprehensive analysis of existing machine learning (ML) and deep learning (DL) approaches. This study introduces a novel hybrid model, Attention-based Bi-LSTM (ABi-LSTM), which integrates attention mechanisms with bidirectional Long Short-Term Memory (Bi-LSTM) networks to enhance interpretability and performance in obesity prediction. Our study fills a crucial gap by bridging healthcare and urban planning domains, offering insights into data-driven approaches to promote healthier living within urban environments. The proposed ABi-LSTM model demonstrates exceptional performance, achieving a remarkable accuracy of 96.5% in predicting obesity levels. Comparative analysis showcases its superiority over conventional approaches, with superior precision, recall, and overall classification balance. This study highlights significant advancements in predictive accuracy and positions the ABi-LSTM model as a pioneering solution for accurate obesity prognosis. The implications extend beyond healthcare, offering a precise tool to address the global obesity epidemic and foster sustainable development in smart cities.
Affiliation(s)
- Hina Ayub
- Interdisciplinary Graduate Program in Advance Convergence Technology and Science, Jeju National University, Jeju 63243, Republic of Korea
- Murad-Ali Khan
- Department of Computer Engineering, Jeju National University, Jeju 63243, Republic of Korea
- Syed Shehryar Ali Naqvi
- Department of Electronics Engineering, Jeju National University, Jeju 63243, Republic of Korea
- Muhammad Faseeh
- Department of Electronics Engineering, Jeju National University, Jeju 63243, Republic of Korea
- Jungsuk Kim
- Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
- Asif Mehmood
- Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
- Young-Jin Kim
- Medical Device Development Center, Osong Medical Innovation Foundation, Cheongju 28160, Republic of Korea
26
Liu Q, Zhai Y, Hui Y, Chen J, Mi Y, Wang J, Wei H. Identification of red blood cell distribution width as a prognostic factor in acute myeloid leukemia. Exp Hematol 2024; 133:104206. [PMID: 38508299 DOI: 10.1016/j.exphem.2024.104206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 02/27/2024] [Accepted: 02/29/2024] [Indexed: 03/22/2024]
Abstract
Many prognostic factors have been identified in acute myeloid leukemia (AML). In this study, we investigated novel prognostic biomarkers using machine learning and Cox regression models in a prospective cohort of 591 patients with AML and tried to identify potential therapeutic targets based on transcriptomic data. We found that elevated red blood cell distribution width (RDW) at diagnosis was an adverse prognostic factor for AML, independent of the 2022 European LeukemiaNet (ELN2022) genetic risk. As a continuous variable, higher RDW was associated with shorter overall survival (OS) (hazard ratio [HR] 1.087, 95% confidence interval [CI] 1.036-1.139, p < 0.001) and event-free survival (EFS) (HR 1.078, 95% CI 1.033-1.124, p < 0.001). Elevated RDW returned to normal after consolidation therapy, which indicated that leukemia cells resulted in abnormal RDW. We further investigated the relationship between RDW and transcriptome in another cohort of 191 patients with AML and public datasets using gene set enrichment analysis (GSEA) and cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT). We found that patients in the high-RDW group were significantly enriched in the positive regulation of erythroid differentiation and inflammation-related pathways. Finally, we identified the inflammation-associated gene IL12RB2 and verified its prognostic relevance with patients with AML in public databases, suggesting it as a potential therapy target.
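The hazard ratio for overall survival quoted in this abstract can be unpacked with a short calculation. The sketch assumes the reported interval is a Wald interval, symmetric on the log scale with a 1.96 multiplier, and that the HR applies per one-unit increase in RDW; the five-unit increase is a hypothetical example, not a value from the study.

```python
import math

# Values reported in the abstract for overall survival.
hr, ci_low, ci_high = 1.087, 1.036, 1.139

# Cox coefficient (log hazard ratio) per one-unit increase in RDW.
beta = math.log(hr)

# Standard error recovered from the CI width on the log scale
# (assumes a symmetric Wald interval: beta +/- 1.96 * SE).
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Wald z statistic for the null hypothesis HR = 1.
z = beta / se

# Implied hazard ratio for a hypothetical 5-unit increase in RDW,
# valid only under the model's log-linear assumption.
hr_5_units = math.exp(5 * beta)

print(f"beta={beta:.4f}, SE={se:.4f}, z={z:.2f}, HR(+5)={hr_5_units:.2f}")
```

Under these assumptions the z statistic exceeds the two-sided 0.001 threshold of about 3.29, which is consistent with the reported p < 0.001.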
MESH Headings
- Humans
- Leukemia, Myeloid, Acute/blood
- Leukemia, Myeloid, Acute/genetics
- Leukemia, Myeloid, Acute/diagnosis
- Leukemia, Myeloid, Acute/mortality
- Erythrocyte Indices
- Female
- Male
- Middle Aged
- Prognosis
- Aged
- Adult
- Biomarkers, Tumor/blood
- Biomarkers, Tumor/genetics
- Transcriptome
- Prospective Studies
Affiliation(s)
- Qiaoxue Liu
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China
- Yujia Zhai
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China
- Yan Hui
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China
- Jiayuan Chen
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China
- Yingchang Mi
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China
- Jianxiang Wang
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China.
- Hui Wei
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China; Tianjin Institutes of Health Science, Tianjin, China.
27
Pais N, Ravishanker N, Rajasekaran S, Weinstock G, Tran DB. Randomized feature selection based semi-supervised latent Dirichlet allocation for microbiome analysis. Sci Rep 2024; 14:8855. [PMID: 38632488 PMCID: PMC11024186 DOI: 10.1038/s41598-024-59682-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 04/13/2024] [Indexed: 04/19/2024] Open
Abstract
Health and disease are fundamentally influenced by microbial communities and their genes (the microbiome). An in-depth analysis of microbiome structure that enables the classification of individuals based on their health can be crucial in enhancing diagnostics and treatment strategies to improve the overall well-being of an individual. In this paper, we present a novel semi-supervised methodology known as Randomized Feature Selection based Latent Dirichlet Allocation (RFSLDA) to study the impact of the gut microbiome on a subject's health status. Since the data in our study consists of fuzzy health labels, which are self-reported, traditional supervised learning approaches may not be suitable. As a first step, based on the similarity between documents in text analysis and gut-microbiome data, we employ Latent Dirichlet Allocation (LDA), a topic modeling approach which uses microbiome counts as features to group subjects into relatively homogeneous clusters, without invoking any knowledge of observed health status (labels) of subjects. We then leverage information from the observed health status of subjects to associate these clusters with the most similar health status making it a semi-supervised approach. Finally, a feature selection technique is incorporated into the model to improve the overall classification performance. The proposed method provides a semi-supervised topic modelling approach that can help handle the high dimensionality of the microbiome data in association studies. Our experiments reveal that our semi-supervised classification algorithm is effective and efficient in terms of high classification accuracy compared to popular supervised learning approaches like SVM and multinomial logistic model. 
The RFSLDA framework is attractive because it (i) enhances clustering accuracy by identifying key bacteria types as indicators of health status, (ii) identifies key bacteria types within each group based on estimates of the proportion of bacteria types within the groups, and (iii) computes a measure of within-group similarity to identify highly similar subjects in terms of their health status.
Affiliation(s)
- Namitha Pais
- Department of Statistics, University of Connecticut, Storrs, CT, USA
- Dong-Binh Tran
- Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
28
Manrique P, Montero I, Fernandez-Gosende M, Martinez N, Cantabrana CH, Rios-Covian D. Past, present, and future of microbiome-based therapies. MICROBIOME RESEARCH REPORTS 2024; 3:23. [PMID: 38841413 PMCID: PMC11149097 DOI: 10.20517/mrr.2023.80] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/07/2024] [Accepted: 03/12/2024] [Indexed: 06/07/2024]
Abstract
Technological advances in studying the human microbiome in depth have enabled the identification of microbial signatures associated with health and disease. This confirms the crucial role of microbiota in maintaining homeostasis and the host health status. Nowadays, there are several ways to modulate the microbiota composition to effectively improve host health; therefore, the development of therapeutic treatments based on the gut microbiota is experiencing rapid growth. In this review, we summarize the influence of the gut microbiota on the development of infectious disease and cancer, which are two of the main targets of microbiome-based therapies currently being developed. We analyze the two-way interaction between the gut microbiota and traditional drugs in order to emphasize the influence of gut microbial composition on drug effectivity and treatment response. We explore the different strategies currently available for modulating this ecosystem to our benefit, ranging from 1st generation intervention strategies to more complex 2nd generation microbiome-based therapies and their regulatory framework. Lastly, we finish with a quick overview of what we believe is the future of these strategies, that is 3rd generation microbiome-based therapies developed with the use of artificial intelligence (AI) algorithms.
29
Gradisteanu Pircalabioru G, Raileanu M, Dionisie MV, Lixandru-Petre IO, Iliescu C. Fast detection of bacterial gut pathogens on miniaturized devices: an overview. Expert Rev Mol Diagn 2024; 24:201-218. [PMID: 38347807 DOI: 10.1080/14737159.2024.2316756] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/17/2023] [Accepted: 02/06/2024] [Indexed: 03/23/2024]
Abstract
INTRODUCTION Gut microbes pose challenges like colon inflammation, deadly diarrhea, antimicrobial resistance dissemination, and chronic disease onset. Development of early, rapid and specific diagnosis tools is essential for improving infection control. Point-of-care testing (POCT) systems offer rapid, sensitive, low-cost and sample-to-answer methods for microbe detection from various clinical and environmental samples, bringing the advantages of portability, automation, and simple operation. AREAS COVERED Rapid detection of gut microbes can be done using a wide array of techniques including biosensors, immunological assays, electrochemical impedance spectroscopy, mass spectrometry and molecular biology. Inclusion of Internet of Things, machine learning, and smartphone-based point-of-care applications is an important aspect of POCT. In this review, the authors discuss various fast diagnostic platforms for gut pathogens and their main challenges. EXPERT OPINION Developing effective assays for microbe detection can be complex. Assay design must consider factors like target selection, real-time and multiplex detection, sample type, reagent stability and storage, primer/probe design, and optimizing reaction conditions for accuracy and sensitivity. Mitigating these challenges requires interdisciplinary collaboration among scientists, clinicians, engineers, and industry partners. Future efforts are essential to enhance sensitivity, specificity, and versatility of POCT systems for gut microbe detection and quantification, advancing infectious disease diagnostics and management.
Affiliation(s)
- Gratiela Gradisteanu Pircalabioru
- eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Division of Earth, Environmental and Life Sciences, The Research Institute of University of Bucharest (ICUB), Bucharest, Romania
- Academy of Romanian Scientists, Bucharest, Romania
- Mina Raileanu
- eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Department of Life and Environmental Physics, Horia Hulubei National Institute of Physics and Nuclear Engineering, Magurele, Romania
- Mihai Viorel Dionisie
- eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Irina-Oana Lixandru-Petre
- eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Ciprian Iliescu
- eBio-hub Research Centre, National University of Science and Technology "Politehnica" Bucharest, Bucharest, Romania
- Academy of Romanian Scientists, Bucharest, Romania
- Microsystems in Biomedical and Environmental Applications, National Research and Development Institute for Microtechnology, Bucharest, Romania
30
Baddal B, Taner F, Uzun Ozsahin D. Harnessing of Artificial Intelligence for the Diagnosis and Prevention of Hospital-Acquired Infections: A Systematic Review. Diagnostics (Basel) 2024; 14:484. [PMID: 38472956 DOI: 10.3390/diagnostics14050484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 01/23/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open
Abstract
Healthcare-associated infections (HAIs) are the most common adverse events in healthcare and constitute a major global public health concern. Surveillance represents the foundation for the effective prevention and control of HAIs, yet conventional surveillance is costly and labor intensive. Artificial intelligence (AI) and machine learning (ML) have the potential to support the development of HAI surveillance algorithms for the understanding of HAI risk factors, the improvement of patient risk stratification as well as the prediction and timely detection and prevention of infections. AI-supported systems have so far been explored for clinical laboratory testing and imaging diagnosis, antimicrobial resistance profiling, antibiotic discovery and prediction-based clinical decision support tools in terms of HAIs. This review aims to provide a comprehensive summary of the current literature on AI applications in the field of HAIs and discuss the future potentials of this emerging technology in infection practice. Following the PRISMA guidelines, this study examined the articles in databases including PubMed and Scopus until November 2023, which were screened based on the inclusion and exclusion criteria, resulting in 162 included articles. By elucidating the advancements in the field, we aim to highlight the potential applications of AI in the field, report related issues and shortcomings and discuss the future directions.
Affiliation(s)
- Buket Baddal
- Department of Medical Microbiology and Clinical Microbiology, Faculty of Medicine, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
- DESAM Research Institute, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
- Ferdiye Taner
- Department of Medical Microbiology and Clinical Microbiology, Faculty of Medicine, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
- DESAM Research Institute, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
- Dilber Uzun Ozsahin
- Department of Medical Diagnostic Imaging, College of Health Science, University of Sharjah, Sharjah 27272, United Arab Emirates
- Research Institute for Medical and Health Sciences, University of Sharjah, Sharjah 27272, United Arab Emirates
- Operational Research Centre in Healthcare, Near East University, North Cyprus, Mersin 10, 99138 Nicosia, Turkey
31
Iqbal S, Begum F, Ullah I, Jalal N, Shaw P. Peeling off the layers from microbial dark matter (MDM): recent advances, future challenges, and opportunities. Crit Rev Microbiol 2024:1-21. [PMID: 38385313 DOI: 10.1080/1040841x.2024.2319669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 02/10/2024] [Indexed: 02/23/2024]
Abstract
Microbes represent the most common organisms on Earth; however, less than 2% of microbial species in the environment can undergo cultivation for study under laboratory conditions, and the rest of the enigmatic, microbial world remains mysterious, constituting a kind of "microbial dark matter" (MDM). In the last two decades, remarkable progress has been made in culture-dependent and culture-independent techniques. More recently, studies of MDM have relied on culture-independent techniques to recover genetic material through either unicellular genomics or shotgun metagenomics to construct single-amplified genomes (SAGs) and metagenome-assembled genomes (MAGs), respectively, which provide information about evolution and metabolism. Despite the remarkable progress made in the past decades, the functional diversity of MDM still remains uncharacterized. This review comprehensively summarizes the recently developed culture-dependent and culture-independent techniques for characterizing MDM, discussing major challenges, opportunities, and potential applications. These activities contribute to expanding our knowledge of the microbial world and have implications for various fields including Biotechnology, Bioprospecting, Functional genomics, Medicine, Evolutionary and Planetary biology. Overall, this review aims to peel off the layers from MDM, shed light on recent advancements, identify future challenges, and illuminate the exciting opportunities that lie ahead in unraveling the secrets of this intriguing microbial realm.
Affiliation(s)
- Sajid Iqbal
- Oujiang Lab (Zhejiang Laboratory for Regenerative Medicine, Vision, and Brain Health), Wenzhou, China
- School of Pharmaceutical Science, Wenzhou Medical University, Wenzhou, China
- Farida Begum
- Department of Biochemistry, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Ihsan Ullah
- College of Chemical Engineering, Fuzhou University, Fuzhou, China
- Nasir Jalal
- Oujiang Lab (Zhejiang Laboratory for Regenerative Medicine, Vision, and Brain Health), Wenzhou, China
- Peter Shaw
- Oujiang Lab (Zhejiang Laboratory for Regenerative Medicine, Vision, and Brain Health), Wenzhou, China
32
Walsh C, Stallard-Olivera E, Fierer N. Nine (not so simple) steps: a practical guide to using machine learning in microbial ecology. mBio 2024; 15:e0205023. [PMID: 38126787 PMCID: PMC10865974 DOI: 10.1128/mbio.02050-23] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/23/2023] Open
Abstract
Due to the complex nature of microbiome data, the field of microbial ecology has many current and potential uses for machine learning (ML) modeling. With the increased use of predictive ML models across many disciplines, including microbial ecology, there is extensive published information on the specific ML algorithms available and how those algorithms have been applied. Thus, our goal is not to summarize the breadth of ML models available or compare their performances. Rather, our goal is to provide more concrete and actionable information to guide microbial ecologists in how to select, run, and interpret ML algorithms to predict the taxa or genes associated with particular sample categories or environmental gradients of interest. Such microbial data often have unique characteristics that require careful consideration of how to apply ML models and how to interpret the associated results. This review is intended for practicing microbial ecologists who may be unfamiliar with some of the intricacies of ML models. We provide examples and discuss common opportunities and pitfalls specific to applying ML models to the types of data sets most frequently collected by microbial ecologists.
Affiliation(s)
- Corinne Walsh
- Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
- Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
- Elías Stallard-Olivera
- Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
- Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
- Noah Fierer
- Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
- Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
33
Kumar B, Lorusso E, Fosso B, Pesole G. A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions. Front Microbiol 2024; 15:1343572. [PMID: 38419630 PMCID: PMC10900530 DOI: 10.3389/fmicb.2024.1343572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Metagenomics, Metabolomics, and Metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standard and comprehensive metadata associated with raw data, hindering the ability to perform robust data stratifications and consider confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations in metadata collection of existing public repositories collecting metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage metagenomic data's potential. Furthermore, we explore future directions of implementation of Machine Learning (ML) in metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in ML model development.
Affiliation(s)
- Bablu Kumar
- Università degli Studi di Milano, Milan, Italy
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Erika Lorusso
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
- Bruno Fosso
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Graziano Pesole
- Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
34
Li H, Yu Z, Du F, Song L, Gao Y, Shi F. sscNOVA: a semi-supervised convolutional neural network for predicting functional regulatory variants in autoimmune diseases. Front Immunol 2024; 15:1323072. [PMID: 38380333 PMCID: PMC10876991 DOI: 10.3389/fimmu.2024.1323072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 01/15/2024] [Indexed: 02/22/2024] Open
Abstract
Genome-wide association studies (GWAS) have identified thousands of variants in the human genome associated with autoimmune diseases. However, identifying functional regulatory variants associated with autoimmune diseases remains challenging, largely because of insufficient experimental validation data. We adopt the concept of semi-supervised learning by combining labeled and unlabeled data to develop a deep learning-based algorithm framework, sscNOVA, to predict functional regulatory variants in autoimmune diseases and analyze the functional characteristics of these regulatory variants. Compared to traditional supervised learning methods, our approach leverages more variants' data to explore the relationship between functional regulatory variants and autoimmune diseases. Based on the experimentally curated testing dataset and evaluation metrics, we find that sscNOVA outperforms other state-of-the-art methods. Furthermore, we illustrate that sscNOVA can help to improve the prioritization of functional regulatory variants from lead single-nucleotide polymorphisms and the proxy variants in autoimmune GWAS data.
Affiliation(s)
- Haibo Li
- School of Information Engineering, Ningxia University, Yinchuan, China
- Zhenhua Yu
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Fang Du
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Lijuan Song
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Yang Gao
- School of Medical Technology, North Minzu University, Yinchuan, China
- Fangyuan Shi
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
35
Ramon E, Obón-Santacana M, Khannous-Lleiffe O, Saus E, Gabaldón T, Guinó E, Bars-Cortina D, Ibáñez-Sanz G, Rodríguez-Alonso L, Mata A, García-Rodríguez A, Moreno V. Performance of a Shotgun Prediction Model for Colorectal Cancer When Using 16S rRNA Sequencing Data. Int J Mol Sci 2024; 25:1181. [PMID: 38256252 PMCID: PMC10816515 DOI: 10.3390/ijms25021181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/10/2024] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
Colorectal cancer (CRC), the third most common cancer globally, has shown links to disturbed gut microbiota. While significant efforts have been made to establish a microbial signature indicative of CRC using shotgun metagenomic sequencing, the challenge lies in validating this signature with 16S ribosomal RNA (16S) gene sequencing. The primary obstacle is reconciling the differing outputs of these two methodologies, which often lead to divergent statistical models and conclusions. In this study, we introduce an algorithm designed to bridge this gap by mapping shotgun-derived taxa to their 16S counterparts. This mapping enables us to assess the predictive performance of a shotgun-based microbiome signature using 16S data. Our results demonstrate a reduction in performance when applying the 16S-mapped taxa in the shotgun prediction model, though it retains statistical significance. This suggests that while an exact match between shotgun and 16S data may not yet be feasible, our approach provides a viable method for comparative analysis and validation in the context of CRC-associated microbiome research.
Affiliation(s)
- Elies Ramon
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Unit of Biomarkers and Susceptibility (UBS), Oncology Data Analytics Program (ODAP), Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Mireia Obón-Santacana
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Unit of Biomarkers and Susceptibility (UBS), Oncology Data Analytics Program (ODAP), Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), 28029 Madrid, Spain
- Olfat Khannous-Lleiffe
- Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Ester Saus
- Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Centro de Investigación Biomédica En Red de Enfermedades Infecciosas (CIBERINFEC), 08028 Barcelona, Spain
- Elisabet Guinó
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Unit of Biomarkers and Susceptibility (UBS), Oncology Data Analytics Program (ODAP), Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), 28029 Madrid, Spain
- David Bars-Cortina
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Unit of Biomarkers and Susceptibility (UBS), Oncology Data Analytics Program (ODAP), Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Gemma Ibáñez-Sanz
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Unit of Biomarkers and Susceptibility (UBS), Oncology Data Analytics Program (ODAP), Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Gastroenterology Department, Bellvitge University Hospital, L’Hospitalet de Llobregat, 08907 Barcelona, Spain
- Lorena Rodríguez-Alonso
- Gastroenterology Department, Bellvitge University Hospital, L’Hospitalet de Llobregat, 08907 Barcelona, Spain
- Alfredo Mata
- Digestive System Service, Moisés Broggi Hospital, 08970 Sant Joan Despí, Spain
- Ana García-Rodríguez
- Endoscopy Unit, Digestive System Service, Viladecans Hospital-IDIBELL, 08840 Viladecans, Spain
- Victor Moreno
- Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Unit of Biomarkers and Susceptibility (UBS), Oncology Data Analytics Program (ODAP), Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
- Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), 28029 Madrid, Spain
- Department of Clinical Sciences, Faculty of Medicine and Health Sciences, Universitat de Barcelona Institute of Complex Systems (UBICS), University of Barcelona (UB), L’Hospitalet de Llobregat, 08908 Barcelona, Spain
36
Peralta-Marzal LN, Rojas-Velazquez D, Rigters D, Prince N, Garssen J, Kraneveld AD, Perez-Pardo P, Lopez-Rincon A. A robust microbiome signature for autism spectrum disorder across different studies using machine learning. Sci Rep 2024; 14:814. [PMID: 38191575 PMCID: PMC10774349 DOI: 10.1038/s41598-023-50601-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 12/21/2023] [Indexed: 01/10/2024] Open
Abstract
Autism spectrum disorder (ASD) is a highly complex neurodevelopmental disorder characterized by deficits in sociability and repetitive behaviour; however, there is great heterogeneity within other comorbidities that accompany ASD. Recently, the gut microbiome has been pointed out as a plausible contributing factor for ASD development, as individuals diagnosed with ASD often suffer from intestinal problems and show a differentiated intestinal microbial composition. Nevertheless, gut microbiome studies in ASD rarely agree on the specific bacterial taxa involved in this disorder. Regarding the potential role of the gut microbiome in ASD pathophysiology, our aim is to investigate whether there is a set of bacterial taxa relevant for ASD classification by using a sibling-controlled dataset. Additionally, we aim to validate these results across two independent cohorts, as several confounding factors, such as lifestyle, influence both ASD and gut microbiome studies. A machine learning approach, recursive ensemble feature selection (REFS), was applied to 16S rRNA gene sequencing data from 117 subjects (60 ASD cases and 57 siblings), identifying 26 bacterial taxa that discriminate ASD cases from controls. The average area under the curve (AUC) of this specific set of bacteria in the sibling-controlled dataset was 81.6%. Moreover, we applied the selected bacterial taxa in a tenfold cross-validation scheme using two independent cohorts (a total of 223 samples: 125 ASD cases and 98 controls). We obtained average AUCs of 74.8% and 74%, respectively. Analysis of the gut microbiome using REFS identified a set of bacterial taxa that can be used to predict the ASD status of children in three distinct cohorts with AUC over 80% for the best-performing classifiers. Our results indicate that the gut microbiome has a strong association with ASD and should not be disregarded as a potential target for therapeutic interventions. Furthermore, our work supports using the proposed approach to identify microbiome signatures across other 16S rRNA gene sequencing datasets.
Affiliation(s)
- Lucia N Peralta-Marzal
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- David Rojas-Velazquez
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Douwe Rigters
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Naika Prince
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Johan Garssen
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Global Centre of Excellence Immunology, Danone Nutricia Research, Utrecht, The Netherlands
- Aletta D Kraneveld
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Department of Neuroscience, Faculty of Science, VU University, Amsterdam, The Netherlands
- Paula Perez-Pardo
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Alejandro Lopez-Rincon
- Division of Pharmacology, Faculty of Science, Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
37
Staab S, Cardénas A, Peixoto RS, Schreiber F, Voolstra CR. Coracle-a machine learning framework to identify bacteria associated with continuous variables. Bioinformatics 2024; 40:btad749. [PMID: 38123508 PMCID: PMC10766586 DOI: 10.1093/bioinformatics/btad749] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/10/2023] [Revised: 11/06/2023] [Accepted: 12/19/2023] [Indexed: 12/23/2023] Open
Abstract
SUMMARY We present Coracle, an artificial intelligence (AI) framework that can identify associations between bacterial communities and continuous variables. Coracle uses an ensemble approach of prominent feature selection methods and machine learning (ML) models to identify features, i.e. bacteria, associated with a continuous variable, e.g. host thermal tolerance. The results are aggregated into a score that incorporates the performances of the different ML models and the respective feature importance, while also considering the robustness of feature selection. Additionally, regression coefficients provide first insights into the direction of the association. We show the utility of Coracle by analyzing associations between bacterial composition data (i.e. 16S rRNA Amplicon Sequence Variants, ASVs) and coral thermal tolerance (i.e. standardized short-term heat stress-derived diagnostics). This analysis identified high-scoring bacterial taxa that were previously found associated with coral thermal tolerance. Coracle scales with feature number and performs well with hundreds to thousands of features, corresponding to the typical size of current datasets. Coracle performs best if run at a higher taxonomic level first (e.g. order or family) to identify groups of interest that can subsequently be run at the ASV level. AVAILABILITY AND IMPLEMENTATION Coracle can be accessed via a dedicated web server that allows free and simple access: http://www.micportal.org/coracle/index. The underlying code is open-source and available via GitHub https://github.com/SebastianStaab/coracle.git.
Affiliation(s)
- Sebastian Staab
- Department of Biology, University of Konstanz, Konstanz 78457, Germany
- Anny Cardénas
- Department of Biology, University of Konstanz, Konstanz 78457, Germany
- Department of Biology, American University, Washington, DC, 20016, USA
- Raquel S Peixoto
- Computational Biology Research Center (CBRC) and Red Sea Research Center (RSRC), Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Falk Schreiber
- Department of Computer and Information Science, University of Konstanz, Konstanz 78457, Germany
- Faculty of Information Technology, Monash University, 3168, Australia
38
Lee S, Lee I. Comprehensive assessment of machine learning methods for diagnosing gastrointestinal diseases through whole metagenome sequencing data. Gut Microbes 2024; 16:2375679. [PMID: 38972064 PMCID: PMC11229738 DOI: 10.1080/19490976.2024.2375679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 06/28/2024] [Indexed: 07/09/2024] Open
Abstract
The gut microbiome, linked significantly to host diseases, offers potential for disease diagnosis through machine learning (ML) pipelines. These pipelines, crucial in modeling diseases using high-dimensional microbiome data, involve selecting profile modalities, data preprocessing techniques, and classification algorithms, each impacting the model accuracy and generalizability. Despite whole metagenome shotgun sequencing (WMS) gaining popularity for human gut microbiome profiling, a consensus on the optimal methods for ML pipelines in disease diagnosis using WMS data remains elusive. Addressing this gap, we comprehensively evaluated ML methods for diagnosing Crohn's disease and colorectal cancer, using 2,553 fecal WMS samples from 21 case-control studies. Our study uncovered crucial insights: gut-specific, species-level taxonomic features proved to be the most effective for profiling; batch correction was not consistently beneficial for model performance; compositional data transformations markedly improved the models; and while nonlinear ensemble classification algorithms typically offered superior performance, linear models with proper regularization were found to be more effective for diseases that are linearly separable based on microbiome data. An optimal ML pipeline, integrating the most effective methods, was validated for generalizability using holdout data. This research offers practical guidelines for constructing reliable disease diagnostic ML models with fecal WMS data.
Affiliation(s)
- Sungho Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- POSTECH Biotech Center, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
39
Fonseca DC, Marques Gomes da Rocha I, Depieri Balmant B, Callado L, Aguiar Prudêncio AP, Tepedino Martins Alves J, Torrinhas RS, da Rocha Fernandes G, Linetzky Waitzberg D. Evaluation of gut microbiota predictive potential associated with phenotypic characteristics to identify multifactorial diseases. Gut Microbes 2024; 16:2297815. [PMID: 38235595 PMCID: PMC10798365 DOI: 10.1080/19490976.2023.2297815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 12/18/2023] [Indexed: 01/19/2024] Open
Abstract
Gut microbiota has been implicated in various clinical conditions, yet the substantial heterogeneity in gut microbiota research results necessitates a more sophisticated approach than merely identifying statistically different microbial taxa between healthy and unhealthy individuals. Our study seeks to not only select microbial taxa but also explore their synergy with phenotypic host variables to develop novel predictive models for specific clinical conditions. DESIGN We assessed 50 healthy and 152 unhealthy individuals for phenotypic variables (PV) and gut microbiota (GM) composition by 16S rRNA gene sequencing. The entire modeling process was conducted in the R environment using the Random Forest algorithm. Model performance was assessed through ROC curve construction. RESULTS We evaluated 52 bacterial taxa and pre-selected PV (p < 0.05) for their contribution to the final models. Across all diseases, the models achieved their best performance when GM and PV data were integrated. Notably, the integrated predictive models demonstrated exceptional performance for rheumatoid arthritis (AUC = 88.03%), type 2 diabetes (AUC = 96.96%), systemic lupus erythematosus (AUC = 98.4%), and type 1 diabetes (AUC = 86.19%). CONCLUSION Our findings underscore that the selection of bacterial taxa based solely on differences in relative abundance between groups is insufficient to serve as clinical markers. Machine learning techniques are essential for mitigating the considerable variability observed within gut microbiota. In our study, the use of microbial taxa alone exhibited limited predictive power for health outcomes, while the integration of phenotypic variables into predictive models substantially enhanced their predictive capabilities.
Affiliation(s)
- Danielle Cristina Fonseca
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Ilanna Marques Gomes da Rocha
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Bianca Depieri Balmant
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Leticia Callado
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Ana Paula Aguiar Prudêncio
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Juliana Tepedino Martins Alves
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
| | - Raquel Susana Torrinhas
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
| | - Gabriel da Rocha Fernandes
- Biosystems Informatics and Genomics Group, Instituto René Rachou - Fiocruz Minas, Belo Horizonte, Brazil
| | - Dan Linetzky Waitzberg
- Laboratory of Nutrition and Metabolic Surgery of the Digestive System, LIM 35, Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
- Department of Gastroenterology, Hospital das Clínicas HCFMUSP, Faculdade de Medicina, Universidade de São Paulo, São Paulo, Brazil
| |
40
Vänni P, Tejesvi MV, Paalanne N, Aagaard K, Ackermann G, Camargo CA, Eggesbø M, Hasegawa K, Hoen AG, Karagas MR, Kolho KL, Laursen MF, Ludvigsson J, Madan J, Ownby D, Stanton C, Stokholm J, Tapiainen T. Machine-learning analysis of cross-study samples according to the gut microbiome in 12 infant cohorts. mSystems 2023; 8:e0036423. [PMID: 37874156 PMCID: PMC10734493 DOI: 10.1128/msystems.00364-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/13/2023] [Indexed: 10/25/2023] Open
Abstract
IMPORTANCE There are challenges in merging microbiome data from diverse research groups due to the intricate and multifaceted nature of such data. To address this, we utilized a combination of machine-learning (ML) models to analyze 16S sequencing data from a substantial set of gut microbiome samples, sourced from 12 distinct infant cohorts that were gathered prospectively. Our initial focus was on the mode of delivery due to its prior association with changes in infant gut microbiomes. Through ML analysis, we demonstrated the effective merging and comparison of various gut microbiome data sets, facilitating the identification of robust microbiome biomarkers applicable across varied study populations.
Affiliation(s)
- Petri Vänni
  - Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
- Mysore V. Tejesvi
  - Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
  - Ecology and Genetics, Faculty of Science, University of Oulu, Oulu, Finland
- Niko Paalanne
  - Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
  - Department of Pediatrics and Adolescent Medicine, Oulu University Hospital, University of Oulu, Oulu, Finland
- Kjersti Aagaard
  - Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA
- Gail Ackermann
  - Department of Pediatrics, University of California, San Diego, California, USA
- Carlos A. Camargo
  - Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Merete Eggesbø
  - Department of Climate and Environmental Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Kohei Hasegawa
  - Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Anne G. Hoen
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
- Margaret R. Karagas
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, USA
- Kaija-Leena Kolho
  - Children’s Hospital, University of Helsinki and HUS, Helsinki, Finland
- Martin F. Laursen
  - National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Johnny Ludvigsson
  - Crown Princess Victoria Children’s Hospital and Division of Pediatrics, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Juliette Madan
  - Department of Psychiatry, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
  - Department of Pediatrics, Dartmouth Hitchcock Medical Center, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Dennis Ownby
  - Medical College of Georgia, Augusta, Georgia, USA
- Catherine Stanton
  - Teagasc Food Research Centre & APC Microbiome Ireland, Moorepark, Fermoy, Co. Cork, Ireland
- Jakob Stokholm
  - Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Department of Food Science, University of Copenhagen, Copenhagen, Denmark
- Terhi Tapiainen
  - Research Unit of Clinical Medicine, University of Oulu, Oulu, Finland
  - Department of Obstetrics & Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children’s Hospital, Houston, Texas, USA
  - Biocenter Oulu, University of Oulu, Oulu, Finland
41
Feng K, Ren F, Xing Z, Zhao Y, Yang C, Liu J, Shang Q, Wang X, Wang X. Microbiome and its implications in oncogenesis: a Mendelian randomization perspective. Am J Cancer Res 2023; 13:5785-5804. [PMID: 38187050 PMCID: PMC10767327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/02/2023] [Indexed: 01/09/2024] Open
Abstract
The human microbiome, an intricate ecological network, has garnered significant attention due to its potential implications in oncogenesis. This paper delves into the multifaceted relationships between the microbiome, its metabolites, and cancer development, emphasizing the human intestinal tract as the primary microbial habitat. Highlighting the potential causative associations between microbial disturbances and cancer progression, we underscore the role of specific bacterial strains in various cancers, such as stomach and colorectal cancer. Traditional causality assessment methods, like randomized controlled trials (RCTs), have limitations. Therefore, we advocate using Mendelian Randomization (MR) as a powerful alternative to study causal relationships, leveraging genetic variants as instrumental variables. With the proliferation of genome-wide association studies, MR harnesses genetic variations to infer causality, which is especially beneficial when addressing confounders like diet and lifestyle that can skew microbial research. We systematically review MR's application in understanding the microbiome-cancer nexus, emphasizing its strengths and challenges. While MR offers a unique perspective on causality, it faces hurdles like horizontal pleiotropy and weak instrumental variable bias. Integrating MR with multi-omics data, encompassing genomics, transcriptomics, proteomics, and metabolomics, holds promise for future research, potentially heralding groundbreaking discoveries in microbiology and genetics. This comprehensive review underscores the critical role of the human microbiome in oncogenesis and champions MR as an indispensable tool for advancing our understanding in this domain.
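To make concrete how a genetic variant can serve as an instrumental variable, one common single-instrument estimator is the Wald ratio. The abstract does not state which estimator the reviewed studies use, so the formula below is only an illustrative assumption, with symbols introduced here: $\hat{\beta}_{GX}$ is the estimated association of the variant with the exposure (for example, abundance of a microbial taxon) and $\hat{\beta}_{GY}$ is its estimated association with the outcome (for example, cancer risk).

```latex
\hat{\beta}_{\mathrm{MR}} = \frac{\hat{\beta}_{GY}}{\hat{\beta}_{GX}}
```

A denominator close to zero inflates the estimate and its variance, which is one way to see why weak instruments are a concern. The ratio is also only interpretable as a causal effect if the variant influences the outcome solely through the exposure, which is the assumption that horizontal pleiotropy violates.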
Affiliation(s)
- Kexin Feng
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Fei Ren
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Zeyu Xing
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Yifan Zhao
  - School of Engineering, RMIT University, Bundoora, VIC 3083, Australia
- Chenxuan Yang
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Jiaxiang Liu
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Qingyao Shang
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Xin Wang
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
- Xiang Wang
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China
42
Mouratidis I, Chantzi N, Khan U, Konnaris MA, Chan CSY, Mareboina M, Moeckel C, Georgakopoulos-Soares I. Frequentmers - a novel way to look at metagenomic next generation sequencing data and an application in detecting liver cirrhosis. BMC Genomics 2023; 24:768. [PMID: 38087204 PMCID: PMC10714505 DOI: 10.1186/s12864-023-09861-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023] Open
Abstract
Early detection of human disease is associated with improved clinical outcomes. However, many diseases are detected only at an advanced, symptomatic stage, when patients are past the window for efficacious treatment, which can result in less favorable outcomes. Therefore, methods that can accurately detect human disease at a presymptomatic stage are urgently needed. Here, we introduce "frequentmers": short sequences that are specific to and recurrently observed in either patient or healthy control samples, but not in both. We showcase the utility of frequentmers for the detection of liver cirrhosis using metagenomic Next Generation Sequencing data from stool samples of patients and controls. We develop classification models for the detection of liver cirrhosis and achieve an AUC score of 0.91 using ten-fold cross-validation. A small subset of 200 frequentmers can achieve comparable results in detecting liver cirrhosis. Finally, we identify the microbial organisms in liver cirrhosis samples that are associated with the most predictive frequentmer biomarkers.
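The definition of frequentmers given in this abstract (sequences recurrent in one group and absent from the other) can be illustrated with a toy k-mer filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the value of `k`, the recurrence threshold `min_samples`, and the example sequences are all hypothetical, and the abstract does not specify how recurrence is actually measured.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recurrent_specific_kmers(target_samples, other_samples, k, min_samples):
    """k-mers seen in at least min_samples target samples and in no other sample.

    Hypothetical helper: the threshold and exact-match rule are assumptions.
    """
    counts = Counter()
    for seq in target_samples:
        counts.update(kmers(seq, k))  # each k-mer counted once per sample
    seen_in_other = set()
    for seq in other_samples:
        seen_in_other |= kmers(seq, k)
    return {
        km for km, n in counts.items()
        if n >= min_samples and km not in seen_in_other
    }


# Toy sequences standing in for reads from patient and control samples.
patients = ["ACGTAC", "TACGTA", "GGACGT"]
controls = ["TTTTGG", "GGTTTT"]

patient_markers = recurrent_specific_kmers(patients, controls, k=4, min_samples=2)
control_markers = recurrent_specific_kmers(controls, patients, k=4, min_samples=2)
print(sorted(patient_markers), sorted(control_markers))
```

Presence or absence of the selected k-mers per sample could then serve as binary features for a classifier, which is the general role the abstract assigns to frequentmers.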
Affiliation(s)
- Ioannis Mouratidis
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA.
- Nikol Chantzi
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Umair Khan
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Maxwell A Konnaris
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
  - Department of Statistics, Penn State, University Park, PA, USA
  - Huck Institutes of the Life Sciences, Penn State, University Park, PA, USA
- Candace S Y Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Manvita Mareboina
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Camille Moeckel
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Department of Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA, USA.
43
Peng C, May A, Abeel T. Unveiling microbial biomarkers of ruminant methane emission through machine learning. Front Microbiol 2023; 14:1308363. [PMID: 38143860 PMCID: PMC10749206 DOI: 10.3389/fmicb.2023.1308363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 11/20/2023] [Indexed: 12/26/2023] Open
Abstract
Background Enteric methane from cow burps, which results from microbial fermentation of high-fiber feed in the rumen, is a significant contributor to greenhouse gas emissions. A promising strategy to address this problem is microbiome-based precision feed, which involves identifying key microorganisms for methane production. While machine learning algorithms have shown success in associating the human gut microbiome with various human diseases, there have been limited efforts to employ these algorithms to establish microbial biomarkers for methane emissions in ruminants. Methods In this study, we aim to identify potential biomarkers for methane emission from ruminants by employing regression algorithms commonly used in human microbiome studies, coupled with different feature selection methods. To achieve this, we analyzed the microbiome compositions and identified possible confounding metadata variables in two large public datasets of Holstein cows. Using both the microbiome features and identified metadata variables, we trained different regressors to predict methane emission. With the optimized models, permutation tests were used to determine feature importance to find informative microbial features. Results Among the regression algorithms tested, random forest regression outperformed others and allowed the identification of several crucial microbial taxa for methane emission as members of the native rumen microbiome, including the genera Piromyces, Succinivibrionaceae UCG-002, and Acetobacter. Additionally, our results revealed that certain herd locations and feed composition markers, such as the lipid intake and neutral-detergent fiber intake, are also predictive features for methane emissions. Conclusion We demonstrated that machine learning, particularly regression algorithms, can effectively predict cow methane emissions and identify relevant rumen microorganisms. Our findings offer valuable insights for the development of microbiome-based precision feed strategies aiming at reducing methane emissions.
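The Methods above mention permutation tests for determining feature importance. A generic permutation-importance calculation can be sketched as follows; it measures how much the prediction error grows when one feature column is shuffled. This is an assumption-laden illustration rather than the authors' procedure: the `permutation_importance` helper, the toy data, and the stand-in `model` function (used in place of a fitted random forest regressor) are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: the target depends only on feature 0.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def model(row):
    """Stand-in for a fitted regressor; it ignores feature 1 by construction."""
    return 2.0 * row[0]


scores = permutation_importance(model, X, y)
print(scores)
```

In this toy case the informative feature receives a positive score and the ignored feature scores exactly zero. With a real fitted regressor the scores are noisier and are usually computed on held-out data.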
Affiliation(s)
- Chengyao Peng
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- Ali May
  - dsm-firmenich, Science & Research, Delft, Netherlands
- Thomas Abeel
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
44
Angelova IY, Kovtun AS, Averina OV, Koshenko TA, Danilenko VN. Unveiling the Connection between Microbiota and Depressive Disorder through Machine Learning. Int J Mol Sci 2023; 24:16459. [PMID: 38003647 PMCID: PMC10671666 DOI: 10.3390/ijms242216459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 11/13/2023] [Accepted: 11/15/2023] [Indexed: 11/26/2023] Open
Abstract
In the last few years, investigation of the gut-brain axis and the connection between the gut microbiota and the human nervous system and mental health has become one of the most popular topics. Correlations between the taxonomic and functional changes in gut microbiota and major depressive disorder have been shown in several studies. Machine learning provides a promising approach to analyze large-scale metagenomic data and identify biomarkers associated with depression. In this work, machine learning algorithms, such as random forest, elastic net, and You Only Look Once (YOLO), were utilized to detect significant features in microbiome samples and classify individuals based on their disorder status. The analysis was conducted on metagenomic data obtained during the study of gut microbiota of healthy people and patients with major depressive disorder. The YOLO method showed the greatest effectiveness in the analysis of the metagenomic samples and confirmed the experimental results on the critical importance of a reduction in the amount of Faecalibacterium prausnitzii for the manifestation of depression. These findings could contribute to a better understanding of the role of the gut microbiota in major depressive disorder and potentially lead the way for novel diagnostic and therapeutic strategies.
Affiliation(s)
- Irina Y. Angelova
  - Vavilov Institute of General Genetics, Russian Academy of Sciences (RAS), 119333 Moscow, Russia; (A.S.K.); (O.V.A.); (V.N.D.)
45
Ibrahimi E, Lopes MB, Dhamo X, Simeon A, Shigdel R, Hron K, Stres B, D’Elia D, Berland M, Marcos-Zambrano LJ. Overview of data preprocessing for machine learning applications in human microbiome research. Front Microbiol 2023; 14:1250909. [PMID: 37869650 PMCID: PMC10588656 DOI: 10.3389/fmicb.2023.1250909] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/30/2023] [Accepted: 09/22/2023] [Indexed: 10/24/2023] Open
Abstract
Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate a limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, there is a prevalent usage of relative and normalization-based transformations that do not specifically account for the specific attributes of microbiome data. The information on preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
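The contrast drawn in this abstract, between relative (normalization-based) transformations and transformations that target the compositional nature of the data, can be shown on a single sample. The sketch below compares relative abundance with the centered log-ratio (CLR) transform. CLR is chosen here as a well-known compositional transformation and is not named in the abstract; the pseudocount of 1 for handling zeros and the example counts are likewise assumptions.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the mean log over the sample.

    The pseudocount keeps zero counts finite; its value is an assumption.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical read counts for four taxa in one sample (note the zero).
sample = [120, 0, 30, 850]

rel = relative_abundance(sample)
clr_values = clr(sample)
print(rel)
print(clr_values)
```

Relative abundances are constrained to sum to one, so an increase in one taxon forces apparent decreases in others. CLR values instead sum to zero and express each taxon relative to the geometric mean of the sample, which is why log-ratio methods are often suggested for compositional data.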
Affiliation(s)
- Eliana Ibrahimi
  - Department of Biology, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
- Marta B. Lopes
  - Department of Mathematics, Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Xhilda Dhamo
  - Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Karel Hron
  - Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University Olomouc, Olomouc, Czechia
- Blaž Stres
  - Department of Catalysis and Chemical Reaction Engineering, National Institute of Chemistry, Ljubljana, Slovenia
  - Faculty of Civil and Geodetic Engineering, Institute of Sanitary Engineering, Ljubljana, Slovenia
  - Department of Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia
  - Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Domenica D’Elia
  - Department of Biomedical Sciences, National Research Council, Institute for Biomedical Technologies, Bari, Italy
- Magali Berland
  - INRAE, MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas, France
- Laura Judith Marcos-Zambrano
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
46
Almanza-Aguilera E, Cano A, Gil-Lespinard M, Burguera N, Zamora-Ros R, Agudo A, Farràs M. Mediterranean diet and olive oil, microbiota, and obesity-related cancers. From mechanisms to prevention. Semin Cancer Biol 2023; 95:103-119. [PMID: 37543179 DOI: 10.1016/j.semcancer.2023.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/27/2022] [Revised: 07/02/2023] [Accepted: 08/01/2023] [Indexed: 08/07/2023]
Abstract
Olive oil (OO) is the main source of added fat in the Mediterranean diet (MD). It is a mix of bioactive compounds, including monounsaturated fatty acids, phytosterols, simple phenols, secoiridoids, flavonoids, and terpenoids. There is a growing body of evidence that MD and OO improve obesity-related factors. In addition, obesity has been associated with an increased risk for several cancers: endometrial, oesophageal adenocarcinoma, renal, pancreatic, hepatocellular, gastric cardia, meningioma, multiple myeloma, colorectal, postmenopausal breast, ovarian, gallbladder, and thyroid cancer. However, the epidemiological evidence linking MD and OO with these obesity-related cancers, and their potential mechanisms of action, especially those involving the gut microbiota, are not clearly described or understood. The goals of this review are 1) to update the current epidemiological knowledge on the associations between MD and OO consumption and obesity-related cancers, 2) to identify the gut microbiota mechanisms involved in obesity-related cancers, and 3) to report the effects of MD and OO on these mechanisms.
Affiliation(s)
- Enrique Almanza-Aguilera
  - Unit of Nutrition and Cancer, Epidemiology Research Program, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), 08908 L'Hospitalet de Llobregat, Spain
- Ainara Cano
  - Food Research, AZTI, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, Astondo Bidea, Edificio 609, 48160, Derio, Spain
- Mercedes Gil-Lespinard
  - Unit of Nutrition and Cancer, Epidemiology Research Program, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), 08908 L'Hospitalet de Llobregat, Spain
- Nerea Burguera
  - Food Research, AZTI, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, Astondo Bidea, Edificio 609, 48160, Derio, Spain
- Raul Zamora-Ros
  - Unit of Nutrition and Cancer, Epidemiology Research Program, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), 08908 L'Hospitalet de Llobregat, Spain; Department of Nutrition, Food Sciences, and Gastronomy, Food Innovation Network (XIA), Institute for Research on Nutrition and Food Safety (INSA), Faculty of Pharmacy and Food Sciences University of Barcelona, Barcelona, Spain.
- Antonio Agudo
  - Unit of Nutrition and Cancer, Epidemiology Research Program, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), 08908 L'Hospitalet de Llobregat, Spain
- Marta Farràs
  - Unit of Nutrition and Cancer, Epidemiology Research Program, Catalan Institute of Oncology (ICO), Bellvitge Biomedical Research Institute (IDIBELL), 08908 L'Hospitalet de Llobregat, Spain.
47
Theodosiou AA, Read RC. Artificial intelligence, machine learning and deep learning: Potential resources for the infection clinician. J Infect 2023; 87:287-294. [PMID: 37468046 DOI: 10.1016/j.jinf.2023.07.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 07/10/2023] [Accepted: 07/12/2023] [Indexed: 07/21/2023]
Abstract
BACKGROUND Artificial intelligence (AI), machine learning and deep learning (including generative AI) are increasingly being investigated in the context of research and management of human infection. OBJECTIVES We summarise recent and potential future applications of AI and its relevance to clinical infection practice. METHODS 1617 PubMed results were screened, with priority given to clinical trials, systematic reviews and meta-analyses. This narrative review focusses on studies using prospectively collected real-world data with clinical validation, and on research with translational potential, such as novel drug discovery and microbiome-based interventions. RESULTS There is some evidence of clinical utility of AI applied to laboratory diagnostics (e.g. digital culture plate reading, malaria diagnosis, antimicrobial resistance profiling), clinical imaging analysis (e.g. pulmonary tuberculosis diagnosis), clinical decision support tools (e.g. sepsis prediction, antimicrobial prescribing) and public health outbreak management (e.g. COVID-19). Most studies to date lack any real-world validation or clinical utility metrics. Significant heterogeneity in study design and reporting limits comparability. Many practical and ethical issues exist, including algorithm transparency and risk of bias. CONCLUSIONS Interest in and development of AI-based tools for infection research and management are undoubtedly gaining pace, although the real-world clinical utility to date appears much more modest.
Affiliation(s)
- Anastasia A Theodosiou
  - Clinical and Experimental Sciences and NIHR Southampton Biomedical Research Centre, University Hospital Southampton, Tremona Road, SO166YD Southampton, United Kingdom.
- Robert C Read
  - Clinical and Experimental Sciences and NIHR Southampton Biomedical Research Centre, University Hospital Southampton, Tremona Road, SO166YD Southampton, United Kingdom
48
Chang CC, Liu TC, Lu CJ, Chiu HC, Lin WN. Machine learning strategy for identifying altered gut microbiomes for diagnostic screening in myasthenia gravis. Front Microbiol 2023; 14:1227300. [PMID: 37829445 PMCID: PMC10565662 DOI: 10.3389/fmicb.2023.1227300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 09/06/2023] [Indexed: 10/14/2023] Open
Abstract
Myasthenia gravis (MG) is a neuromuscular junction disease with a complex pathophysiology and clinical variation for which no clear biomarker has been discovered. We hypothesized that, because changes in gut microbiome composition often occur in autoimmune diseases, the gut microbiome structures of patients with MG would differ from those of individuals without MG, and that a supervised machine learning (ML) analysis strategy could be trained using gut microbiota data for diagnostic screening of MG. Genomic DNA was collected from the stool samples of individuals with and without MG, and a sequencing library was established by constructing amplicon sequence variants (ASVs) and completing taxonomic classification of each representative DNA sequence. Four ML methods, namely least absolute shrinkage and selection operator, extreme gradient boosting (XGBoost), random forest, and classification and regression trees, were trained with nested leave-one-out cross-validation using ASV taxon-based data and full ASV-based data to identify key ASVs in each data set. The results revealed XGBoost to have the best predictive performance. Overlapping key features extracted when XGBoost was trained using the full ASV-based and ASV taxon-based data were identified, and 31 high-importance ASVs (HIASVs) were obtained, assigned importance scores, and ranked. The most significant difference observed was in the abundance of bacteria in the Lachnospiraceae and Ruminococcaceae families. The 31 HIASVs were used to train the XGBoost algorithm to differentiate individuals with and without MG. The model had high diagnostic classification power and could accurately predict and identify patients with MG. In addition, the abundance of Lachnospiraceae was associated with limb weakness severity. In this study, we discovered that the composition of gut microbiomes differed between MG and non-MG subjects. In addition, the proposed XGBoost model trained using 31 HIASVs had the most favorable performance with respect to analyzing gut microbiomes. These HIASVs selected by the ML model may serve as biomarkers for clinical use and mechanistic study in the future. Our proposed ML model can identify several taxonomic markers and effectively discriminate patients with MG from those without with high accuracy. The ML strategy can be applied as a benchmark for noninvasive screening of MG.
Affiliation(s)
- Che-Cheng Chang
  - PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
  - Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
- Hou-Chang Chiu
  - School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
- Wei-Ning Lin
  - Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
49
D’Elia D, Truu J, Lahti L, Berland M, Papoutsoglou G, Ceci M, Zomer A, Lopes MB, Ibrahimi E, Gruca A, Nechyporenko A, Frohme M, Klammsteiner T, Pau ECDS, Marcos-Zambrano LJ, Hron K, Pio G, Simeon A, Suharoschi R, Moreno-Indias I, Temko A, Nedyalkova M, Apostol ES, Truică CO, Shigdel R, Telalović JH, Bongcam-Rudloff E, Przymus P, Jordamović NB, Falquet L, Tarazona S, Sampri A, Isola G, Pérez-Serrano D, Trajkovik V, Klucar L, Loncar-Turukalo T, Havulinna AS, Jansen C, Bertelsen RJ, Claesson MJ. Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action. Front Microbiol 2023; 14:1257002. [PMID: 37808321 PMCID: PMC10558209 DOI: 10.3389/fmicb.2023.1257002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/05/2023] [Indexed: 10/10/2023] Open
Abstract
The rapid development of machine learning (ML) techniques has opened up the data-dense field of microbiome research for novel therapeutic, diagnostic, and prognostic applications targeting a wide range of disorders, which could substantially improve healthcare practices in the era of precision medicine. However, several challenges must be addressed to exploit the benefits of ML in this field fully. In particular, there is a need to establish "gold standard" protocols for conducting ML analysis experiments and improve interactions between microbiome researchers and ML experts. The Machine Learning Techniques in Human Microbiome Studies (ML4Microbiome) COST Action CA18131 is a European network established in 2019 to promote collaboration between discovery-oriented microbiome researchers and data-driven ML experts to optimize and standardize ML approaches for microbiome analysis. This perspective paper presents the key achievements of ML4Microbiome, which include identifying predictive and discriminatory 'omics' features, improving repeatability and comparability, developing automation procedures, and defining priority areas for the novel development of ML methods targeting the microbiome. The insights gained from ML4Microbiome will help to maximize the potential of ML in microbiome research and pave the way for new and improved healthcare practices.
Affiliation(s)
- Domenica D’Elia
  - Department of Biomedical Sciences, National Research Council, Institute for Biomedical Technologies, Bari, Italy
- Jaak Truu
  - Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Magali Berland
  - Université Paris-Saclay, INRAE, MetaGenoPolis, Jouy-en-Josas, France
- Georgios Papoutsoglou
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece
  - Department of Computer Science, University of Crete, Heraklion, Greece
- Michelangelo Ceci
  - Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
- Aldert Zomer
  - Department of Biomolecular Health Sciences (Infectious Diseases and Immunology), Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
- Marta B. Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Eliana Ibrahimi
  - Department of Biology, University of Tirana, Tirana, Albania
- Aleksandra Gruca
  - Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Alina Nechyporenko
  - Systems Engineering Department, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
  - Department of Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
- Marcus Frohme
  - Department of Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
- Thomas Klammsteiner
  - Department of Microbiology, Universität Innsbruck, Innsbruck, Austria
  - Department of Ecology, Universität Innsbruck, Innsbruck, Austria
- Enrique Carrillo-de Santa Pau
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, CEI UAM+CSIC, Madrid, Spain
- Laura Judith Marcos-Zambrano
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, CEI UAM+CSIC, Madrid, Spain
- Karel Hron
  - Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czechia
- Gianvito Pio
  - Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Ramona Suharoschi
  - Molecular Nutrition and Proteomics Research Laboratory, Department of Food Science, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania
- Isabel Moreno-Indias
  - Department of Endocrinology and Nutrition, Virgen de la Victoria University Hospital, the Biomedical Research Institute of Malaga and Platform in Nanomedicine (IBIMA-BIONAND Platform), University of Malaga, Malaga, Spain
- Andriy Temko
  - Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Elena-Simona Apostol
  - Computer Science and Engineering Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
- Ciprian-Octavian Truică
  - Computer Science and Engineering Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Jasminka Hasić Telalović
  - Department of Computer Science, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
- Erik Bongcam-Rudloff
  - Swedish University of Agricultural Sciences, Department of Animal Breeding and Genetics, Uppsala, Sweden
- Naida Babić Jordamović
  - Computational Biology, International Centre for Genetic Engineering and Biotechnology, Trieste, Italy
  - Verlab Research Institute for Biomedical Engineering, Medical Devices and Artificial Intelligence, Sarajevo, Bosnia and Herzegovina
- Laurent Falquet
  - University of Fribourg and Swiss Institute of Bioinformatics, Fribourg, Switzerland
- Sonia Tarazona
  - Department of Applied Statistics and Operations Research and Quality, Universitat Politècnica de València, València, Spain
- Alexia Sampri
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
- Gaetano Isola
  - Department of General Surgery and Surgical-Medical Specialties, School of Dentistry, University of Catania, Catania, Italy
- David Pérez-Serrano
  - Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, CEI UAM+CSIC, Madrid, Spain
- Lubos Klucar
  - Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
- Aki S. Havulinna
  - Finnish Institute for Health and Welfare, Helsinki, Finland
  - Institute for Molecular Medicine Finland, FIMM-HiLIFE, Helsinki, Finland
- Christian Jansen
  - Biome Diagnostics GmbH, Vienna, Austria
  - Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
|
50
|
Papoutsoglou G, Tarazona S, Lopes MB, Klammsteiner T, Ibrahimi E, Eckenberger J, Novielli P, Tonda A, Simeon A, Shigdel R, Béreux S, Vitali G, Tangaro S, Lahti L, Temko A, Claesson MJ, Berland M. Machine learning approaches in microbiome research: challenges and best practices. Front Microbiol 2023; 14:1261889. [PMID: 37808286 PMCID: PMC10556866 DOI: 10.3389/fmicb.2023.1261889] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/19/2023] [Accepted: 09/04/2023] [Indexed: 10/10/2023]
Abstract
Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. It is demonstrated that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, the multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
Affiliation(s)
- Georgios Papoutsoglou
  - Department of Computer Science, University of Crete, Heraklion, Greece
  - JADBio Gnosis DA S.A., Science and Technology Park of Crete, Heraklion, Greece
- Sonia Tarazona
  - Department of Applied Statistics and Operations Research and Quality, Polytechnic University of Valencia, Valencia, Spain
- Marta B. Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal
  - Research and Development Unit for Mechanical and Industrial Engineering (UNIDEMI), Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal
- Thomas Klammsteiner
  - Department of Ecology, Universität Innsbruck, Innsbruck, Austria
  - Department of Microbiology, Universität Innsbruck, Innsbruck, Austria
- Eliana Ibrahimi
  - Department of Biology, University of Tirana, Tirana, Albania
- Julia Eckenberger
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Pierfrancesco Novielli
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Alberto Tonda
  - UMR 518 MIA-PS, INRAE, Paris-Saclay University, Palaiseau, France
  - Complex Systems Institute of Paris Ile-de-France (ISC-PIF) - UAR 3611 CNRS, Paris, France
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, Novi Sad, Serbia
- Rajesh Shigdel
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Stéphane Béreux
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
  - MaIAGE, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Giacomo Vitali
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Sabina Tangaro
  - Department of Soil, Plant, and Food Sciences, University of Bari Aldo Moro, Bari, Italy
  - National Institute for Nuclear Physics, Bari Division, Bari, Italy
- Leo Lahti
  - Department of Computing, University of Turku, Turku, Finland
- Andriy Temko
  - Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Marcus J. Claesson
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Magali Berland
  - MetaGenoPolis, INRAE, Paris-Saclay University, Jouy-en-Josas, France
|