1
Oh VKS, Li RW. Wise Roles and Future Visionary Endeavors of Current Emperor: Advancing Dynamic Methods for Longitudinal Microbiome Meta-Omics Data in Personalized and Precision Medicine. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024:e2400458. [PMID: 39535493 DOI: 10.1002/advs.202400458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 09/16/2024] [Indexed: 11/16/2024]
Abstract
Understanding the etiological complexity of diseases requires identifying biomarkers longitudinally associated with specific phenotypes. Advanced sequencing tools generate dynamic microbiome data, providing insights into microbial community functions and their impact on health. This review aims to explore the current roles and future visionary endeavors of dynamic methods for integrating longitudinal microbiome multi-omics data in personalized and precision medicine. This work seeks to synthesize existing research, propose best practices, and highlight innovative techniques. The development and application of advanced dynamic methods, including the unified analytical frameworks and deep learning tools in artificial intelligence, are critically examined. Aggregating data on microbes, metabolites, genes, and other entities offers profound insights into the interactions among microorganisms, host physiology, and external stimuli. Despite progress, the absence of gold standards for validating analytical protocols and data resources of various longitudinal multi-omics studies remains a significant challenge. The interdependence of workflow steps critically affects overall outcomes. This work provides a comprehensive roadmap for best practices, addressing current challenges with advanced dynamic methods. The review underscores the biological effects of clinical, experimental, and analytical protocol settings on outcomes. Establishing consensus on dynamic microbiome inter-studies and advancing reliable analytical protocols are pivotal for the future of personalized and precision medicine.
Affiliation(s)
- Vera-Khlara S Oh
- Big Biomedical Data Integration and Statistical Analysis (DIANA) Research Center, Department of Data Science, College of Natural Sciences, Jeju National University, Jeju City, Jeju Do, 63243, South Korea
- Robert W Li
- United States Department of Agriculture, Agricultural Research Service, Animal Genomics and Improvement Laboratory, Beltsville, MD, 20705, USA
2
Probul N, Huang Z, Saak CC, Baumbach J, List M. AI in microbiome-related healthcare. Microb Biotechnol 2024; 17:e70027. [PMID: 39487766 PMCID: PMC11530995 DOI: 10.1111/1751-7915.70027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 09/23/2024] [Indexed: 11/04/2024] Open
Abstract
Artificial intelligence (AI) has the potential to transform clinical practice and healthcare. Following impressive advancements in fields such as computer vision and medical imaging, AI is poised to drive changes in microbiome-based healthcare while facing challenges specific to the field. This review describes the state-of-the-art use of AI in microbiome-related healthcare. It points out limitations across topics such as data handling, AI modelling and safeguarding patient privacy. Furthermore, we indicate how these current shortcomings could be overcome in the future and discuss the influence and opportunities of increasingly complex data on microbiome-based healthcare.
Affiliation(s)
- Niklas Probul
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Zihua Huang
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Markus List
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
3
Kang M, Kim DK, Le VV, Ko SR, Lee JJ, Choi IC, Shin Y, Kim K, Ahn CY. Microcystis abundance is predictable through ambient bacterial communities: A data-oriented approach. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 368:122128. [PMID: 39126846 DOI: 10.1016/j.jenvman.2024.122128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 08/03/2024] [Accepted: 08/05/2024] [Indexed: 08/12/2024]
Abstract
The number of cyanobacterial harmful algal blooms (cyanoHABs) has increased, leading to the widespread development of prediction models for cyanoHABs. Although bacteria interact closely with cyanobacteria and directly affect cyanoHAB occurrence, related modeling studies have rarely utilized microbial community data compared to environmental data such as water quality. In this study, we built a machine learning model, the multilayer perceptron (MLP), for the prediction of Microcystis dynamics using both bacterial community and weekly water quality data from the Daechung Reservoir and Nakdong River, South Korea. The modeling performance, indicated by the R² value, improved to 0.97 in the model combining bacterial community data with environmental factors, compared to 0.78 in the model using only environmental factors. This underscores the importance of microbial communities in cyanoHABs prediction. Through the post-hoc analysis of the MLP models, we revealed that nitrogen sources played a more critical role than phosphorus sources in Microcystis blooms, whereas the bacterial amplicon sequence variants did not differ significantly from one another in their contributions. Similar to the MLP model results, bacterial data also had higher predictability in multiple linear regression (MLR) than environmental data. In both the MLP and MLR models, Microscillaceae showed the strongest association with Microcystis. This modeling approach provides a better understanding of the interactions between bacteria and cyanoHABs, facilitating the development of more accurate and reliable models for cyanoHABs prediction using ambient bacterial data.
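As a reading aid for the R² figures quoted in this abstract, the following minimal Python sketch shows how a coefficient of determination is commonly computed from observed and predicted values. The function name and the toy numbers are assumptions made for illustration; they are not taken from the study, and the authors' own implementation is not shown in this listing.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration only; not the study's code.
    """
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values (assumed, not from the paper).
observed = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]      # reproduces every observation
mean_only = [2.0, 2.0, 2.0]    # always predicts the mean

print(r_squared(observed, perfect))
print(r_squared(observed, mean_only))
```

A model that only ever predicts the mean of the observations scores 0, and a model that reproduces every observation scores 1, which gives a scale for reading values such as 0.78 and 0.97.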
Affiliation(s)
- Mingyeong Kang
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea; Department of Environmental Biotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
- Dong-Kyun Kim
- K-water Research Institute, 169 Yuseong-daero, Yuseong-gu, Daejeon, 34045, Republic of Korea
- Ve Van Le
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea; Department of Environmental Biotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea
- So-Ra Ko
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Jay Jung Lee
- Geum River Environment Research Center, National Institute of Environmental Research, Chungbuk, 29027, Republic of Korea
- In-Chan Choi
- Geum River Environment Research Center, National Institute of Environmental Research, Chungbuk, 29027, Republic of Korea
- Yuna Shin
- Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon, 22689, Republic of Korea
- Kyunghyun Kim
- Water Quality Assessment Research Division, National Institute of Environmental Research, Incheon, 22689, Republic of Korea
- Chi-Yong Ahn
- Cell Factory Research Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea; Department of Environmental Biotechnology, KRIBB School of Biotechnology, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, 34113, Republic of Korea.
4
Raajaraam L, Raman K. Modeling Microbial Communities: Perspective and Challenges. ACS Synth Biol 2024; 13:2260-2270. [PMID: 39148432 DOI: 10.1021/acssynbio.4c00116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Microbial communities are immensely important due to their widespread presence and profound impact on various facets of life. Understanding these complex systems necessitates mathematical modeling, a powerful tool for simulating and predicting microbial community behavior. This review offers a critical analysis of metabolic modeling and highlights key areas that would greatly benefit from broader discussion and collaboration. Moreover, we explore the challenges and opportunities linked to the intricate nature of these communities, spanning data generation, modeling, and validation. We are confident that ongoing advancements in modeling techniques, such as machine learning, coupled with interdisciplinary collaborations, will unlock the full potential of microbial communities across diverse applications.
Affiliation(s)
- Lavanya Raajaraam
- Bhupat and Jyoti Mehta School of Biosciences, Department of Biotechnology, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Centre for Integrative Biology and Systems mEdicine, IIT Madras, Chennai 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai 600 036, India
- Karthik Raman
- Bhupat and Jyoti Mehta School of Biosciences, Department of Biotechnology, Indian Institute of Technology (IIT) Madras, Chennai 600 036, India
- Centre for Integrative Biology and Systems mEdicine, IIT Madras, Chennai 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai 600 036, India
- Department of Data Science and AI, Wadhwani School of Data Science and Artificial Intelligence, IIT Madras, Chennai 600 036, India
5
Lac L, Leung CK, Hu P. Computational frameworks integrating deep learning and statistical models in mining multimodal omics data. J Biomed Inform 2024; 152:104629. [PMID: 38552994 DOI: 10.1016/j.jbi.2024.104629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 02/26/2024] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND In health research, multimodal omics data analysis is widely used to address important clinical and biological questions. Traditional statistical methods rely on the strong assumptions of distribution. Statistical methods such as testing and differential expression are commonly used in omics analysis. Deep learning, on the other hand, is an advanced computer science technique that is powerful in mining high-dimensional omics data for prediction tasks. Recently, integrative frameworks or methods have been developed for omics studies that combine statistical models and deep learning algorithms. METHODS AND RESULTS The aim of these integrative frameworks is to combine the strengths of both statistical methods and deep learning algorithms to improve prediction accuracy while also providing interpretability and explainability. This review report discusses the current state-of-the-art integrative frameworks, their limitations, and potential future directions in survival and time-to-event longitudinal analysis, dimension reduction and clustering, regression and classification, feature selection, and causal and transfer learning.
Affiliation(s)
- Leann Lac
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Statistics, University of Manitoba, Winnipeg, Manitoba, Canada
- Carson K Leung
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Pingzhao Hu
- Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Biochemistry, Western University, London, Ontario, Canada; Department of Computer Science, Western University, London, Ontario, Canada; Department of Oncology, Western University, London, Ontario, Canada; Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada; The Children's Health Research Institute, Lawson Health Research Institute, London, Ontario, Canada.
6
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024] Open
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
- IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
7
Sharma D, Lou W, Xu W. phylaGAN: data augmentation through conditional GANs and autoencoders for improving disease prediction accuracy using microbiome data. Bioinformatics 2024; 40:btae161. [PMID: 38569898 PMCID: PMC11256914 DOI: 10.1093/bioinformatics/btae161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/18/2024] [Accepted: 04/01/2024] [Indexed: 04/05/2024] Open
Abstract
MOTIVATION Research is improving our understanding of how the microbiome interacts with the human body and its impact on human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. However, machine learning-based prediction using microbiome data faces challenges such as small sample size, imbalance between cases and controls, and the high cost of collecting a large number of samples. To address these challenges, we propose a deep learning framework, phylaGAN, to augment the existing datasets with generated microbiome data using a combination of a conditional generative adversarial network (C-GAN) and an autoencoder. Conditional generative adversarial networks train two models against each other to compute larger simulated datasets that are representative of the original dataset. The autoencoder maps the original and the generated samples onto a common subspace to make the prediction more accurate. RESULTS Extensive evaluation and predictive analysis were conducted on two datasets, a T2D study and a cirrhosis study, showing an improvement in mean AUC using data augmentation of 11% and 5%, respectively. External validation on a cohort classifying between obese and lean subjects, with a smaller sample size, provided an improvement in mean AUC close to 32% when augmented through phylaGAN as compared to using the original cohort. Our findings not only indicate that generative adversarial networks can create samples that mimic the original data across various diversity metrics, but also highlight the potential of enhancing disease prediction through machine learning models trained on synthetic data. AVAILABILITY AND IMPLEMENTATION https://github.com/divya031090/phylaGAN.
Affiliation(s)
- Divya Sharma
- Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G2C4, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada
- Wendy Lou
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada
- Wei Xu
- Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G2C4, Canada
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, M5T3M7, Canada
8
Golob JL, Oskotsky TT, Tang AS, Roldan A, Chung V, Ha CWY, Wong RJ, Flynn KJ, Parraga-Leo A, Wibrand C, Minot SS, Oskotsky B, Andreoletti G, Kosti I, Bletz J, Nelson A, Gao J, Wei Z, Chen G, Tang ZZ, Novielli P, Romano D, Pantaleo E, Amoroso N, Monaco A, Vacca M, De Angelis M, Bellotti R, Tangaro S, Kuntzleman A, Bigcraft I, Techtmann S, Bae D, Kim E, Jeon J, Joe S, Theis KR, Ng S, Lee YS, Diaz-Gimeno P, Bennett PR, MacIntyre DA, Stolovitzky G, Lynch SV, Albrecht J, Gomez-Lopez N, Romero R, Stevenson DK, Aghaeepour N, Tarca AL, Costello JC, Sirota M. Microbiome preterm birth DREAM challenge: Crowdsourcing machine learning approaches to advance preterm birth research. Cell Rep Med 2024; 5:101350. [PMID: 38134931 PMCID: PMC10829755 DOI: 10.1016/j.xcrm.2023.101350] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/28/2023] [Revised: 09/15/2023] [Accepted: 12/01/2023] [Indexed: 12/24/2023]
Abstract
Every year, 11% of infants are born preterm with significant health consequences, with the vaginal microbiome a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operator characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translation of microbiome data into clinically relevant predictive models and to better understand preterm birth.
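Two quantities named in this abstract, alpha diversity and AUROC, can be illustrated with a minimal Python sketch. It assumes the Shannon index as the alpha diversity measure (the abstract does not say which index was used) and computes AUROC through the pairwise-ranking interpretation. The function names and toy inputs are hypothetical, and none of this reproduces the challenge's actual scoring code.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Assumed here as one common alpha diversity measure.
    """
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy inputs (assumed for illustration only).
even_community = [10, 10]          # two equally abundant taxa
labels = [1, 0, 1, 0]              # 1 = preterm, 0 = term (hypothetical)
scores = [0.9, 0.8, 0.3, 0.1]      # hypothetical model outputs

print(shannon_diversity(even_community))
print(auroc(labels, scores))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation is the usual choice for larger cohorts.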
Affiliation(s)
- Jonathan L Golob
- Division of Infectious Disease, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA; March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA.
- Tomiko T Oskotsky
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
- Alice S Tang
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Alennie Roldan
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Connie W Y Ha
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Ronald J Wong
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; March of Dimes Prematurity Research Center at Stanford University, Stanford, CA, USA
- Antonio Parraga-Leo
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, Obstetrics and Gynaecology, Universidad de Valencia, Valencia, Spain; IVIRMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain
- Camilla Wibrand
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Samuel S Minot
- Data Core, Shared Resources, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Boris Oskotsky
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
- Gaia Andreoletti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Idit Kosti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Zhoujingpeng Wei
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Pierfrancesco Novielli
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Donato Romano
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Ester Pantaleo
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy
- Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy
- Mirco Vacca
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Maria De Angelis
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy
- Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari Aldo Moro, Bari, Italy
- Sabina Tangaro
- Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, Bari, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Bari, Italy
- Abigail Kuntzleman
- Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA
- Isaac Bigcraft
- Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA
- Stephen Techtmann
- Department of Biological Sciences, Michigan Technological University, Houghton, MI, USA
- Daehun Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea
- Jongbum Jeon
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
- Soobok Joe
- Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Republic of Korea
- Kevin R Theis
- Department of Biochemistry, Microbiology and Immunology, Wayne State University, Detroit, MI, USA
- Sherrianne Ng
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
- Yun S Lee
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
- Patricia Diaz-Gimeno
- IVIRMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Valencia, Spain
- Phillip R Bennett
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
- David A MacIntyre
- Imperial College Parturition Research Group, Division of the Institute of Reproductive and Developmental Biology, Imperial College London, London, UK; March of Dimes Prematurity Research Centre at Imperial College London, London, UK
- Gustavo Stolovitzky
- Center for Computational Biology and Bioinformatics, Columbia University, New York, NY, USA; Thomas J. Watson Research Center, IBM, Yorktown Heights, NY, USA; Sema4, Stamford, CT, USA
- Susan V Lynch
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA; Division of Gastroenterology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Nardhy Gomez-Lopez
- Department of Biochemistry, Microbiology and Immunology, Wayne State University, Detroit, MI, USA; Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA
- Roberto Romero
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA; Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI, USA; Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Detroit Medical Center, Detroit, MI, USA; Department of Obstetrics and Gynecology, Florida International University, Miami, FL, USA
- David K Stevenson
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Center for Academic Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Nima Aghaeepour
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA, USA
- Adi L Tarca
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI, USA; Department of Computer Science, Wayne State University College of Engineering, Detroit, MI, USA
- James C Costello
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Marina Sirota
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA.
9
Lyu R, Qu Y, Divaris K, Wu D. Methodological Considerations in Longitudinal Analyses of Microbiome Data: A Comprehensive Review. Genes (Basel) 2023; 15:51. [PMID: 38254941 PMCID: PMC11154524 DOI: 10.3390/genes15010051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/22/2023] [Accepted: 12/26/2023] [Indexed: 01/24/2024] Open
Abstract
Biological processes underlying health and disease are inherently dynamic and are best understood when characterized in a time-informed manner. In this comprehensive review, we discuss challenges inherent in time-series microbiome data analyses and compare available approaches and methods to overcome them. Appropriate handling of longitudinal microbiome data can shed light on important roles, functions, patterns, and potential interactions between large numbers of microbial taxa or genes in the context of health, disease, or interventions. We present a comprehensive review and comparison of existing microbiome time-series analysis methods, for both preprocessing and downstream analyses, including differential analysis, clustering, network inference, and trait classification. We posit that the careful selection and appropriate utilization of computational tools for longitudinal microbiome analyses can help advance our understanding of the dynamic host-microbiome relationships that underlie health-maintaining homeostases, progressions to disease-promoting dysbioses, as well as phases of physiologic development like those encountered in childhood.
Affiliation(s)
- Ruiqi Lyu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Yixiang Qu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kimon Divaris
- Division of Pediatric and Public Health, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Di Wu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
10
|
Azhie A, Sharma D, Sheth P, Qazi-Arisar FA, Zaya R, Naghibzadeh M, Duan K, Fischer S, Patel K, Tsien C, Selzner N, Lilly L, Jaeckel E, Xu W, Bhat M. A deep learning framework for personalised dynamic diagnosis of graft fibrosis after liver transplantation: a retrospective, single Canadian centre, longitudinal study. Lancet Digit Health 2023; 5:e458-e466. [PMID: 37210229 DOI: 10.1016/s2589-7500(23)00068-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/14/2022] [Revised: 03/13/2023] [Accepted: 03/23/2023] [Indexed: 05/22/2023]
Abstract
BACKGROUND Recurrent graft fibrosis after liver transplantation can threaten both graft and patient survival. Therefore, early detection of fibrosis is essential to avoid disease progression and the need for retransplantation. Non-invasive blood-based biomarkers of fibrosis are limited by moderate accuracy and high cost. We aimed to evaluate the accuracy of machine learning algorithms in detecting graft fibrosis using longitudinal clinical and laboratory data.
METHODS In this retrospective, longitudinal study, we trained machine learning algorithms, including our novel weighted long short-term memory (LSTM) model, to predict the risk of significant fibrosis using follow-up data from 1893 adults who had a liver transplantation between Feb 1, 1987, and Dec 30, 2019, with at least one liver biopsy post transplantation. Liver biopsy samples with indefinite fibrosis stage and those from patients with multiple transplantations were excluded. Longitudinal clinical variables were collected from transplantation to the date of last available liver biopsy. Deep learning models were trained on 70% of the patients as the training set and 30% of the patients as the test set. The algorithms were also separately tested on longitudinal data from a subgroup of patients (n=149) who had transient elastography within 1 year before or after the date of liver biopsy. Weighted LSTM model performance for diagnosing significant fibrosis was compared against LSTM, other deep learning models (recurrent neural network and temporal convolutional network), machine learning models (random forest, support vector machines, logistic regression, lasso regression, and ridge regression), aspartate aminotransferase-to-platelet ratio index (APRI), fibrosis-4 index (FIB-4), and transient elastography.
FINDINGS 1893 people who had a liver transplantation (1261 [67%] men and 632 [33%] women) with at least one liver biopsy between Jan 1, 1992, and June 30, 2020, were included in the study (591 [31%] cases and 1302 [69%] controls). The median age at liver transplantation was 53·7 years (IQR 47·3 to 59·0) for cases and 55·3 years (48·0 to 61·2) for controls. The median time interval between transplant and liver biopsy was 21 months (5 to 71). The weighted LSTM model (area under the curve 0·798 [95% CI 0·790 to 0·810]) consistently outperformed other methods, including unweighted LSTM (0·761 [0·750 to 0·769]; p=0·031), recurrent neural network (0·736 [0·721 to 0·744]), temporal convolutional network (0·700 [0·662 to 0·747]), random forest (0·679 [0·652 to 0·707]), FIB-4 (0·650 [0·636 to 0·663]), and APRI (0·682 [0·671 to 0·694]) when diagnosing F2 or worse stage fibrosis. In a subgroup of patients with transient elastography results, weighted LSTM was not significantly better at detecting fibrosis (≥F2; 0·705 [0·687 to 0·724]) than transient elastography (0·685 [0·662 to 0·704]). The top ten variables predictive for significant fibrosis were recipient age, primary indication for transplantation, donor age, and longitudinal data for creatinine, alanine aminotransferase, aspartate aminotransferase, total bilirubin, platelets, white blood cell count, and weight.
INTERPRETATION Deep learning algorithms, particularly weighted LSTM, outperform other routinely used non-invasive modalities and could help with the earlier diagnosis of graft fibrosis using longitudinal clinical and laboratory variables. The list of most important predictive variables for the development of fibrosis will enable clinicians to modify their management accordingly to prevent onset of graft cirrhosis.
FUNDING Canadian Institute of Health Research, American Society of Transplantation, Toronto General and Western Hospital Foundation, and Paladin Labs.
Affiliation(s)
- Amirhossein Azhie
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada
| | - Divya Sharma
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Priya Sheth
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada
| | - Fakhar Ali Qazi-Arisar
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada; National Institute of Liver & Gastrointestinal Diseases, Dow University of Health Sciences, Karachi, Pakistan
| | - Rita Zaya
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada
| | - Maryam Naghibzadeh
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada
| | - Kai Duan
- Department of Pathology, University Health Network, Toronto, ON, Canada
| | - Sandra Fischer
- Department of Pathology, University Health Network, Toronto, ON, Canada
| | - Keyur Patel
- Toronto Centre for Liver Disease, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Cynthia Tsien
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Nazia Selzner
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Leslie Lilly
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Elmar Jaeckel
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada
| | - Wei Xu
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada; Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Mamatha Bhat
- Ajmera Transplant Program, University Health Network, Toronto, ON, Canada; Division of Gastroenterology, Department of Medicine, University of Toronto, Toronto, ON, Canada.
|
11
|
Fung DLX, Li X, Leung CK, Hu P. A self-knowledge distillation-driven CNN-LSTM model for predicting disease outcomes using longitudinal microbiome data. BIOINFORMATICS ADVANCES 2023; 3:vbad059. [PMID: 37228387 PMCID: PMC10203376 DOI: 10.1093/bioadv/vbad059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 04/03/2023] [Accepted: 05/01/2023] [Indexed: 05/27/2023]
Abstract
Motivation The human microbiome is complex and highly dynamic in nature. Dynamic patterns of the microbiome can capture more information than single-time-point inference because they contain information about temporal changes. However, this dynamic information can be hard to capture: longitudinal data are difficult to obtain and often contain a large volume of missing data, which, together with heterogeneity, complicates the analysis.
Results We propose an efficient hybrid deep learning architecture, a convolutional neural network-long short-term memory (CNN-LSTM) model, combined with self-knowledge distillation to create highly accurate models that analyze longitudinal microbiome profiles to predict disease outcomes. Using our proposed models, we analyzed the datasets from the Predicting Response to Standardized Pediatric Colitis Therapy (PROTECT) study and the DIABIMMUNE study. We showed a significant improvement in area under the receiver operating characteristic curve scores compared with state-of-the-art temporal deep learning models, achieving 0.889 and 0.798 on the PROTECT and DIABIMMUNE studies, respectively. Our findings provide an effective artificial intelligence-based tool to predict disease outcomes using longitudinal microbiome profiles collected from patients.
Availability and implementation The data and source code can be accessed at https://github.com/darylfung96/UC-disease-TL.
Affiliation(s)
- Daryl L X Fung
- Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
| | - Xu Li
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
| | - Carson K Leung
- Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
|
12
|
Golob JL, Oskotsky TT, Tang AS, Roldan A, Chung V, Ha CWY, Wong RJ, Flynn KJ, Parraga-Leo A, Wibrand C, Minot SS, Andreoletti G, Kosti I, Bletz J, Nelson A, Gao J, Wei Z, Chen G, Tang ZZ, Novielli P, Romano D, Pantaleo E, Amoroso N, Monaco A, Vacca M, De Angelis M, Bellotti R, Tangaro S, Kuntzleman A, Bigcraft I, Techtmann S, Bae D, Kim E, Jeon J, Joe S, Theis KR, Ng S, Lee Li YS, Diaz-Gimeno P, Bennett PR, MacIntyre DA, Stolovitzky G, Lynch SV, Albrecht J, Gomez-Lopez N, Romero R, Stevenson DK, Aghaeepour N, Tarca AL, Costello JC, Sirota M. Microbiome Preterm Birth DREAM Challenge: Crowdsourcing Machine Learning Approaches to Advance Preterm Birth Research. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.03.07.23286920. [PMID: 36945505 PMCID: PMC10029035 DOI: 10.1101/2023.03.07.23286920] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 03/23/2023]
Abstract
Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
Affiliation(s)
- Jonathan L Golob
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
| | - Tomiko T Oskotsky
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Alice S Tang
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Alennie Roldan
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | | | - Connie W Y Ha
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
| | - Ronald J Wong
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
| | | | - Antonio Parraga-Leo
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Camilla Wibrand
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Samuel S Minot
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
| | - Gaia Andreoletti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Idit Kosti
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | | | | | - Jifan Gao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Zhoujingpeng Wei
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Pierfrancesco Novielli
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Donato Romano
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Ester Pantaleo
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
| | - Nicola Amoroso
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
| | - Alfonso Monaco
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Mirco Vacca
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Maria De Angelis
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Roberto Bellotti
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Sabina Tangaro
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Abigail Kuntzleman
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
| | - Isaac Bigcraft
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Stephen Techtmann
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Daehun Bae
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
| | - Eunyoung Kim
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | | | - Soobok Joe
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Kevin R Theis
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
| | - Sherrianne Ng
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
| | - Yun S Lee Li
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Patricia Diaz-Gimeno
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Phillip R Bennett
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - David A MacIntyre
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Gustavo Stolovitzky
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Susan V Lynch
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
| | | | - Nardhy Gomez-Lopez
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Roberto Romero
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - David K Stevenson
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
| | - Nima Aghaeepour
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
| | - Adi L Tarca
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - James C Costello
- Division of Infectious Disease. Department of Internal Medicine. University of Michigan. Ann Arbor, MI. USA
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
- Sage Bionetworks, Seattle, WA. USA
- Benioff Center for Microbiome Medicine, Department of Medicine, University of California, San Francisco, CA. USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA. USA
- March of Dimes Prematurity Research Center at Stanford University, Stanford, CA USA
- Data Core, Shared Resources, Fred Hutchinson Cancer Center. Seattle, WA. USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI. USA
| | - Marina Sirota
- March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA USA
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA. USA
- Department of Pediatrics. University of California San Francisco, San Francisco, CA. USA
|
13
|
Aldirawi H, Morales FG. Univariate and Multivariate Statistical Analysis of Microbiome Data: An Overview. Appl Microbiol 2023. [DOI: 10.3390/applmicrobiol3020023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
Microbiome data is high dimensional, sparse, compositional, and over-dispersed. Therefore, modeling microbiome data is very challenging and it is an active research area. Microbiome analysis has become a progressing area of research as microorganisms constitute a large part of life. Since many methods of microbiome data analysis have been presented, this review summarizes the challenges, methods used, and the advantages and disadvantages of those methods, to serve as an updated guide for those in the field. This review also compared different methods of analysis to progress the development of newer methods.
|
14
|
Shtossel O, Isakov H, Turjeman S, Koren O, Louzoun Y. Ordering taxa in image convolution networks improves microbiome-based machine learning accuracy. Gut Microbes 2023; 15:2224474. [PMID: 37345233 PMCID: PMC10288916 DOI: 10.1080/19490976.2023.2224474] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/15/2022] [Accepted: 06/08/2023] [Indexed: 06/23/2023] Open
Abstract
The human gut microbiome is associated with a large number of disease etiologies. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing or shotgun metagenomics. However, several properties of microbial sequence-based studies hinder machine learning (ML), including non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity of the data, with the majority of taxa present in a small subset of samples. We show here using a graph representation that the cladogram structure is as informative as the taxa frequency. We then suggest a novel method to combine information from different taxa and improve data representation for ML using microbial taxonomy. iMic (image microbiome) translates the microbiome to images through an iterative ordering scheme, and applies convolutional neural networks to the resulting image. We show that iMic has a higher precision in static microbiome gene sequence-based ML than state-of-the-art methods. iMic also facilitates the interpretation of the classifiers by applying an explainable artificial intelligence (AI) algorithm to iMic to detect taxa relevant to each condition. iMic is then extended to dynamic microbiome samples by translating them to movies.
Affiliation(s)
- Oshrit Shtossel
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| | - Haim Isakov
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
| | - Sondra Turjeman
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
| | - Omry Koren
- The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel
| | - Yoram Louzoun
- Department of Mathematics, Bar-Ilan University, Ramat Gan, Israel
|
15
|
Acharjee A, Singh U, Choudhury SP, Gkoutos GV. The diagnostic potential and barriers of microbiome based therapeutics. Diagnosis (Berl) 2022; 9:411-420. [PMID: 36000189 DOI: 10.1515/dx-2022-0052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/28/2022] [Accepted: 08/03/2022] [Indexed: 02/07/2023]
Abstract
High throughput technological innovations in the past decade have accelerated research into the trillions of commensal microbes in the gut. The 'omics' technologies used for microbiome analysis are constantly evolving, and large-scale datasets are being produced. Despite the fact that much of the research is still in its early stages, specific microbial signatures have been associated with the promotion of cancer, as well as other diseases such as inflammatory bowel disease and neurodegenerative diseases. It has also been reported that the diversity of the gut microbiome influences the safety and efficacy of medicines. The availability and declining cost of sequencing have made RNA-based diagnostics more common in the microbiome field, necessitating improved data-analytical techniques to fully exploit the resulting rich biological datasets while accounting for their unique characteristics, such as their compositional nature, heterogeneity, and sparsity. As a result, the gut microbiome is increasingly being demonstrated to be an important component of personalised medicine, since it not only plays a role in inter-individual variability in health and disease but also represents a potentially modifiable entity or feature that may be addressed by treatments in a personalised way. In this context, machine learning and artificial intelligence-based methods may be able to unveil new insights into biomedical analyses through the generation of models that may be used to predict category labels and continuous values. Furthermore, diagnostic aspects will add value in the identification of non-invasive markers in critical diseases like cancer.
Affiliation(s)
- Animesh Acharjee
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Institute of Translational Medicine, University of Birmingham, Birmingham, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, UK; MRC Health Data Research UK (HDR UK), Birmingham, UK
| | - Utpreksha Singh
- Department of Health and Life Sciences, Coventry University, Coventry, UK
| | | | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK; Institute of Translational Medicine, University of Birmingham, Birmingham, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, UK; MRC Health Data Research UK (HDR UK), Birmingham, UK; NIHR Experimental Cancer Medicine Centre, Birmingham, UK
|
16
|
Hernández Medina R, Kutuzova S, Nielsen KN, Johansen J, Hansen LH, Nielsen M, Rasmussen S. Machine learning and deep learning applications in microbiome research. ISME COMMUNICATIONS 2022; 2:98. [PMID: 37938690 PMCID: PMC9723725 DOI: 10.1038/s43705-022-00182-9] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 03/24/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 05/27/2023]
Abstract
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turned to machine learning and deep learning methods to interrogate the microbiome and elucidate the relationships between its composition and function. Here, we provide an overview of how the latest microbiome studies harness the inductive prowess of artificial intelligence methods. We start by highlighting that microbiome data - being compositional, sparse, and high-dimensional - necessitates special treatment. We then introduce traditional and novel methods and discuss their strengths and applications. Finally, we discuss the outlook of machine and deep learning pipelines, focusing on bottlenecks and considerations to address them.
Affiliation(s)
- Ricardo Hernández Medina
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
| | - Svetlana Kutuzova
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark
| | - Knud Nor Nielsen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
| | - Joachim Johansen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark
| | - Lars Hestbjerg Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871, Frederiksberg, Denmark
| | - Mads Nielsen
- Department of Computer Science, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark.
| | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen N, Denmark.
|
17
|
Zeng W, Gautam A, Huson DH. DeepToA: An Ensemble Deep-Learning Approach to Predicting the Theater of Activity of a Microbiome. Bioinformatics 2022; 38:4670-4676. [PMID: 36029249 DOI: 10.1093/bioinformatics/btac584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 07/19/2022] [Accepted: 08/26/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Metagenomics is the study of microbiomes using DNA sequencing. A microbiome consists of an assemblage of microbes that is associated with a "theater of activity" (ToA). An important question is, to what degree does the taxonomic and functional content of the former depend on the (details of the) latter? Here we investigate a related technical question: Given a taxonomic and/or functional profile estimated from metagenomic sequencing data, how to predict the associated ToA? We present a deep-learning approach to this question. We use both taxonomic and functional profiles as input. We apply node2vec to embed hierarchical taxonomic profiles into numerical vectors. We then perform dimension reduction using clustering, to address the sparseness of the taxonomic data and thus make the problem more amenable to deep-learning algorithms. Functional features are combined with textual descriptions of protein families or domains. We present an ensemble deep-learning framework DeepToA for predicting the "theater of activity" of a microbial community, based on taxonomic and functional profiles. We use SHAP (SHapley Additive exPlanations) values to determine which taxonomic and functional features are important for the prediction. RESULTS: Based on 7,560 metagenomic profiles downloaded from MGnify, classified into ten different theaters of activity, we demonstrate that DeepToA has an accuracy of 98.30%. We show that adding textual information to functional features increases the accuracy. AVAILABILITY: Our approach is available at http://ab.inf.uni-tuebingen.de/software/deeptoa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenhuan Zeng
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany
| | - Anupam Gautam
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany; International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, 72076, Germany
| | - Daniel H Huson
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany; International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, 72076, Germany; Cluster of Excellence: Controlling Microbes to Fight Infection, Tübingen, Germany
|
18
|
Morgan EW, Perdew GH, Patterson AD. Multi-Omics Strategies for Investigating the Microbiome in Toxicology Research. Toxicol Sci 2022; 187:189-213. [PMID: 35285497 PMCID: PMC9154275 DOI: 10.1093/toxsci/kfac029] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022] Open
Abstract
Microbial communities on and within the host contact environmental pollutants, toxic compounds, and other xenobiotic compounds. These communities of bacteria, fungi, viruses, and archaea possess diverse metabolic potential to catabolize compounds and produce new metabolites. Microbes alter chemical disposition thus making the microbiome a natural subject of interest for toxicology. Sequencing and metabolomics technologies permit the study of microbiomes altered by acute or long-term exposure to xenobiotics. These investigations have already contributed to and are helping to re-interpret traditional understandings of toxicology. The purpose of this review is to provide a survey of the current methods used to characterize microbes within the context of toxicology. This will include discussion of commonly used techniques for conducting omic-based experiments, their respective strengths and deficiencies, and how forward-looking techniques may address present shortcomings. Finally, a perspective will be provided regarding common assumptions that currently impede microbiome studies from producing causal explanations of toxicologic mechanisms.
Affiliation(s)
- Ethan W Morgan
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Gary H Perdew
- Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Andrew D Patterson
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA; Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
19
|
Taneishi K, Tsuchiya Y. Structure-based analyses of gut microbiome-related proteins by neural networks and molecular dynamics simulations. Curr Opin Struct Biol 2022; 73:102336. [DOI: 10.1016/j.sbi.2022.102336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 11/18/2021] [Accepted: 01/14/2022] [Indexed: 11/03/2022]
|
20
|
Zha Y, Ning K. Ontology-aware neural network: a general framework for pattern mining from microbiome data. Brief Bioinform 2022; 23:bbac005. [PMID: 35091743 PMCID: PMC8921649 DOI: 10.1093/bib/bbac005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 12/30/2021] [Accepted: 01/04/2022] [Indexed: 11/23/2022] Open
Abstract
With the rapid accumulation of microbiome data around the world, numerous computational bioinformatics methods have been developed for pattern mining from such paramount microbiome data. Current microbiome data mining methods, such as gene and species mining, rely heavily on sequence comparison. Most of these methods, however, have a clear trade-off, particularly, when it comes to big-data analytical efficiency and accuracy. Microbiome entities are usually organized in ontology structures, and pattern mining methods that have considered ontology structures could offer advantages in mining efficiency and accuracy. Here, we have summarized the ontology-aware neural network (ONN) as a novel framework for microbiome data mining. We have discussed the applications of ONN in multiple contexts, including gene mining, species mining and microbial community dynamic pattern mining. We have then highlighted one of the most important characteristics of ONN, namely, novel knowledge discovery, which makes ONN a standout among all microbiome data mining methods. Finally, we have provided several applications to showcase the advantage of ONN over other methods in microbiome data mining. In summary, ONN represents a paradigm shift for pattern mining from microbiome data: from traditional machine learning approach to ontology-aware and model-based approach, which has found its broad application scenarios in microbiome data mining.
Affiliation(s)
- Yuguo Zha
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road Wuhan, Hubei, Wuhan 430074, China
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road Wuhan, Hubei, Wuhan 430074, China
|