1
Xie W, Yu J, Huang L, For LS, Zheng Z, Chen X, Wang Y, Liu Z, Peng C, Wong KC. DeepSeq2Drug: An expandable ensemble end-to-end anti-viral drug repurposing benchmark framework by multi-modal embeddings and transfer learning. Comput Biol Med 2024; 175:108487. [PMID: 38653064 DOI: 10.1016/j.compbiomed.2024.108487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 03/26/2024] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
Drug repurposing is promising in multiple scenarios, such as controlling emerging viral outbreaks and reducing the cost of drug discovery. Traditional graph-based drug repurposing methods are of limited use for fast, large-scale virtual screens, as they constrain the numbers of drugs and targets and cannot make predictions for novel viruses or drugs. Moreover, though deep learning has been proposed for drug repurposing, only a few methods have been used, including a group of pre-trained deep learning models for embedding generation and transfer learning. Hence, we propose DeepSeq2Drug to tackle the shortcomings of previous methods. We leverage multi-modal embeddings and an ensemble strategy to expand the numbers of drugs and viruses covered and to enable prediction for novel ones. This framework (including the expanded version) involves four modal types: six NLP models, four CV models, four graph models, and two sequence models. In detail, we first build a pipeline and compute the predictive performance of each pair of viral and drug embeddings. Then, we select the best embedding pairs and apply an ensemble strategy to conduct anti-viral drug repurposing. To validate the proposed ensemble model, we conduct a monkeypox virus (MPV) case study that illustrates its potential predictive capability. This framework could be a benchmark method for further pre-trained deep learning optimization and anti-viral drug repurposing tasks. We also provide software to make the proposed model easier to reuse. The code and software are freely available at http://deepseq2drug.cs.cityu.edu.hk.
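The select-then-ensemble step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the ensemble rule, so simple averaging is assumed here, and every model name, score, and helper function below is hypothetical.

```python
from typing import Dict, List, Tuple

# (viral embedding name, drug embedding name); all names here are hypothetical.
Pair = Tuple[str, str]


def select_top_pairs(pair_scores: Dict[Pair, float], n: int) -> List[Pair]:
    """Return the n embedding pairs with the highest validation score."""
    ranked = sorted(pair_scores, key=lambda pair: pair_scores[pair], reverse=True)
    return ranked[:n]


def ensemble_average(predictions: Dict[Pair, List[float]], pairs: List[Pair]) -> List[float]:
    """Average per-drug scores across the selected pairs (assumed ensemble rule)."""
    if not pairs:
        raise ValueError("at least one embedding pair is required")
    per_drug = zip(*(predictions[pair] for pair in pairs))
    return [sum(scores) / len(pairs) for scores in per_drug]


# Toy validation scores for three hypothetical embedding pairs.
pair_scores = {
    ("seq_model_a", "graph_model_x"): 0.91,
    ("seq_model_a", "nlp_model_y"): 0.85,
    ("seq_model_b", "graph_model_x"): 0.78,
}
# Toy predicted scores for two candidate drugs from each pair's model.
predictions = {
    ("seq_model_a", "graph_model_x"): [0.8, 0.2],
    ("seq_model_a", "nlp_model_y"): [0.6, 0.4],
    ("seq_model_b", "graph_model_x"): [0.1, 0.9],
}

top_pairs = select_top_pairs(pair_scores, n=2)
ensemble_scores = ensemble_average(predictions, top_pairs)
```

Here the two best-scoring pairs are kept and their per-drug scores are averaged; a weighted vote or a stacked model would be equally plausible readings of "ensemble strategy".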
Affiliation(s)
- Weidun Xie
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Jixiang Yu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Lei Huang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Lek Shyuen For
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Xingjian Chen
  - Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Yuchen Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Zhichao Liu
  - Sir William Dunn School of Pathology, University of Oxford, UK
- Chengbin Peng
  - College of Information Science and Engineering, Ningbo University, Ningbo, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
  - Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
2
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024] Open
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
3
Monshizadeh M, Ye Y. Incorporating metabolic activity, taxonomy and community structure to improve microbiome-based predictive models for host phenotype prediction. Gut Microbes 2024; 16:2302076. [PMID: 38214657 PMCID: PMC10793686 DOI: 10.1080/19490976.2024.2302076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024] Open
Abstract
We developed MicroKPNN, a prior-knowledge guided interpretable neural network for microbiome-based human host phenotype prediction. The prior knowledge used in MicroKPNN includes the metabolic activities of different bacterial species, phylogenetic relationships, and bacterial community structure, all in a shallow neural network. Application of MicroKPNN to seven gut microbiome datasets (involving five different human diseases including inflammatory bowel disease, type 2 diabetes, liver cirrhosis, colorectal cancer, and obesity) shows that incorporation of the prior knowledge helped improve the microbiome-based host phenotype prediction. MicroKPNN outperformed fully connected neural network-based approaches in all seven cases, with the largest accuracy improvement in the prediction of type 2 diabetes. MicroKPNN also outperformed DeepMicro, a recently developed deep-learning-based approach that selects the best combination of autoencoder and machine learning approach to make predictions, in all seven cases. Importantly, we showed that MicroKPNN provides a way to interpret the predictive models. Using importance scores estimated for the hidden nodes, MicroKPNN could provide explanations for prior research findings by highlighting the roles of specific microbiome components in phenotype predictions. In addition, it may suggest potential future research directions for studying the impact of the microbiome on host health and disease. MicroKPNN is publicly available at https://github.com/mgtools/MicroKPNN.
Affiliation(s)
- Mahsa Monshizadeh
  - Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
- Yuzhen Ye
  - Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
4
Xie W, Chen X, Zheng Z, Wang F, Zhu X, Lin Q, Sun Y, Wong KC. LncRNA-Top: Controlled deep learning approaches for lncRNA gene regulatory relationship annotations across different platforms. iScience 2023; 26:108197. [PMID: 37965148 PMCID: PMC10641498 DOI: 10.1016/j.isci.2023.108197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/10/2023] [Accepted: 10/10/2023] [Indexed: 11/16/2023] Open
Abstract
By sponging microRNAs (miRNAs), long non-coding RNAs (lncRNAs) have the potential to regulate gene expression. Few methods based on this mechanism have been developed to predict lncRNA-gene relationships. Hence, we present lncRNA-Top to predict potential lncRNA-gene regulatory relationships. Specifically, we constructed controlled deep-learning methods using 12417 lncRNAs and 16127 genes. We have provided retrospective and innovative views on negative sampling, random seeds, cross-validation, metrics, and independent datasets. The AUC, AUPR, and our defined precision@k were leveraged to evaluate performance. In-depth case studies demonstrate that 47 of the top 100 predicted unknown pairs are reported in publications, supporting the predictive power. Our additional software can annotate the scores with target candidates. lncRNA-Top will be a helpful tool for uncovering prospective lncRNA targets and better understanding the regulatory processes of lncRNAs.
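For readers unfamiliar with the metric, the sketch below shows the conventional precision@k: the fraction of the k highest-scoring candidates that are true positives. The abstract refers to "our defined precision@k", so the authors' definition may differ; the function name and toy data here are illustrative assumptions only.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scoring pairs whose label is 1 (true positive)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of pairs")
    # Rank candidate pairs by descending predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / k


# Toy example: five hypothetical lncRNA-gene pairs with binary ground truth.
example_scores = [0.9, 0.2, 0.75, 0.4, 0.6]
example_labels = [1, 0, 0, 1, 1]
example_p_at_3 = precision_at_k(example_scores, example_labels, 3)
```

Under this conventional definition, 47 literature-supported pairs among the top 100 predictions would correspond to a precision@100 of 0.47.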
Affiliation(s)
- Weidun Xie
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiaowei Zhu
  - Department of Neuroscience, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Qiuzhen Lin
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
  - Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
5
Deschênes T, Tohoundjona FWE, Plante PL, Di Marzo V, Raymond F. Gene-based microbiome representation enhances host phenotype classification. mSystems 2023; 8:e0053123. [PMID: 37404032 PMCID: PMC10469787 DOI: 10.1128/msystems.00531-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 07/06/2023] Open
Abstract
With the concomitant advances in both the microbiome and machine learning fields, the gut microbiome has become of great interest for the potential discovery of biomarkers to be used in the classification of the host health status. Shotgun metagenomics data derived from the human microbiome is composed of a high-dimensional set of microbial features. The use of such complex data for the modeling of host-microbiome interactions remains a challenge as retaining de novo content yields a highly granular set of microbial features. In this study, we compared the prediction performances of machine learning approaches according to different types of data representations derived from shotgun metagenomics. These representations include commonly used taxonomic and functional profiles and the more granular gene cluster approach. For the five case-control datasets used in this study (type 2 diabetes, obesity, liver cirrhosis, colorectal cancer, and inflammatory bowel disease), gene-based approaches, whether used alone or in combination with reference-based data types, yielded classification performance similar to or better than that of the taxonomic and functional profiles. In addition, we show that using subsets of gene families from specific functional categories of genes highlights the importance of these functions for the host phenotype. This study demonstrates that both reference-free microbiome representations and curated metagenomic annotations can provide relevant representations for machine learning based on metagenomic data.
IMPORTANCE: Data representation is an essential part of machine learning performance when using metagenomic data. In this work, we show that different microbiome representations provide varied host phenotype classification performance depending on the dataset. In classification tasks, untargeted microbiome gene content can provide similar or improved classification compared to taxonomical profiling. Feature selection based on biological function also improves classification performance for some pathologies. Function-based feature selection combined with interpretable machine learning algorithms can generate new hypotheses that can potentially be assayed mechanistically. This work thus proposes new approaches to represent microbiome data for machine learning that can potentiate the findings associated with metagenomic data.
Affiliation(s)
- Thomas Deschênes
  - Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada
  - Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada
  - Institut Intelligence et Données, Université Laval, Québec, Canada
- Fred Wilfried Elom Tohoundjona
  - Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada
  - Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada
- Pier-Luc Plante
  - Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada
  - Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada
  - Institut Intelligence et Données, Université Laval, Québec, Canada
- Vincenzo Di Marzo
  - Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada
  - Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada
  - École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation (FSAA), Université Laval, Québec, Canada
  - Centre de recherche de l’Institut universitaire de cardiologie et de pneumologie de Québec (IUCPQ), Québec, Canada
  - Département de médecine, Faculté de Médecine, Université Laval, Québec, Canada
  - Joint International Unit on Chemical and Biomolecular Research on the Microbiome and its Impact on Metabolic Health and Nutrition (UMI-MicroMeNu), Quebec City, Canada
- Frédéric Raymond
  - Centre Nutrition, Santé et Société (NUTRISS) – Institut sur la Nutrition et les Aliments Fonctionnels (INAF), Université Laval, Québec, Canada
  - Canada Research Excellence Chair on the Microbiome-Endocannabinoidome Axis in Metabolic Health (CERC-MEND), Quebec City, Quebec, Canada
  - Institut Intelligence et Données, Université Laval, Québec, Canada
  - École de nutrition, Faculté des sciences de l’agriculture et de l’alimentation (FSAA), Université Laval, Québec, Canada
6
Li B, Wang T, Qian M, Wang S. MKMR: a multi-kernel machine regression model to predict health outcomes using human microbiome data. Brief Bioinform 2023; 24:7142722. [PMID: 37099694 DOI: 10.1093/bib/bbad158] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/28/2022] [Revised: 03/24/2023] [Accepted: 04/03/2023] [Indexed: 04/28/2023] Open
Abstract
Studies have found that the human microbiome is associated with and predictive of human health and diseases. Many statistical methods developed for microbiome data focus on different distance metrics that can capture various information in microbiomes. Prediction models were also developed for microbiome data, including deep learning methods with convolutional neural networks that consider both taxa abundance profiles and taxonomic relationships among microbial taxa from a phylogenetic tree. Studies have also suggested that a health outcome can be associated with multiple forms of microbiome profiles. In addition to the abundance of some taxa that are associated with a health outcome, the presence/absence of some taxa is also associated with and predictive of the same health outcome. Moreover, associated taxa may be close to each other on a phylogenetic tree or spread apart on a phylogenetic tree. No prediction models currently exist that use multiple forms of microbiome-outcome associations. To address this, we propose a multi-kernel machine regression (MKMR) method that is able to capture various types of microbiome signals when doing predictions. MKMR utilizes multiple forms of microbiome signals through multiple kernels transformed from multiple distance metrics for microbiomes and learns an optimal conic combination of these kernels, with kernel weights helping us understand contributions of individual microbiome signal types. Simulation studies suggest much-improved prediction performance over competing methods with a mixture of microbiome signals. Real data applications predicting multiple health outcomes using throat and gut microbiome data also suggest better prediction by MKMR than by competing methods.
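A conic combination of kernels is a weighted sum with non-negative weights, K = w_1 K_1 + ... + w_M K_M with every w_m >= 0. The sketch below illustrates only that idea and is not the MKMR algorithm: the weights are fixed by hand rather than learned, the Gaussian distance-to-kernel transform is an assumption (the abstract does not say which transform is used), and all function and variable names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def distance_to_kernel(dist: Matrix, bandwidth: float = 1.0) -> Matrix:
    """Map pairwise distances to similarities with a Gaussian transform (assumed)."""
    return [[math.exp(-(d * d) / (2.0 * bandwidth ** 2)) for d in row] for row in dist]


def conic_combination(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of kernel matrices with non-negative weights."""
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("conic combinations require non-negative weights")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Toy distances between two samples under two hypothetical metrics,
# e.g. one abundance-based and one presence/absence-based.
abundance_dist = [[0.0, 1.0], [1.0, 0.0]]
presence_dist = [[0.0, 2.0], [2.0, 0.0]]

kernels = [distance_to_kernel(abundance_dist), distance_to_kernel(presence_dist)]
combined_kernel = conic_combination(kernels, weights=[0.7, 0.3])
```

In a method of the kind the abstract describes, the weights would be estimated from data, and their relative sizes would indicate how much each signal type contributes to prediction.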
Affiliation(s)
- Bing Li
  - Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, U.S.A
- Tian Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, New York, 10032 U.S.A
- Min Qian
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, New York, 10032 U.S.A
- Shuang Wang
  - Department of Biostatistics, School of Public Health, Brown University, Providence, Rhode Island, U.S.A
7
Li P, Luo H, Ji B, Nielsen J. Machine learning for data integration in human gut microbiome. Microb Cell Fact 2022; 21:241. [PMID: 36419034 PMCID: PMC9685977 DOI: 10.1186/s12934-022-01973-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/05/2022] [Accepted: 11/15/2022] [Indexed: 11/25/2022] Open
Abstract
Recent studies have demonstrated that gut microbiota plays critical roles in various human diseases. High-throughput technology has been widely applied to characterize the microbial ecosystems, which led to an explosion of different types of molecular profiling data, such as metagenomics, metatranscriptomics and metabolomics. For analysis of such data, machine learning algorithms have been shown to be useful for identifying key molecular signatures, discovering potential patient stratifications, and particularly for generating models that can accurately predict phenotypes. In this review, we first discuss how dysbiosis of the intestinal microbiota is linked to human disease development and how potential modulation strategies of the gut microbial ecosystem can be used for disease treatment. In addition, we introduce categories and workflows of different machine learning approaches, and how they can be used to perform integrative analysis of multi-omics data. Finally, we review advances of machine learning in gut microbiome applications and discuss related challenges. Based on this, we conclude that machine learning is very well suited for analysis of the gut microbiome and that these approaches can be useful for development of gut microbe-targeted therapies, which ultimately can help in achieving personalized and precision medicine.
Affiliation(s)
- Peishun Li
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden (grid.5371.0; ISNI 0000 0001 0775 6028)
- Hao Luo
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden (grid.5371.0; ISNI 0000 0001 0775 6028)
- Boyang Ji
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden (grid.5371.0; ISNI 0000 0001 0775 6028)
  - BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen, Denmark (grid.510909.4)
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden (grid.5371.0; ISNI 0000 0001 0775 6028)
  - BioInnovation Institute, Ole Maaløes Vej 3, DK2200 Copenhagen, Denmark (grid.510909.4)