1
Kuang H, Zhang Z, Zeng B, Liu X, Zuo H, Xu X, Wang L. A novel microbe-drug association prediction model based on graph attention networks and bilayer random forest. BMC Bioinformatics 2024; 25:78. [PMID: 38378437 PMCID: PMC10877932 DOI: 10.1186/s12859-024-05687-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 01/31/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND In recent years, the extensive use of drugs and antibiotics has led to increasing microbial resistance, which makes it crucial to explore the deep connections between drugs and microbes. However, traditional biological experiments are expensive and time-consuming, so it is worthwhile to develop efficient computational models that forecast potential microbe-drug associations. RESULTS In this manuscript, we proposed a novel prediction model called GARFMDA, which combines graph attention networks and a bilayer random forest to infer probable microbe-drug associations. In GARFMDA, we first constructed two different microbe-drug networks by integrating different microbe-drug-disease correlation indices. Then, based on multiple measures of similarity, we constructed one feature matrix for drugs and another for microbes. Next, we fed the newly obtained microbe-drug networks together with the feature matrices into the graph attention network to extract low-dimensional feature representations for drugs and microbes separately. These low-dimensional feature representations, along with the feature matrices, were then input into the first layer of the bilayer random forest to obtain the contribution values of all features. After features with low contribution values were removed, the remaining contribution values were fed into the second layer of the bilayer random forest to detect potential links between microbes and drugs. CONCLUSIONS Experimental results and case studies show that GARFMDA achieves better prediction performance than state-of-the-art approaches, which suggests that GARFMDA may be a useful tool for microbe-drug association prediction. The source code of GARFMDA is available at https://github.com/KuangHaiYue/GARFMDA.git.
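To make the graph-attention step more concrete, the following is a minimal single-head sketch in plain Python of how attention coefficients over a node's neighbours can be computed and used to aggregate features. The toy graph, the projection matrix `W`, and the attention vector `a` are illustrative assumptions; none of them comes from GARFMDA, whose actual architecture is in the repository linked above.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer (illustrative sketch).

    features:  {node: [float, ...]} input features
    neighbors: {node: [node, ...]} adjacency lists, including self-loops
    W:         projection matrix (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    """
    z = {v: matvec(W, h) for v, h in features.items()}
    out, attn = {}, {}
    for v, nbrs in neighbors.items():
        # Unnormalised score for each edge (v, u): LeakyReLU(a . [z_v || z_u]).
        scores = [leaky_relu(sum(ai * x for ai, x in zip(a, z[v] + z[u])))
                  for u in nbrs]
        # Softmax over the neighbourhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[v] = dict(zip(nbrs, alphas))
        # Attention-weighted sum of the projected neighbour features.
        out[v] = [sum(al * z[u][k] for al, u in zip(alphas, nbrs))
                  for k in range(len(z[v]))]
    return out, attn

# Hypothetical toy graph: one microbe connected to two drugs.
features = {"m1": [1.0, 0.0], "d1": [0.0, 1.0], "d2": [1.0, 1.0]}
neighbors = {"m1": ["m1", "d1", "d2"], "d1": ["d1", "m1"], "d2": ["d2", "m1"]}
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]

embeddings, attention = gat_layer(features, neighbors, W, a)
```

In a trained model `W` and `a` would be learned; here they are fixed only so that the normalisation and aggregation steps can be inspected.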
Affiliation(s)
- Haiyue Kuang
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
- Zhen Zhang
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
- Bin Zeng
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
- Xin Liu
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
- Hao Zuo
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
- Xingye Xu
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
- Lei Wang
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
2
Zhu B, Yu HY, Du BX, Shi JY. DMGL-MDA: A dual-modal graph learning method for microbe-drug association prediction. Methods 2024; 222:51-56. [PMID: 38184219 DOI: 10.1016/j.ymeth.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Revised: 12/26/2023] [Accepted: 12/28/2023] [Indexed: 01/08/2024] Open
Abstract
The interaction between human microbes and drugs can significantly impact human physiological functions, so it is crucial to identify potential microbe-drug associations (MDAs) before drug administration. However, conventional biological experiments for identifying MDAs are time-consuming, costly, and potentially risky. By contrast, computational approaches can speed up the screening of MDAs at low cost. Most computational models use a drug similarity matrix as the initial feature representation of drugs and stack graph neural network (GNN) layers to extract the features of network nodes. However, different calculation methods yield different similarity matrices, and message passing in GNNs induces over-smoothing and over-squashing, both of which degrade model performance. To address these issues, we proposed a novel graph representation learning model, dual-modal graph learning for microbe-drug association prediction (DMGL-MDA). It comprises a dual-modal embedding module, a bipartite graph network embedding module, and a predictor module. To assess the performance of DMGL-MDA, we compared it against state-of-the-art methods on two benchmark datasets, and cross-validation showed the superiority of DMGL-MDA. Furthermore, we conducted ablation experiments and case studies to validate the effectiveness of the model.
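The over-smoothing effect mentioned above can be illustrated with a small experiment: when a parameter-free mean aggregation step is stacked many times, node embeddings drift toward a common value. The sketch below is a toy demonstration in plain Python under assumed inputs (a four-node path graph and arbitrary features); it is not part of DMGL-MDA.

```python
def mean_aggregate(features, neighbors):
    """One message-passing step: each node takes the mean of its own
    features and those of its neighbours (no learned weights)."""
    updated = {}
    for v, nbrs in neighbors.items():
        group = [v] + nbrs
        dim = len(features[v])
        updated[v] = [sum(features[u][k] for u in group) / len(group)
                      for k in range(dim)]
    return updated

def spread(features):
    """Largest per-dimension gap between any two node embeddings."""
    return max(max(col) - min(col) for col in zip(*features.values()))

# Hypothetical path graph a - b - c - d with arbitrary 2-d features.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5], "d": [0.0, 0.0]}

initial_spread = spread(features)
h = features
for _ in range(30):          # 30 stacked aggregation layers
    h = mean_aggregate(h, neighbors)
final_spread = spread(h)

print(initial_spread, final_spread)
```

After 30 layers the spread between node embeddings is close to zero, which is the loss of discriminative information that the term over-smoothing refers to.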
Affiliation(s)
- Bei Zhu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Hao-Yang Yu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
3
Liang M, Liu X, Chen Q, Zeng B, Wang L. NMGMDA: a computational model for predicting potential microbe-drug associations based on minimize matrix nuclear norm and graph attention network. Sci Rep 2024; 14:650. [PMID: 38182635 PMCID: PMC10770326 DOI: 10.1038/s41598-023-50793-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 12/26/2023] [Indexed: 01/07/2024] Open
Abstract
The prediction of potential microbe-drug associations is of great value for drug research and development, and deep learning-based methods in particular have achieved significant improvements in biomedicine. In this manuscript, we proposed a novel computational model named NMGMDA, based on nuclear norm minimization (NNM) and a graph attention network (GAT), to infer latent microbe-drug associations. First, we created a heterogeneous microbe-drug network by fusing drug and microbe similarities with established drug-microbe associations. Next, we used GAT and NNM to calculate the prediction scores. Lastly, we set up a fivefold cross-validation framework to assess the performance of NMGMDA. According to the simulation results, NMGMDA outperforms some of the most advanced methods, with a reliable AUC of 0.9946 on both the MDAD and aBiofilm databases. Furthermore, case studies on ciprofloxacin, moxifloxacin, HIV-1, and Mycobacterium tuberculosis were carried out to further assess the effectiveness of NMGMDA. The experimental results demonstrated that, after the known associations were removed from the database, 16 and 14 drugs as well as 19 and 17 microbes in the top 20 predictions were validated by the relevant literature. This demonstrates the potential of NMGMDA to reach acceptable prediction performance.
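For orientation, nuclear norm minimization is commonly written as the matrix-completion problem below. This is the generic textbook formulation, given here as an assumption about the kind of objective involved; the abstract does not state the exact objective used in NMGMDA. Here $M$ denotes the observed association matrix, $\Omega$ the set of known entries, $P_{\Omega}$ the projection onto those entries, and $\sigma_i(X)$ the singular values of $X$.

```latex
\min_{X} \; \lVert X \rVert_{*}
\quad \text{subject to} \quad P_{\Omega}(X) = P_{\Omega}(M),
\qquad \text{where} \quad \lVert X \rVert_{*} = \sum_{i} \sigma_{i}(X).
```

The nuclear norm acts as a convex surrogate for matrix rank, so the recovered $X$ fills in the unobserved entries with a low-rank completion whose values can be read as association scores.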
Affiliation(s)
- Mingmin Liang
  - School of Information Engineering, Hunan Vocational College of Electronic and Technology, Changsha, 410000, China
- Xianzhi Liu
  - School of Information Engineering, Hunan Vocational College of Electronic and Technology, Changsha, 410000, China
- Qijia Chen
  - School of Information Engineering, Hunan Vocational College of Electronic and Technology, Changsha, 410000, China
- Bin Zeng
  - School of Information Engineering, Hunan Vocational College of Electronic and Technology, Changsha, 410000, China
- Lei Wang
  - School of Information Engineering, Hunan Vocational College of Electronic and Technology, Changsha, 410000, China
  - Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, 410022, China
4
Alvarez-Mamani E, Dechant R, Beltran-Castañón CA, Ibáñez AJ. Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 2024; 25:1. [PMID: 38166530 PMCID: PMC10763173 DOI: 10.1186/s12859-023-05612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024] Open
Abstract
Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although they are typically used for tasks such as predicting friendships in social media, several applications of graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments in recent years facilitate their application to biomedical data and may thus help advance biological discoveries. Therefore, in this review, we discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
Affiliation(s)
- Edwin Alvarez-Mamani
  - Engineering Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
  - Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Reinhard Dechant
  - Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
  - Calico Life Sciences, 1170 Veterans Blvd, San Francisco, CA, 94080, USA
- Alfredo J Ibáñez
  - Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
  - Science Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
5
Salcedo MV, Gravel N, Keshavarzi A, Huang LC, Kochut KJ, Kannan N. Predicting protein and pathway associations for understudied dark kinases using pattern-constrained knowledge graph embedding. PeerJ 2023; 11:e15815. [PMID: 37868056 PMCID: PMC10590106 DOI: 10.7717/peerj.15815] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Accepted: 07/10/2023] [Indexed: 10/24/2023] Open
Abstract
The 534 protein kinases encoded in the human genome constitute a large druggable class of proteins that include both well-studied and understudied "dark" members. Accurate prediction of dark kinase functions is a major bioinformatics challenge. Here, we employ a graph mining approach that uses the evolutionary and functional context encoded in knowledge graphs (KGs) to predict protein and pathway associations for understudied kinases. We propose a new scalable graph embedding approach, RegPattern2Vec, which employs regular pattern constrained random walks to sample diverse aspects of node context within a KG flexibly. RegPattern2Vec learns functional representations of kinases, interacting partners, post-translational modifications, pathways, cellular localization, and chemical interactions from a kinase-centric KG that integrates and conceptualizes data from curated heterogeneous data resources. By contextualizing information relevant to prediction, RegPattern2Vec improves accuracy and efficiency in comparison to other random walk-based graph embedding approaches. We show that the predictions produced by our model overlap with pathway enrichment data produced using experimentally validated Protein-Protein Interaction (PPI) data from both publicly available databases and experimental datasets not used in training. Our model also has the advantage of using the collected random walks as biological context to interpret the predicted protein-pathway associations. We provide high-confidence pathway predictions for 34 dark kinases and present three case studies in which analysis of meta-paths associated with the prediction enables biological interpretation. Overall, RegPattern2Vec efficiently samples multiple node types for link prediction on biological knowledge graphs and the predicted associations between understudied kinases, pseudokinases, and known pathways serve as a conceptual starting point for hypothesis generation and testing.
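As a rough illustration of what a pattern-constrained random walk looks like, the sketch below samples walks on a tiny typed graph in which the node types visited must follow a fixed cyclic sequence. This is a simplified stand-in written for illustration: the cyclic type sequence, the function name `pattern_walk`, and the toy graph are all assumptions, and RegPattern2Vec itself uses regular patterns whose exact form is not given in the abstract.

```python
import random

def pattern_walk(graph, node_type, start, pattern, cycles, rng):
    """Random walk whose node types follow `pattern` cyclically.

    graph:     {node: [neighbour, ...]}
    node_type: {node: type label}
    pattern:   list of type labels, e.g. ["kinase", "protein", ...]
    Returns the walk; stops early if no neighbour has the required type.
    """
    if node_type[start] != pattern[0]:
        raise ValueError("start node must match the first pattern type")
    walk = [start]
    for step in range(1, cycles * len(pattern) + 1):
        wanted = pattern[step % len(pattern)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical kinase-centric toy graph (K = kinase, P = protein, W = pathway).
graph = {
    "K1": ["P1", "P2"], "K2": ["P1"],
    "P1": ["K1", "K2", "W1"], "P2": ["K1", "W1"],
    "W1": ["P1", "P2"],
}
node_type = {"K1": "kinase", "K2": "kinase",
             "P1": "protein", "P2": "protein", "W1": "pathway"}
pattern = ["kinase", "protein", "pathway", "protein"]

walk = pattern_walk(graph, node_type, "K1", pattern, cycles=2,
                    rng=random.Random(0))
print(walk)
```

Walks collected this way can be passed to a skip-gram style embedding model, and, as the abstract notes, the walks themselves can be inspected as context for a predicted association.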
Affiliation(s)
- Mariah V. Salcedo
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America
- Nathan Gravel
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Abbas Keshavarzi
  - School of Computing, University of Georgia, Athens, GA, United States of America
- Liang-Chin Huang
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Krzysztof J. Kochut
  - School of Computing, University of Georgia, Athens, GA, United States of America
- Natarajan Kannan
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
6
Qu J, Song Z, Cheng X, Jiang Z, Zhou J. A new integrated framework for the identification of potential virus-drug associations. Front Microbiol 2023; 14:1179414. [PMID: 37675432 PMCID: PMC10478006 DOI: 10.3389/fmicb.2023.1179414] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/04/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023] Open
Abstract
Introduction With the increasingly serious problem of antiviral drug resistance, drug repurposing offers a time-efficient and cost-effective way to find potential therapeutic agents for disease. Computational models can quickly predict drug candidates that may be repurposed to treat diseases. Methods In this study, two matrix decomposition-based methods, Matrix Decomposition with Heterogeneous Graph Inference (MDHGI) and Bounded Nuclear Norm Regularization (BNNR), were integrated to predict antiviral drugs. Global leave-one-out cross-validation (LOOCV), local LOOCV, and 5-fold cross-validation were implemented to evaluate the performance of the proposed model on the DrugVirus dataset, which consists of 933 known associations between 175 drugs and 95 viruses. Results The area under the receiver operating characteristic curve (AUC) was 0.9035 for global LOOCV and 0.8786 for local LOOCV. The average AUC and standard deviation of the 5-fold cross-validation on the DrugVirus dataset were 0.8856 ± 0.0032. We further implemented cross-validation on MDAD and aBiofilm to evaluate the performance of the model. In particular, the MDAD (aBiofilm) dataset contains 2,470 (2,884) known associations between 1,373 (1,470) drugs and 173 (140) microbes. In addition, two types of case studies were carried out to further verify the effectiveness of the model on the DrugVirus and MDAD datasets. The results of the case studies supported the effectiveness of MHBVDA in identifying potential virus-drug associations as well as predicting potential drugs for new microbes.
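Since the results above are reported as AUC values, a small sketch of how AUC can be computed from labels and prediction scores may help. It uses the pairwise-ranking definition (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). The labels and scores below are made-up toy values, not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives
    and negatives. labels are 0/1; scores are predicted association scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 2 held-out known associations, 3 unknown pairs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
toy_auc = auc(labels, scores)
print(round(toy_auc, 4))
```

The quadratic pairwise loop is fine for a toy example; for datasets of the size quoted above, a rank-based implementation would be the more practical choice.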
Affiliation(s)
- Jia Qu
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zihao Song
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Xiaolong Cheng
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zhibin Jiang
  - School of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
- Jie Zhou
  - School of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
7
Li H, Hou ZJ, Zhang WG, Qu J, Yao HB, Chen Y. Prediction of potential drug-microbe associations based on matrix factorization and a three-layer heterogeneous network. Comput Biol Chem 2023; 104:107857. [PMID: 37018909 DOI: 10.1016/j.compbiolchem.2023.107857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 02/27/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023]
Abstract
Microbes in the human body are closely linked to many complex human diseases and are emerging as new drug targets. These microbes play a crucial role in drug development and disease treatment. Traditional biological experiments are both time-consuming and costly, and computational prediction of microbe-drug associations can effectively complement them. In this study, we constructed heterogeneous networks for drugs, microbes, and diseases using multiple biomedical data sources. Then, we developed a model based on matrix factorization and a three-layer heterogeneous network (MFTLHNMDA) to predict potential drug-microbe associations. The probability of a microbe-drug association was obtained by a global network-based update algorithm. Finally, the performance of MFTLHNMDA was evaluated under leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV). The results showed that our model performed better than six state-of-the-art methods, achieving AUCs of 0.9396 and 0.9385 ± 0.0000, respectively. The case study further confirms the effectiveness of MFTLHNMDA in identifying potential drug-microbe associations and new drug-microbe associations.
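The 5-fold cross-validation protocol mentioned above can be sketched as follows: the known association pairs are shuffled and split into five folds, and each fold in turn is held out as the test set while the rest are used for training. The helper name `k_fold_splits`, the seed, and the toy pairs are assumptions made for this illustration and do not come from the paper.

```python
import random

def k_fold_splits(pairs, k=5, seed=0):
    """Yield (train, test) lists of association pairs for k-fold CV."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled pairs round-robin into k folds.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test

# Hypothetical known (drug, microbe) associations: 10 pairs.
known_pairs = [(d, m) for d in range(5) for m in range(4) if (d + m) % 2 == 0]

splits = list(k_fold_splits(known_pairs, k=5, seed=0))
for train, test in splits:
    print(len(train), len(test))
```

In association prediction the held-out pairs are usually masked in the association matrix before the model is fitted, so that the scores they receive are genuine predictions; that masking step is omitted here.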
8
Tian Z, Yu Y, Fang H, Xie W, Guo M. Predicting microbe-drug associations with structure-enhanced contrastive learning and self-paced negative sampling strategy. Brief Bioinform 2023; 24:7009077. [PMID: 36715986 DOI: 10.1093/bib/bbac634] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/08/2022] [Revised: 12/19/2022] [Accepted: 12/29/2022] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Predicting the associations between human microbes and drugs (MDAs) is a critical step in drug development and precision medicine. Since discovering these associations through wet-lab experiments is time-consuming and labor-intensive, computational methods have become an effective way to tackle this problem. Recently, graph contrastive learning (GCL) approaches have shown great advantages in learning node embeddings from heterogeneous biological graphs (HBGs). However, most GCL-based approaches do not fully capture the rich structural information in HBGs. Moreover, few MDA prediction methods can screen out the most informative negative samples for effectively training the classifier. Therefore, the accuracy of MDA predictions still needs to be improved. RESULTS In this study, we propose a novel approach that employs Structure-enhanced Contrastive learning and a Self-paced negative sampling strategy for Microbe-Drug Association predictions (SCSMDA). First, SCSMDA constructs the similarity networks of microbes and drugs, as well as their different meta-path-induced networks. Then, SCSMDA uses the representations of microbes and drugs learned from the meta-path-induced networks to enhance, through a contrastive learning strategy, the embeddings learned from the similarity networks, which yields discriminative representations. After that, we adopt the self-paced negative sampling strategy to select the most informative negative samples to train the MLP classifier. Lastly, SCSMDA predicts potential microbe-drug associations with the trained MLP classifier. Extensive results on three public datasets indicate that SCSMDA significantly outperforms other baseline methods on the MDA prediction task. Case studies for two common drugs further demonstrate the effectiveness of SCSMDA in finding novel MDAs. AVAILABILITY The source code is publicly available on GitHub at https://github.com/Yue-Yuu/SCSMDA-master.
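To show what a meta-path-induced network can look like, the sketch below derives a microbe-microbe network from the microbe-drug-microbe meta-path: two microbes are linked if they share at least one associated drug. This is one common way to build such a network and is offered as an assumption; the meta-paths actually used by SCSMDA are defined in the paper and repository, and the toy association matrix is invented.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def metapath_network(assoc):
    """Microbe-drug-microbe meta-path network.

    assoc[i][j] = 1 if microbe i is associated with drug j.
    Returns a binary microbe-by-microbe adjacency matrix in which two
    distinct microbes are connected if they share at least one drug.
    """
    counts = matmul(assoc, transpose(assoc))   # number of shared drugs
    n = len(counts)
    return [[1 if i != j and counts[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]

# Hypothetical associations: 3 microbes (rows) x 3 drugs (columns).
assoc = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
mdm = metapath_network(assoc)
print(mdm)
```

Microbes 0 and 1 share a drug and become neighbours, while microbe 2 stays isolated; the same construction with the transposed matrix gives a drug-microbe-drug network.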
Affiliation(s)
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Yue Yu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Haichuan Fang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Weixin Xie
  - Institute of Intelligent System and Bioinformatics, College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150000, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, 100044, Beijing, China
9
Shokri Garjan H, Omidi Y, Poursheikhali Asghari M, Ferdousi R. In-silico computational approaches to study microbiota impacts on diseases and pharmacotherapy. Gut Pathog 2023; 15:10. [PMID: 36882861 PMCID: PMC9990230 DOI: 10.1186/s13099-023-00535-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 02/21/2023] [Indexed: 03/09/2023] Open
Abstract
Thanks to advances in sequencing technology and microbiology, microorganisms have been linked to a variety of critical human diseases. The growing recognition of human microbe-disease relationships provides crucial insights into the underlying disease process from the perspective of pathogens, which is extremely useful for pathogenesis research, early diagnosis, and precision medicine and therapy. Microbe-based analysis of diseases and related drug discovery can predict new connections and mechanisms and provide new concepts. These phenomena have been studied via various in-silico computational approaches. This review elaborates on the computational work conducted on microbe-disease and microbe-drug topics, discusses the computational approaches used for predicting associations, and provides comprehensive information on the related databases. Finally, we discuss potential prospects and obstacles in this field of study and outline some recommendations for further enhancing predictive capabilities.
Affiliation(s)
- Hassan Shokri Garjan
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Yadollah Omidi
  - Department of Pharmaceutical Sciences, Nova Southeastern University, College of Pharmacy, Fort Lauderdale, FL, USA
- Reza Ferdousi
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
10
GACNNMDA: a computational model for predicting potential human microbe-drug associations based on graph attention network and CNN-based classifier. BMC Bioinformatics 2023; 24:35. [PMID: 36732704 PMCID: PMC9893988 DOI: 10.1186/s12859-023-05158-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/18/2022] [Accepted: 01/24/2023] [Indexed: 02/04/2023] Open
Abstract
As new drug targets, human microbes have been shown to be closely related to human health. Effective computational methods for inferring potential microbe-drug associations can provide a useful complement to conventional experimental methods and will facilitate drug research and development. However, predicting potential interactions for new microbes or new drugs remains challenging, since the number of known microbe-drug associations is currently very limited. In this manuscript, we first constructed two heterogeneous microbe-drug networks based on multiple measures of microbe and drug similarity, together with known microbe-drug associations and known microbe-disease-drug associations, respectively. Then, we established two feature matrices, one for microbes and one for drugs, by concatenating various attributes of microbes and drugs. Next, taking these two feature matrices and the two heterogeneous microbe-drug networks as inputs to a two-layer graph attention network, we obtained low-dimensional feature representations for microbes and drugs separately. Finally, by integrating the low-dimensional feature representations with the two feature matrices to form the inputs of a convolutional neural network, we designed a novel computational model named GACNNMDA to predict the possible scores of microbe-drug pairs. Experimental results show that the predictive performance of GACNNMDA is superior to that of existing advanced methods. Furthermore, case studies on well-known microbes and drugs demonstrate the effectiveness of GACNNMDA as well. Source code and supplementary materials are available at: https://github.com/tyqGitHub/TYQ/tree/master/GACNNMDA.
11
Tan Y, Zou J, Kuang L, Wang X, Zeng B, Zhang Z, Wang L. GSAMDA: a computational model for predicting potential microbe–drug associations based on graph attention network and sparse autoencoder. BMC Bioinformatics 2022; 23:492. [PMID: 36401174 PMCID: PMC9673879 DOI: 10.1186/s12859-022-05053-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/12/2022] [Accepted: 11/14/2022] [Indexed: 11/19/2022] Open
Abstract
Background Clinical studies show that microorganisms are closely related to human health, and the discovery of potential associations between microbes and drugs will facilitate drug research and development. However, at present, few computational methods for predicting microbe–drug associations have been proposed.
Results In this work, we proposed a novel computational model named GSAMDA, based on a graph attention network and a sparse autoencoder, to infer latent microbe–drug associations. In GSAMDA, we first built a heterogeneous network by integrating known microbe–drug associations, microbe similarities and drug similarities. Then, we adopted a GAT-based autoencoder and a sparse autoencoder module to learn topological representations and attribute representations, respectively, for nodes in the newly constructed heterogeneous network. Finally, based on these two kinds of node representations, we constructed two kinds of feature matrices for microbes and drugs separately and used them to calculate possible association scores for microbe–drug pairs. Conclusion A novel computational model is proposed for predicting potential microbe–drug associations based on a graph attention network and a sparse autoencoder. Compared with five other state-of-the-art methods, our model achieved better performance in the experiments. Moreover, case studies on two categories of representative drugs and microbes further demonstrated the effectiveness of our model.
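The heterogeneous network described above, which integrates known associations with microbe and drug similarities, is often represented as a single block adjacency matrix. The sketch below assembles one in plain Python. The block layout, the function name, and the toy matrices are assumptions for illustration; the abstract does not specify how GSAMDA arranges or weights the blocks.

```python
def build_heterogeneous_network(microbe_sim, drug_sim, assoc):
    """Assemble the block matrix [[S_m, A], [A^T, S_d]].

    microbe_sim: n_m x n_m microbe similarity matrix
    drug_sim:    n_d x n_d drug similarity matrix
    assoc:       n_m x n_d known microbe-drug associations (0/1)
    """
    n_m, n_d = len(microbe_sim), len(drug_sim)
    # Microbe rows: microbe similarities followed by associations.
    top = [microbe_sim[i] + assoc[i] for i in range(n_m)]
    # Drug rows: transposed associations followed by drug similarities.
    bottom = [[assoc[i][j] for i in range(n_m)] + drug_sim[j]
              for j in range(n_d)]
    return top + bottom

# Hypothetical toy inputs: 2 microbes, 3 drugs.
microbe_sim = [[1.0, 0.3],
               [0.3, 1.0]]
drug_sim = [[1.0, 0.5, 0.2],
            [0.5, 1.0, 0.4],
            [0.2, 0.4, 1.0]]
assoc = [[1, 0, 1],
         [0, 1, 0]]

H = build_heterogeneous_network(microbe_sim, drug_sim, assoc)
for row in H:
    print(row)
```

With symmetric similarity matrices the resulting matrix is symmetric, so it can be treated as the adjacency matrix of an undirected graph over microbes and drugs.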
12
Cheng X, Qu J, Song S, Bian Z. Neighborhood-based inference and restricted Boltzmann machine for microbe and drug associations prediction. PeerJ 2022; 10:e13848. [PMID: 35990901 PMCID: PMC9387521 DOI: 10.7717/peerj.13848] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/19/2022] [Accepted: 07/14/2022] [Indexed: 01/18/2023] Open
Abstract
Background Efficient identification of microbe-drug associations is critical for drug development and for addressing the problem of antimicrobial resistance. Traditional wet-lab methods require a lot of money and labor to identify potential microbe-drug associations. With the development of machine learning and the publication of large amounts of biological data, computational methods have become feasible. Methods In this article, we proposed a computational model that combines neighborhood-based inference (NI) and a restricted Boltzmann machine (RBM) to predict potential microbe-drug associations (NIRBMMDA) using integrated microbe similarity, integrated drug similarity, and known microbe-drug associations. First, NI was used to obtain a score matrix of potential microbe-drug associations by using different thresholds to find similar neighbors for each drug or microbe. Second, RBM was employed to obtain another score matrix of potential microbe-drug associations based on the contrastive divergence algorithm and the sigmoid function. Because the generalization ability of an individual method is poor, we used ensemble learning to integrate the two score matrices and thereby obtain a stronger predictor. In particular, NI can fully utilize the similar (neighbor) information of a drug or microbe, and RBM can learn the potential probability distribution hidden in known microbe-drug associations. Results In global leave-one-out cross validation (LOOCV), NIRBMMDA achieved areas under the receiver operating characteristic curve (AUCs) of 0.8666, 0.9413 and 0.9557 on the DrugVirus, MDAD and aBiofilm datasets, respectively. In local LOOCV, NIRBMMDA obtained AUCs of 0.8512, 0.9204 and 0.9414 on the same three datasets. For five-fold cross validation, NIRBMMDA achieved AUCs and standard deviations of 0.8569 ± 0.0027, 0.9248 ± 0.0014 and 0.9369 ± 0.0020 on DrugVirus, MDAD and aBiofilm, respectively. Moreover, a case study on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) showed that 13 of the top 20 predicted drugs were verified by a literature search. The other two case studies indicated that 17 and 17 of the top 20 predicted microbes for the drugs ciprofloxacin and minocycline, respectively, were confirmed by published literature.
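The two-stage idea described above, a neighbourhood-based score matrix combined with a second score matrix through an ensemble, can be sketched as follows. The similarity-weighted averaging rule, the single fixed threshold, the equal-weight ensemble, and the placeholder `rbm_scores` matrix are all assumptions chosen for illustration; the abstract does not give the exact formulas used in NIRBMMDA.

```python
def neighborhood_scores(assoc, drug_sim, threshold):
    """Drug-side neighbourhood-based inference (illustrative).

    assoc[d][m] = 1 if drug d is known to be associated with microbe m.
    The score for (d, m) is the similarity-weighted average of assoc[e][m]
    over the other drugs e whose similarity to d is at least `threshold`.
    """
    n_d, n_m = len(assoc), len(assoc[0])
    scores = [[0.0] * n_m for _ in range(n_d)]
    for d in range(n_d):
        nbrs = [e for e in range(n_d)
                if e != d and drug_sim[d][e] >= threshold]
        total = sum(drug_sim[d][e] for e in nbrs)
        if total == 0:
            continue  # no sufficiently similar neighbour: scores stay 0
        for m in range(n_m):
            scores[d][m] = sum(drug_sim[d][e] * assoc[e][m]
                               for e in nbrs) / total
    return scores

def ensemble(score_a, score_b, weight=0.5):
    """Weighted average of two score matrices of the same shape."""
    return [[weight * a + (1 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(score_a, score_b)]

# Hypothetical toy data: 3 drugs x 2 microbes.
assoc = [[1, 0], [1, 1], [0, 1]]
drug_sim = [[1.0, 0.8, 0.1],
            [0.8, 1.0, 0.6],
            [0.1, 0.6, 1.0]]
ni_scores = neighborhood_scores(assoc, drug_sim, threshold=0.5)

# Placeholder standing in for the output of a trained RBM.
rbm_scores = [[0.2, 0.4], [0.6, 0.8], [0.0, 1.0]]
combined = ensemble(ni_scores, rbm_scores)
print(combined)
```

A microbe-side version follows by transposing `assoc` and passing a microbe similarity matrix instead.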
Affiliation(s)
- Xiaolong Cheng
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Jia Qu
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Shuangbao Song
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zekang Bian
  - School of AI & Computer Science, Jiangnan University, Wuxi, Jiangsu, China
13
|
Xie W, Zheng Z, Zhang W, Huang L, Lin Q, Wong KC. SRG-vote: Predicting miRNA-gene relationships via embedding and LSTM ensemble. IEEE J Biomed Health Inform 2022; 26:4335-4344. [PMID: 35471879 DOI: 10.1109/jbhi.2022.3169542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Targeted therapy for one or a set of genes has made it possible to apply precision medicine to different patients, given the existence of tumor heterogeneity. However, how to regulate those genes is still problematic. One of the natural regulators of genes is microRNAs (miRNAs). Thus, a better understanding of the miRNA-gene interaction mechanism might contribute to future diagnosis, prevention, and cancer therapy. The interactions between miRNAs and genes play an essential role in molecular genetics. The in-vivo experiments validating the relationships between them are time-consuming, costly, and labor-intensive. With the development of high-throughput technology, we now deal with large volumes of biological data. However, extracting features from tremendous raw data and building a mathematical model is still a challenging topic. Machine learning and deep learning algorithms have become powerful tools in dealing with biological data. Inspired by this, in this paper, we propose a model that combines feature/embedding extraction methods, deep learning algorithms, and a voting system. We leverage doc2vec to generate sequential embeddings from molecular sequences. Geometrical embeddings were generated with role2vec, GCN, and GMM from the complex network built from similarity and pair-wise datasets. For the deep learning algorithms, we leveraged LSTM and Bi-LSTM according to the different embeddings and features. Finally, we adopted a voting system to balance results from different data sources. The results have shown that our voting system could achieve a higher AUC than the existing benchmark. The case studies demonstrate that our model could reveal potential relationships between miRNAs and genes. The source code, features, and predictive results can be downloaded at https://github.com/Xshelton/SRG-vote.
|
14
|
Zhu B, Xu Y, Zhao P, Yiu SM, Yu H, Shi JY. NNAN: Nearest Neighbor Attention Network to Predict Drug–Microbe Associations. Front Microbiol 2022; 13:846915. [PMID: 35479616 PMCID: PMC9035839 DOI: 10.3389/fmicb.2022.846915] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/31/2021] [Accepted: 02/14/2022] [Indexed: 11/13/2022] Open
Abstract
Many drugs can be metabolized by human microbes; the drug metabolites can significantly alter pharmacological effects and result in low therapeutic efficacy for patients. Hence, it is crucial to identify potential drug–microbe associations (DMAs) before drug administration. Nevertheless, traditional DMA determination cannot be applied on a wide scale because of the tremendous number of microbe species, the high costs, and the time required. Thus, computationally predicting possible DMAs is an essential topic. Inspired by other issues addressed by deep learning, we designed a deep learning-based model named Nearest Neighbor Attention Network (NNAN). The proposed model consists of four components, namely, a similarity network constructor, a nearest-neighbor aggregator, a feature attention block, and a predictor. In brief, the similarity network constructor builds a microbe similarity network and a drug similarity network. The nearest-neighbor aggregator generates the embedding representations of drug–microbe pairs by integrating the drug neighbors and microbe neighbors of each drug–microbe pair in the network. The feature attention block evaluates the importance of each dimension of the drug–microbe pair embedding with a set of ordinary multi-layer neural networks. The predictor is an ordinary fully-connected deep neural network that functions as a binary classifier to distinguish potential DMAs among unlabeled drug–microbe pairs. Several experiments on two benchmark databases are performed to evaluate the performance of NNAN. First, the comparison with state-of-the-art baseline approaches demonstrates the superiority of NNAN under cross-validation in terms of predictive performance. Moreover, the interpretability inspection reveals that a drug tends to associate with a microbe if its top-l most similar neighbors are associated with that microbe.
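One way to picture the nearest-neighbor aggregation and the top-l observation in this abstract is the sketch below. It is only an illustration under stated assumptions: the helper names `top_l_neighbors` and `pair_features` are hypothetical, the pair representation is simply the known association labels of the top-l neighbors, and none of this is taken from the NNAN implementation.

```python
def top_l_neighbors(sim_row, self_index, l):
    """Return the indices of the l most similar items, excluding the item itself."""
    candidates = [k for k in range(len(sim_row)) if k != self_index]
    candidates.sort(key=lambda k: sim_row[k], reverse=True)
    return candidates[:l]


def pair_features(assoc, drug_sim, microbe_sim, d, m, l):
    """Build a toy feature vector for the drug-microbe pair (d, m).

    assoc[d][m] is 1 if drug d is known to be associated with microbe m.
    The first l entries say whether d's nearest drugs are associated with m;
    the last l entries say whether m's nearest microbes are associated with d.
    Assumption: raw labels stand in for the learned embeddings of the paper.
    """
    drug_nbrs = top_l_neighbors(drug_sim[d], d, l)
    microbe_nbrs = top_l_neighbors(microbe_sim[m], m, l)
    from_drugs = [assoc[k][m] for k in drug_nbrs]
    from_microbes = [assoc[d][k] for k in microbe_nbrs]
    return from_drugs + from_microbes
```

A downstream classifier would consume such vectors; in the abstract that role is played by the feature attention block followed by the fully-connected predictor.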
Affiliation(s)
- Bei Zhu: School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Yi Xu: School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Pengcheng Zhao: School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Siu-Ming Yiu: Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Hui Yu (corresponding author): School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Jian-Yu Shi (corresponding author): School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
|
15
|
Pi J, Jiao P, Zhang Y, Li J. MDGNN: Microbial Drug Prediction Based on Heterogeneous Multi-Attention Graph Neural Network. Front Microbiol 2022; 13:819046. [PMID: 35464940 PMCID: PMC9021438 DOI: 10.3389/fmicb.2022.819046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2021] [Accepted: 03/07/2022] [Indexed: 11/14/2022] Open
Abstract
Human beings are now facing one of the largest public health crises in history with the outbreak of COVID-19. Traditional drug discovery cannot keep pace with newly discovered infectious diseases. The prediction of drug–virus associations not only provides insights into the mechanism of drug–virus interactions, but also guides the screening of potential antiviral drugs. We develop a deep learning algorithm based on graph convolutional networks (MDGNN) to predict potential antiviral drugs. MDGNN consists of new node-level and feature-level attention mechanisms and shows its effectiveness compared with other algorithms. MDGNN integrates the global information of the graph in the process of information aggregation by introducing attention at the node and feature levels into graph convolution. Comparative experiments show that MDGNN achieves state-of-the-art performance with an area under the curve (AUC) of 0.9726 and an area under the PR curve (AUPR) of 0.9112. In a case study, two drugs related to SARS-CoV-2 were successfully predicted and verified by the relevant literature. The data and code are open source and can be accessed from https://github.com/Pijiangsheng/MDGNN.
Affiliation(s)
- Jiangsheng Pi: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Peishun Jiao: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Yang Zhang (corresponding author): College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Junyi Li (corresponding author): School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
|
16
|
Wang L, Tan Y, Yang X, Kuang L, Ping P. Review on predicting pairwise relationships between human microbes, drugs and diseases: from biological data to computational models. Brief Bioinform 2022; 23:6553604. [PMID: 35325024 DOI: 10.1093/bib/bbac080] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/14/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 12/11/2022] Open
Abstract
In recent years, with the rapid development of techniques in bioinformatics and life science, a considerable quantity of biomedical data has been accumulated, based on which researchers have developed various computational approaches to discover potential associations between human microbes, drugs and diseases. This paper provides a comprehensive overview of recent advances in the prediction of potential correlations between microbes, drugs and diseases, from biological data to computational models. First, we introduced in detail the widely used datasets relevant to the identification of potential relationships between microbes, drugs and diseases. Then, we divided a series of representative computational models into five major categories (network, matrix factorization, matrix completion, regularization and artificial neural network) for in-depth discussion and comparison. Finally, we analysed possible challenges and opportunities in this research area and outlined some suggestions for further improvement of predictive performance.
Affiliation(s)
- Lei Wang: College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Yaqin Tan: College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Xiaoyu Yang: College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Linai Kuang: Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Pengyao Ping: College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
|
17
|
Deng L, Huang Y, Liu X, Liu H. Graph2MDA: a multi-modal variational graph embedding model for predicting microbe-drug associations. Bioinformatics 2022; 38:1118-1125. [PMID: 34864873 DOI: 10.1093/bioinformatics/btab792] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 08/12/2021] [Revised: 10/22/2021] [Accepted: 11/17/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Accumulated clinical studies show that microbes living in humans interact closely with their human hosts and are involved in modulating drug efficacy and drug toxicity. Microbes have become novel targets for the development of antibacterial agents. Therefore, screening of microbe-drug associations can greatly benefit drug research and development. With the increase of microbial genomic and pharmacological datasets, we are greatly motivated to develop an effective computational method to identify new microbe-drug associations. RESULTS In this article, we proposed a novel method, Graph2MDA, to predict microbe-drug associations by using a variational graph autoencoder (VGAE). We constructed multi-modal attributed graphs based on multiple features of microbes and drugs, such as molecular structures, microbe genetic sequences and function annotations. Taking the multi-modal attributed graphs as input, VGAE was trained to learn informative and interpretable latent representations of each node and the whole graph, and then a deep neural network classifier was used to predict microbe-drug associations. The hyperparameter analysis and model ablation studies showed the sensitivity and robustness of our model. We evaluated our method on three independent datasets, and the experimental results showed that our proposed method outperformed six existing state-of-the-art methods. We also explored the meaning of the learned latent representations of drugs and found that the drugs show obvious clustering patterns that are significantly consistent with drug ATC classification. Moreover, we conducted case studies on two microbes and two drugs and found that 75-95% of the predicted associations have been reported in PubMed literature. Our extensive performance evaluations validated the effectiveness of our proposed method. AVAILABILITY AND IMPLEMENTATION Source codes and preprocessed data are available at https://github.com/moen-hyb/Graph2MDA. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lei Deng: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yibiao Huang: School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xuejun Liu: School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
- Hui Liu: School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
|
18
|
Heterogeneous graph attention networks for drug virus association prediction. Methods 2021; 198:11-18. [PMID: 34419588 PMCID: PMC8376526 DOI: 10.1016/j.ymeth.2021.08.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2021] [Accepted: 08/11/2021] [Indexed: 12/11/2022] Open
Abstract
Coronavirus Disease-19 (COVID-19) has led to a global epidemic with high morbidity and mortality. However, there are currently no proven effective drugs targeting COVID-19. Identifying drug-virus associations can not only provide insights into the drug-virus interaction mechanism, but also guide and facilitate the screening of compound candidates for antiviral drug discovery. Since conventional experimental methods are time-consuming, laborious and expensive, computational methods to identify potential drug candidates for viruses (e.g., SARS-CoV-2) provide an alternative strategy. In this work, we propose a novel framework of Heterogeneous Graph Attention Networks for Drug-Virus Association prediction, named HGATDVA. First, we fully incorporate multiple sources of biomedical data, e.g., drug chemical information, virus genome sequences and viral protein sequences, to construct abundant features for drugs and viruses. Second, we construct two drug-virus heterogeneous graphs. For each graph, we design a self-enhanced graph attention network (SGAT) to explicitly model the dependency between a node and its local neighbors and derive the graph-specific representations for nodes. Third, we further develop a neural network architecture with a tri-aggregator to aggregate the graph-specific representations and generate the final node representations. Extensive experiments were conducted on two datasets, i.e., DrugVirus and MDAD, and the results demonstrated that our model outperformed 7 state-of-the-art methods. A case study on SARS-CoV-2 validated the effectiveness of our model in identifying potential drugs for viruses.
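The idea of modeling the dependency between a node and its local neighbors with attention can be illustrated generically. The sketch below is not the SGAT layer from this paper: it assumes plain dot-product scores instead of a learned attention function, has no trainable weights, and the name `attention_aggregate` is hypothetical.

```python
import math


def attention_aggregate(node, neighbors):
    """Aggregate neighbor feature vectors with softmax attention weights.

    node: feature vector of the target node (list of floats).
    neighbors: non-empty list of neighbor feature vectors of the same length.
    Assumption: the attention score is the dot product between the node
    and each neighbor; real graph attention layers learn this scoring.
    """
    scores = [sum(a * b for a, b in zip(node, nb)) for nb in neighbors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node)
    # New representation: attention-weighted sum of neighbor features.
    aggregated = [sum(w * nb[d] for w, nb in zip(weights, neighbors))
                  for d in range(dim)]
    return aggregated, weights
```

Neighbors that align more closely with the target node receive larger weights, so they contribute more to the aggregated representation.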
|
19
|
Li W, Wang S, Xu J. An Ensemble Matrix Completion Model for Predicting Potential Drugs Against SARS-CoV-2. Front Microbiol 2021; 12:694534. [PMID: 34367094 PMCID: PMC8334363 DOI: 10.3389/fmicb.2021.694534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2021] [Accepted: 06/22/2021] [Indexed: 11/13/2022] Open
Abstract
Because of the catastrophic global outbreak of coronavirus disease 2019 (COVID-19) and its strong infectivity and possible persistence, computational repurposing of existing approved drugs is a promising strategy that facilitates rapid clinical treatment decisions and provides reasonable justification for subsequent clinical trials and regulatory reviews. Since the effects of a small number of conditionally marketed vaccines need further clinical observation, there is still an urgent need to quickly and effectively repurpose potentially available drugs before the next disease peak. In this work, we have manually collected a set of experimentally confirmed virus-drug associations from publicly available databases and literature, consisting of 175 drugs and 95 viruses, as well as 933 virus-drug associations. Because the samples are extremely sparse and unbalanced and negative samples cannot be easily obtained, we have developed an ensemble model, EMC-Voting, based on matrix completion and weighted soft voting, a semi-supervised machine learning model for computational drug repurposing. Finally, we have evaluated the prediction performance of EMC-Voting by fivefold cross-validation and compared it with other baseline classifiers and prediction models. The case study for the virus SARS-CoV-2 included in the dataset demonstrates that our model achieves a superior AUPR value of 0.934 in virus-drug association prediction.
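Weighted soft voting, which this abstract names as one half of EMC-Voting, is a standard way to combine the scores of several base models. The following is a generic sketch, not the authors' code: the function name `weighted_soft_vote` is hypothetical, and how EMC-Voting chooses its weights is not stated in the abstract.

```python
def weighted_soft_vote(predictions, weights):
    """Combine per-model score lists into one weighted-average score list.

    predictions: one list of scores per base model, all of equal length.
    weights: one non-negative weight per base model (assumed to be given,
    for example derived from validation performance).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_items = len(predictions[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, predictions)) / total
        for i in range(n_items)
    ]
```

For example, with two base models scoring two candidate pairs as `[0.9, 0.2]` and `[0.5, 0.4]` and weights `[3, 1]`, the combined scores are approximately 0.8 and 0.25, so the first model's opinion dominates.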
Affiliation(s)
- Shulin Wang: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
20
|
Ma Y, Liu L, Chen Q, Ma Y. An Inductive Logistic Matrix Factorization Model for Predicting Drug-Metabolite Association With Vicus Regularization. Front Microbiol 2021; 12:650366. [PMID: 33868209 PMCID: PMC8047063 DOI: 10.3389/fmicb.2021.650366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Accepted: 03/08/2021] [Indexed: 11/28/2022] Open
Abstract
Metabolites are closely related to human disease. The interaction between metabolites and drugs has drawn increasing attention in the field of pharmacomicrobiomics. However, only a small portion of drug-metabolite interactions have been experimentally observed because experimental validation is labor-intensive, costly, and time-consuming. Although a few computational approaches have been proposed to predict latent associations for various bipartite networks, such as miRNA-disease and drug-target interaction networks, to the best of our knowledge the associations between drugs and metabolites have not been reported on a large scale. In this study, we propose a novel algorithm, namely inductive logistic matrix factorization (ILMF), to predict the latent associations between drugs and metabolites. Specifically, the proposed ILMF integrates drug-drug interaction, metabolite-metabolite interaction, and drug-metabolite interaction into one framework to model the probability that a drug would interact with a metabolite. Moreover, we exploit inductive matrix completion to guide the learning of projection matrices U and V that depend on the low-dimensional feature representation matrices of drugs and metabolites, Fd and Fm. These two matrices can be obtained by fusing multiple data sources. Thus, Fd U and Fm V can be viewed as drug-specific and metabolite-specific latent representations, which differs from classical LMF. Furthermore, we utilize the Vicus spectral matrix, which reveals the refined local geometrical structure inherent in the original data, to encode the relationships between drugs and metabolites. Extensive experiments are conducted on a manually curated "DrugMetaboliteAtlas" dataset. The experimental results show that ILMF can achieve competitive performance compared with other state-of-the-art approaches, which demonstrates its effectiveness in predicting potential drug-metabolite associations.
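The role of the latent representations Fd U and Fm V can be made concrete with a small sketch. It assumes the usual logistic matrix factorization form, in which the interaction probability is the sigmoid of the inner product of the two latent vectors; the abstract does not spell out the exact link function or any bias terms, and the Vicus regularization and training procedure are omitted. The function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def interaction_probability(Fd, U, Fm, V):
    """Return P[i][j], the modeled probability that drug i interacts with metabolite j.

    Fd, Fm: feature matrices of drugs and metabolites (rows are entities).
    U, V: projection matrices mapping features to a shared latent space.
    Assumption: P[i][j] = sigmoid((Fd U)_i . (Fm V)_j), with no bias terms.
    """
    drug_latent = matmul(Fd, U)        # drug-specific latent representations
    metabolite_latent = matmul(Fm, V)  # metabolite-specific latent representations
    return [
        [1.0 / (1.0 + math.exp(-sum(p * q for p, q in zip(d, m))))
         for m in metabolite_latent]
        for d in drug_latent
    ]
```

Because U and V act on feature matrices rather than on entity indices, a drug or metabolite unseen during training can still be scored from its features, which is the inductive aspect the abstract refers to.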
Affiliation(s)
- Yuanyuan Ma: School of Computer and Information Engineering, Anyang Normal University, Anyang, China
- Lifang Liu: School of Education, Anyang Normal University, Anyang, China
- Qianjun Chen: School of Computer, Central China Normal University, Wuhan, China
- Yingjun Ma: School of Applied Mathematics, Xiamen University of Technology, Xiamen, China
|