1
Ben Atitallah S, Ben Rabah C, Driss M, Boulila W, Koubaa A. Self-supervised learning for graph-structured data in healthcare applications: A comprehensive review. Comput Biol Med 2025; 188:109874. [PMID: 39999496 DOI: 10.1016/j.compbiomed.2025.109874] [Received: 11/28/2024] [Revised: 01/26/2025] [Accepted: 02/13/2025] [Indexed: 02/27/2025]
Abstract
The increasing complexity and interconnectedness of healthcare data present numerous opportunities to improve prediction, diagnosis, and treatment. Graph-structured data, which represents entities and their relationships, is well-suited for modeling these complex connections. However, effectively utilizing this data often requires strong and efficient learning algorithms, especially when dealing with limited labeled data. Self-supervised learning (SSL) has emerged as a powerful paradigm for leveraging unlabeled data to learn effective representations. This paper presents a comprehensive review of SSL approaches specifically designed for graph-structured data in healthcare applications. We explore the challenges and opportunities associated with healthcare data and assess the effectiveness of SSL techniques in real-world healthcare applications. Our discussion encompasses various healthcare settings, such as disease prediction, medical image analysis, and drug discovery. We critically evaluate the performance of different SSL methods across these tasks, highlighting their strengths, limitations, and potential future research directions. To the best of our knowledge, this is the first comprehensive review of SSL applied to graph data in healthcare, providing valuable guidance for researchers and practitioners looking to leverage these techniques to enhance outcomes and drive progress in the field.
Affiliation(s)
- Safa Ben Atitallah
  - Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia; RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Chaima Ben Rabah
  - RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Maha Driss
  - Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia; RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Wadii Boulila
  - Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia; RIADI Laboratory, National School of Computer Science, University of Manouba, Manouba, 2010, Tunisia
- Anis Koubaa
  - Robotics and Internet of Things Laboratory, Prince Sultan University, Riyadh, 12435, Saudi Arabia
2
Santangelo BE, Bada M, Hunter LE, Lozupone C. Hypothesizing mechanistic links between microbes and disease using knowledge graphs. Sci Rep 2025; 15:6905. [PMID: 40011529 PMCID: PMC11865272 DOI: 10.1038/s41598-025-91230-6] [Received: 10/08/2024] [Accepted: 02/19/2025] [Indexed: 02/28/2025] Open
Abstract
Knowledge graphs have been a useful tool for many biomedical applications because of their effective representation of biological concepts. Plentiful evidence exists linking the gut microbiome to disease in a correlative context, but uncovering the mechanistic explanation for those associations remains a challenge. Here we demonstrate the potential of knowledge graphs to hypothesize plausible mechanistic accounts of host-microbe interactions in disease. We have constructed a knowledge graph of linked microbes, genes and metabolites called MGMLink, and, using a shortest path or template-based search through the graph and a novel path-prioritization methodology based on the structure of the knowledge graph, we show that this knowledge supports inference of mechanistic hypotheses that explain observed relationships between microbes and disease phenotypes. We discuss specific applications of this methodology in inflammatory bowel disease and Parkinson's disease. This approach enables mechanistic hypotheses surrounding the complex interactions between gut microbes and disease to be generated in a scalable and comprehensive manner.
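The shortest-path search mentioned in this abstract can be illustrated with a minimal sketch. It assumes an unweighted graph stored as an adjacency dictionary and uses plain breadth-first search; the node names and the `shortest_path` helper are hypothetical, and the authors' template-based search and path-prioritization method are not reproduced here.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target using BFS, or None."""
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy graph linking a microbe to a disease (not MGMLink data).
toy_graph = {
    "microbe_A": ["metabolite_X"],
    "metabolite_X": ["microbe_A", "gene_G"],
    "gene_G": ["metabolite_X", "disease_D"],
    "disease_D": ["gene_G"],
}
path = shortest_path(toy_graph, "microbe_A", "disease_D")
print(path)
```

Each returned path is only a candidate chain of intermediate entities; ranking such candidates would need a separate scoring step.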
Affiliation(s)
- Brook E Santangelo
  - Department of Biomedical Informatics, University of Colorado Denver Anschutz Medical Campus, Aurora, CO, USA
- Michael Bada
  - Department of Pediatrics, University of Chicago, Chicago, IL, USA
- Catherine Lozupone
  - Department of Biomedical Informatics, University of Colorado Denver Anschutz Medical Campus, Aurora, CO, USA
3
Liu T, Wang S, Pang S, Tan X. Truncated Arctangent Rank Minimization and Double-Strategy Neighborhood Constraint Graph Inference for Drug-Disease Association Prediction. J Chem Inf Model 2025; 65:2158-2172. [PMID: 39889248 DOI: 10.1021/acs.jcim.4c02276] [Indexed: 02/02/2025]
Abstract
Accurately identifying new therapeutic uses for drugs is essential to advancing pharmaceutical research and development. Graph inference techniques have shown great promise in predicting drug-disease associations, offering both high convergence accuracy and efficiency. However, most existing methods fail to sufficiently address the extensive missing information in drug-disease association networks. Moreover, existing methods are often constrained by local or single-directional reasoning. To overcome these limitations, we propose a novel approach, truncated arctangent rank minimization and double-strategy neighborhood constraint graph inference (TARMDNGI), for drug-disease association prediction. First, we calculate Gaussian kernel and Laplace kernel similarities for both drugs and diseases, which are then integrated using nonlinear fusion techniques. We introduce a new matrix completion technique, referred to as TARM. TARM takes the adjacency matrix of drug-disease heterogeneous networks as the target matrix and enhances the robustness and formability of the edges of drug-disease association (DDA) networks by truncated arctangent rank minimization. Additionally, we propose a double-strategy neighborhood constrained graph inference (DNGI) method to predict drug-disease associations. This technique focuses on the neighboring nodes of drugs and diseases, filtering out potential noise from more distant nodes. Furthermore, the DNGI method employs both top-down and bottom-up strategies to infer associations using the entire drug-disease heterogeneous network. The synergy of the dual strategies can enhance the comprehensive processing of complex structures and cross-domain associations in heterogeneous graphs, ensuring that the rich information in the network is fully utilized. Experimental results consistently demonstrate that TARMDNGI outperforms state-of-the-art models across two drug-disease datasets, one lncRNA-disease dataset, and one microbe-disease dataset.
Affiliation(s)
- Tiyao Liu
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
  - State Key Laboratory of Chemical Safety, Qingdao 266580, China
  - Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China
- Shudong Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
  - State Key Laboratory of Chemical Safety, Qingdao 266580, China
  - Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China
- Shanchen Pang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
  - State Key Laboratory of Chemical Safety, Qingdao 266580, China
  - Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China
- Xiaodong Tan
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
  - Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China
4
Wu J, Xiao L, Fan L, Wang L, Zhu X. Dual graph-embedded fusion network for predicting potential microbe-disease associations with sequence learning. Front Genet 2025; 16:1511521. [PMID: 40008230 PMCID: PMC11850361 DOI: 10.3389/fgene.2025.1511521] [Received: 10/15/2024] [Accepted: 01/15/2025] [Indexed: 02/27/2025] Open
Abstract
Recent studies indicate that microorganisms are crucial for maintaining human health. Dysbiosis, or an imbalance in these microbial communities, is strongly linked to a variety of human diseases. Therefore, understanding the impact of microbes on disease is essential. The DuGEL model leverages the strengths of graph convolutional neural network (GCN) and graph attention network (GAT), ensuring that both local and global relationships within the microbe-disease association network are captured. The integration of the Long Short-Term Memory Network (LSTM) further enhances the model's ability to understand sequential dependencies in the feature representations. This comprehensive approach allows DuGEL to achieve a high level of accuracy in predicting potential microbe-disease associations, making it a valuable tool for biomedical research and the discovery of new therapeutic targets. By combining advanced graph-based and sequence-based learning techniques, DuGEL addresses the limitations of existing methods and provides a robust framework for the prediction of microbe-disease associations. To evaluate the performance of DuGEL, we conducted comprehensive comparative experiments and case studies based on two databases, HMDAD, and Disbiome to demonstrate that DuGEL can effectively predict potential microbe-disease associations.
Affiliation(s)
- Junlong Wu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Liqi Xiao
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Liu Fan
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Lei Wang
  - Technology Innovation Center of Changsha, Changsha University, Changsha, China
- Xianyou Zhu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
  - Hunan Engineering Research Center of Cyberspace Security Technology and Applications, Hengyang Normal University, Hengyang, China
5
Zhang X, Liu H, Li Y, Wen Y, Xu T, Chen C, Hao S, Hu J, Nie S, Gao F, Jia G. Linking dietary fiber to human malady through cumulative profiling of microbiota disturbance. IMETA 2025; 4:e70004. [PMID: 40027480 PMCID: PMC11865338 DOI: 10.1002/imt2.70004] [Received: 12/23/2024] [Revised: 01/27/2025] [Accepted: 01/29/2025] [Indexed: 03/05/2025]
Abstract
Dietary fiber influences the composition and metabolic activity of microbial communities, impacting disease development. Current understanding of the intricate fiber-microbe-disease tripartite relationship remains fragmented and elusive, urging a systematic investigation. Here, we focused on microbiota disturbance as a robust index to mitigate various confounding factors and developed the Bio-taxonomic Hierarchy Weighted Aggregation (BHWA) algorithm to integrate multi-taxonomy microbiota disturbance data, thereby illuminating the complex relationships among dietary fiber, microbiota, and disease. By leveraging microbiota disturbance similarities, we (1) classified 32 types of dietary fibers into six functional subgroups, revealing correlations with fiber solubility; (2) established associations among 161 diseases, uncovering shared microbiota disturbance patterns that explain disease co-occurrence (e.g., type II diabetes and kidney diseases) and distinct microbiota patterns that discern symptomatically similar diseases (e.g., inflammatory bowel disease and irritable bowel syndrome); (3) designed a body-site-specific microbiota disturbance scoring scheme, computing a disturbance score (DS) for each disease and highlighting the pronounced capacity of Crohn's disease to disturb gut microbiota (DS = 14.01) in contrast with food allergy's minimal capacity (DS = 0.74); (4) identified 1659 fiber-disease associations, predicting the potential of dietary fiber to modulate specific microbiota changes associated with diseases of interest; (5) established murine models of inflammatory bowel disease to validate the preventive and therapeutic effects of arabinoxylan that notably perturbed the Bacteroidetes and Firmicutes phyla, as well as the Bacteroidetes and Lactobacillus genera, aligning with our model predictions. To enhance data accessibility and facilitate targeted dietary intervention development, we launched an interactive web tool, mDiFiBank, at https://mdifibank.org.cn/.
Affiliation(s)
- Xin Zhang
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Huan Liu
  - State Key Laboratory of Food Science and Resources, China-Canada Joint Lab of Food Science and Technology (Nanchang), Key Laboratory of Bioactive Polysaccharides of Jiangxi Province, Nanchang University, Nanchang, China
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yanlong Wen
  - State Key Laboratory of Food Science and Resources, China-Canada Joint Lab of Food Science and Technology (Nanchang), Key Laboratory of Bioactive Polysaccharides of Jiangxi Province, Nanchang University, Nanchang, China
- Tianxin Xu
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Chen Chen
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Shuxia Hao
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Jielun Hu
  - State Key Laboratory of Food Science and Resources, China-Canada Joint Lab of Food Science and Technology (Nanchang), Key Laboratory of Bioactive Polysaccharides of Jiangxi Province, Nanchang University, Nanchang, China
- Shaoping Nie
  - State Key Laboratory of Food Science and Resources, China-Canada Joint Lab of Food Science and Technology (Nanchang), Key Laboratory of Bioactive Polysaccharides of Jiangxi Province, Nanchang University, Nanchang, China
- Fei Gao
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
  - Comparative Pediatrics and Nutrition, Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Gengjie Jia
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
6
Wu C, Lin B, Zhang H, Xu D, Gao R, Song R, Liu ZP, De Marinis Y. GCNPMDA: Human microbe-disease association prediction by hierarchical graph convolutional network with layer attention. Biomed Signal Process Control 2025; 100:107004. [DOI: 10.1016/j.bspc.2024.107004] [Indexed: 01/06/2025]
7
Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025] Open
Abstract
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. Enabling large-scale RNA sequence analysis by augmenting wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have a comprehensive knowledge of both DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Considering the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold:
- It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks.
- It sets the stage for the development of benchmark datasets related to 47 distinct RNA sequence analysis tasks by summarizing the key points of 64 different biological databases.
- It presents applications of word embeddings and language models across 47 distinct RNA sequence analysis tasks.
- It streamlines the development of new predictors by providing a comprehensive survey of the performance values of predictive pipelines based on 58 word embeddings and 70 language models, as well as the top-performing traditional sequence-encoding-based predictors and their performances across 47 RNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
8
Gu J, Zhang T, Gao Y, Chen S, Zhang Y, Cui H, Xuan P. Neighborhood Topology-Aware Knowledge Graph Learning and Microbial Preference Inferring for Drug-Microbe Association Prediction. J Chem Inf Model 2025; 65:435-445. [PMID: 39745733 DOI: 10.1021/acs.jcim.4c01544] [Indexed: 01/14/2025]
Abstract
The human microbiota may influence the effectiveness of drug therapy by activating or inactivating the pharmacological properties of drugs. Computational methods have demonstrated their ability to screen reliable microbe-drug associations and uncover the mechanism by which drugs exert their functions. However, the previous prediction methods failed to completely exploit the neighborhood topologies of the microbe and drug entities and the diverse correlations between the microbe-drug entity pair and the other entities. In addition, they ignored the case that a microbe prefers to associate with its own specific drugs. A novel prediction method, PCMDA, was proposed by learning the neighborhood topologies of entities, inferring the association preferences, and integrating the features of each entity pair based on multiple biological premises. First, a knowledge graph consisting of microbe, disease, and drug entities is established to help the subsequent integration of the topological structure of entities and the similarity, interaction, and association relationship between any two entities. We generate various topological embeddings for each microbe (or drug) entity through random walks with neighborhood restarts on the microbe-disease-drug knowledge graph. Distance-level attention is designed to adaptively fuse neighborhood topologies covering multiple ranges. Second, the topological embeddings of entities imply the latent topological relationships between entities, while the relational embeddings of entities are derived from the semantics of connections among the entities. The topological structure and relational semantics of entities are fused by a designed knowledge graph learning module based on multilayer perceptron networks. Third, considering the preference that each microbe tends to especially associate with a group of drugs, information-level attention is designed to integrate the dependency between microbial preference and the candidate drug. 
Finally, a dual-gated network is established to encode the features of a microbe-drug entity pair from multiple biological perspectives. The comparative experiments with seven state-of-the-art methods demonstrate PCMDA's superior performance for microbe-drug association prediction. The case studies on three drugs and the recall rate evaluation for the top-ranked candidates indicate that PCMDA has the capability of discovering reliable candidate microbes associated with a drug. The datasets and source codes are freely available at https://github.com/pingxuan-hlju/pcmda.
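As a rough illustration of the random-walk step described in this abstract, the sketch below implements the standard random walk with restart (RWR) by power iteration on a small undirected graph. It is a stand-in under stated assumptions, not PCMDA's "neighborhood restart" variant, whose details are not given in the abstract; the function name, the restart probability of 0.3, and the toy adjacency matrix are all hypothetical.

```python
def random_walk_with_restart(adjacency, seed, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`."""
    n = len(adjacency)
    # Column sums are used to normalise outgoing transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    restart = [0.0] * n
    restart[seed] = 1.0
    scores = restart[:]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            # Probability mass flowing into node i from its neighbours.
            walk = sum(adjacency[i][j] / col_sums[j] * scores[j]
                       for j in range(n) if col_sums[j] > 0)
            updated.append((1 - restart_prob) * walk + restart_prob * restart[i])
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical 4-node path graph: 0 - 1 - 2 - 3.
toy_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_adjacency, seed=0)
print([round(s, 4) for s in scores])
```

Nodes closer to the seed receive higher scores, which is why such score vectors are often used as topology-aware node features.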
Affiliation(s)
- Jing Gu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Tiangang Zhang
  - School of Cyberspace Security, Hainan University, Haikou 570228, China
- Yihang Gao
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Sentao Chen
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Yuxin Zhang
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3083, Australia
- Ping Xuan
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
9
Lu Y, Hui F, Zhou G, Xia J. MicrobiomeNet: exploring microbial associations and metabolic profiles for mechanistic insights. Nucleic Acids Res 2025; 53:D789-D796. [PMID: 39441071 PMCID: PMC11701532 DOI: 10.1093/nar/gkae944] [Received: 08/16/2024] [Revised: 09/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
The growing volumes of microbiome studies over the past decade have revealed a wide repertoire of microbial associations under diverse conditions. Microbes produce small molecules to interact with each other as well as to modulate their environments. Their metabolic profiles hold the key to understanding these association patterns for translational applications. Based on this concept, we developed MicrobiomeNet, a comprehensive database that integrates microbial associations with their metabolic profiles for mechanistic insights. It currently contains a total of ∼5.8 million known microbial associations, coupled with >12 400 genome-scale metabolic models (GEMs) covering ∼6000 microbial species. Users can intuitively explore microbial associations and compare their corresponding metabolic profiles. Our case studies show that MicrobiomeNet can provide mechanistic insights that are consistent with the literature. MicrobiomeNet is freely available at https://www.microbiomenet.com/.
Affiliation(s)
- Yao Lu
  - Institute of Parasitology, McGill University, Quebec, Canada
  - Department of Microbiology and Immunology, McGill University, Quebec, Canada
- Fiona Hui
  - Institute of Parasitology, McGill University, Quebec, Canada
- Guangyan Zhou
  - Institute of Parasitology, McGill University, Quebec, Canada
- Jianguo Xia
  - Institute of Parasitology, McGill University, Quebec, Canada
  - Department of Microbiology and Immunology, McGill University, Quebec, Canada
10
Jiang C, Feng J, Shan B, Chen Q, Yang J, Wang G, Peng X, Li X. Predicting microbe-disease associations via graph neural network and contrastive learning. Front Microbiol 2024; 15:1483983. [PMID: 39735180 PMCID: PMC11671253 DOI: 10.3389/fmicb.2024.1483983] [Received: 08/22/2024] [Accepted: 10/14/2024] [Indexed: 12/31/2024] Open
Abstract
In the contemporary field of life sciences, researchers have gradually recognized the critical role of microbes in maintaining human health. However, traditional biological experimental methods for validating the association between microbes and diseases are both time-consuming and costly. Therefore, developing effective computational methods to predict potential associations between microbes and diseases is an important and urgent task. In this study, we propose a novel computational framework, called GCATCMDA, for forecasting potential associations between microbes and diseases. Firstly, we construct Gaussian kernel similarity networks for microbes and diseases using known microbe-disease association data. Then, we design a feature encoder that combines graph convolutional network and graph attention mechanism to learn the node features of networks, and propose a feature dual-fusion module to effectively integrate node features from each layer's output. Next, we apply the feature encoder separately to the microbe similarity network, disease similarity network, and microbe-disease association network, and enhance the consistency of features for the same nodes across different association networks through contrastive learning. Finally, we pass the microbe and disease features into an inner product decoder to obtain the association scores between them. Experimental results demonstrate that the GCATCMDA model achieves superior predictive performance compared to previous methods. Furthermore, case studies confirm that GCATCMDA is an effective tool for predicting microbe-disease associations in real situations.
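The first step of this abstract, building Gaussian kernel similarity from known associations, can be sketched as follows. The code assumes the widely used Gaussian interaction profile formulation, in which each microbe is represented by its row of the binary association matrix and the kernel width is normalised by the mean squared norm of the rows; whether GCATCMDA uses exactly this normalisation is an assumption, and the function name, the `bandwidth` parameter, and the toy matrix are hypothetical.

```python
import math


def gaussian_kernel_similarity(profiles, bandwidth=1.0):
    """Pairwise Gaussian kernel similarity between rows of a 0/1 matrix."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = bandwidth / mean_sq_norm  # normalised kernel width
    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(profiles[i], profiles[j]))
            similarity[i][j] = math.exp(-gamma * sq_dist)
    return similarity


# Hypothetical association matrix: 3 microbes (rows) x 3 diseases (columns).
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
microbe_similarity = gaussian_kernel_similarity(associations)
```

Disease similarity follows the same recipe applied to the columns, that is, to the transposed matrix.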
Affiliation(s)
- Cong Jiang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Junxuan Feng
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Bingshen Shan
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Qiyue Chen
  - College of Management, Shenzhen University, Shenzhen, China
- Jian Yang
  - Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders and National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, China
  - Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China
- Gang Wang
  - Beijing Key Laboratory of Mental Disorders, National Clinical Research Center for Mental Disorders and National Center for Mental Disorders, Beijing Anding Hospital, Capital Medical University, Beijing, China
  - Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China
- Xiaogang Peng
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China
- Xiaozheng Li
  - College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
  - JCY Biotech Ltd., Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, China
11
Yang M, Wang Z, Yan Z, Wang W, Zhu Q, Jin C. DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification. BMC Bioinformatics 2024; 25:328. [PMID: 39402441 PMCID: PMC11476100 DOI: 10.1186/s12859-024-05955-8] [Received: 02/29/2024] [Accepted: 10/09/2024] [Indexed: 10/19/2024] Open
Abstract
BACKGROUND: The rapid advancements in deep neural network models have significantly enhanced the ability to extract features from microbial sequence data, which is critical for addressing biological challenges. However, the scarcity and complexity of labeled microbial data pose substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework designed for efficient gene sequence data feature extraction.
RESULTS: DNASimCLR leverages convolutional neural networks and the SimCLR framework, based on contrastive learning, to extract intricate features from diverse microbial gene sequences. Pre-training was conducted on two classic large-scale unlabelled datasets encompassing metagenomes and viral gene sequences. Subsequent classification tasks were performed by fine-tuning the pretrained model. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification. Among convolutional neural network-based approaches, DNASimCLR surpasses the latest existing methods, clearly establishing its superiority over the state-of-the-art CNN-based feature extraction techniques. Furthermore, the model exhibits superior performance across diverse tasks in analyzing biological sequence data, showcasing its robust adaptability.
CONCLUSIONS: DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well in scenarios involving novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics.
Affiliation(s)
- Minghao Yang
- Shandong University, Weihai, People's Republic of China
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
| | - Zehua Wang
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
| | - Zizhuo Yan
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
| | - Wenxiang Wang
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
| | - Qian Zhu
- Shandong University, Weihai, People's Republic of China
| | - Changlong Jin
- Shandong University, Weihai, People's Republic of China.
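The contrastive objective behind SimCLR-style pre-training, which the DNASimCLR abstract above builds on, can be sketched as follows. This is a minimal NumPy illustration of the NT-Xent loss only, not the authors' implementation: the function name, the temperature value, and the assumption that the two inputs hold embeddings of two augmented views of the same sequences (row i paired with row i, no zero-norm rows) are all illustrative.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for paired embeddings: z_a[i] and z_b[i] are two views of item i.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalise each row
    sim = z @ z.T / temperature                        # temperature-scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own candidate
    # Index of the positive partner for every row of the stacked batch.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row, read off at the positive.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_norm
    return float(-log_prob.mean())
```

The loss is low when each embedding is closer to its paired view than to every other embedding in the batch, which is what lets an encoder be trained without labels.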
|
12
|
Wang S, Liu JX, Li F, Wang J, Gao YL. M3HOGAT: A Multi-View Multi-Modal Multi-Scale High-Order Graph Attention Network for Microbe-Disease Association Prediction. IEEE J Biomed Health Inform 2024; 28:6259-6267. [PMID: 39012741 DOI: 10.1109/jbhi.2024.3429128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
Numerous scientific studies have found a link between diverse microorganisms in the human body and complex human diseases. Because traditional experimental approaches are time-consuming and expensive, using computational methods to identify microbes correlated with diseases is critical. In this paper, a new microbe-disease association prediction model called M3HOGAT is proposed, which combines a multi-view multi-modal network with a multi-scale feature fusion mechanism. First, a microbe-disease association network and multiple similarity views are constructed from multi-source information. Then, because neighbor information from different orders may be better suited to learning node representations, a higher-order graph attention network (HOGAT) is devised to aggregate neighbor information from different orders and extract microbe and disease features from the different networks and views. Given that the embedding features of microbes and diseases from different views vary in importance, a multi-scale feature fusion mechanism is employed to learn their interaction information, thereby generating the final features of microbes and diseases. Finally, an inner product decoder is used to reconstruct the microbe-disease association matrix. Compared with five state-of-the-art methods on the HMDAD and Disbiome datasets, M3HOGAT achieves the best performance under 5-fold cross-validation. Furthermore, case studies on asthma and obesity confirm the effectiveness of M3HOGAT in identifying potential disease-related microbes.
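The final step named in the abstract above, an inner product decoder that reconstructs the association matrix, can be sketched as follows. This is a generic illustration under stated assumptions, not the M3HOGAT code: the function name is hypothetical, and passing the inner products through a sigmoid is a common choice in graph autoencoders that the abstract does not confirm.

```python
import numpy as np

def inner_product_decoder(microbe_emb, disease_emb):
    """Score every microbe-disease pair from learned embeddings.

    microbe_emb: (n_microbes, d) array; disease_emb: (n_diseases, d) array.
    Returns an (n_microbes, n_diseases) matrix of scores in (0, 1).
    The sigmoid is an assumed squashing function, not taken from the paper.
    """
    logits = np.asarray(microbe_emb, dtype=float) @ np.asarray(disease_emb, dtype=float).T
    return 1.0 / (1.0 + np.exp(-logits))
```

Entry (i, j) of the result is high when the embeddings of microbe i and disease j point in similar directions, so the decoder itself has no trainable parameters.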
|
13
|
He L, Zou Q, Dai Q, Cheng S, Wang Y. Adversarial regularized autoencoder graph neural network for microbe-disease associations prediction. Brief Bioinform 2024; 25:bbae584. [PMID: 39528423 PMCID: PMC11554402 DOI: 10.1093/bib/bbae584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 10/09/2024] [Accepted: 10/30/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND Microorganisms inhabit various regions of the human body and significantly contribute to numerous diseases. Predicting the associations between microbes and diseases is crucial for understanding pathogenic mechanisms and informing prevention and treatment strategies. Biological experiments to determine these associations are time-consuming and costly. Therefore, integrating deep learning with biological networks can efficiently identify potential microbe-disease associations on a large scale. METHODS We propose an adversarial regularized autoencoder graph neural network algorithm, named Stacked Adversarial Regularization for Microbe-Disease Associations Prediction (SARMDA), for predicting associations between microbes and diseases. First, we integrate topological structural similarity and functional similarity metrics of microbes and diseases to construct a heterogeneous network. Then, utilizing an autoencoder based on GraphSAGE, we learn both the topological and attribute representations of nodes within the constructed network. Finally, we introduce an adversarial regularized autoencoder graph neural network embedding model to address the inherent limitations of traditional GraphSAGE autoencoders in capturing global information. RESULTS Under five-fold cross-validation on microbe-disease pairs, SARMDA was compared with eight advanced methods using the Human Microbe-Disease Association Database (HMDAD) and Disbiome databases. The best area under the ROC curve (AUC) achieved by SARMDA on HMDAD was 0.9891 ± 0.0057, and the best area under the precision-recall curve (AUPR) was 0.9902 ± 0.0128. On the Disbiome dataset, the AUC was 0.9328 ± 0.0072, and the best AUPR was 0.9233 ± 0.0089, outperforming the other eight microbe-disease association prediction methods. Furthermore, the effectiveness of our model was demonstrated through a detailed analysis of asthma and inflammatory bowel disease cases.
Affiliation(s)
- Limuxuan He
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Qingshuihe Campus, 2006 Xiyuan Avenue, West District, High-tech Zone, Chengdu, Sichuan 610054, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Qingshuihe Campus, 2006 Xiyuan Avenue, West District, High-tech Zone, Chengdu, Sichuan 610054, China
- School of Information Technology and Administration, Hunan University of Finance and Economics, 139, 2nd Fenglin Road, Yuelu District, Changsha, Hunan 410205, China
| | - Qi Dai
- College of Life Science and Medicine, Zhejiang Sci-Tech University, No. 5 Second Avenue, Xiasha Higher Education Zone, Hangzhou, Zhejiang 310018, PR China
| | - Shuang Cheng
- Institute of Materials, China Academy of Engineering Physics, Huafeng Xincun No. 9, Jiangyou, Mianyang, Sichuan 621907, China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Qingshuihe Campus, 2006 Xiyuan Avenue, West District, High-tech Zone, Chengdu, Sichuan 610054, China
|
14
|
Chen H, Chen K. Predicting disease-associated microbes based on similarity fusion and deep learning. Brief Bioinform 2024; 25:bbae550. [PMID: 39504483 PMCID: PMC11540060 DOI: 10.1093/bib/bbae550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 09/15/2024] [Accepted: 10/14/2024] [Indexed: 11/08/2024] Open
Abstract
A growing number of studies have revealed the critical roles of the human microbiome in a wide variety of disorders. Identifying disease-associated microbes might improve our understanding of disease pathogenesis and treatment. Computational prediction of microbe-disease associations can guide further biomedical screening and has attracted considerable research interest in bioinformatics. In this study, a deep learning-based computational approach named SGJMDA is presented for predicting microbe-disease associations. Specifically, SGJMDA first fuses multiple similarities of microbes and diseases using a nonlinear strategy, and extracts feature information from homogeneous networks composed of the fused similarities via a graph convolution network. Second, a heterogeneous microbe-disease network is built to further capture the structural information of microbes and diseases by employing a multi-neighborhood graph convolution network and a jumping knowledge network. Finally, potential microbe-disease associations are inferred by computing the linear correlation coefficients of their embeddings. Results from cross-validation experiments show that SGJMDA outperforms 6 state-of-the-art computational methods. Furthermore, we carry out case studies on three important diseases using SGJMDA, in which 19, 20, and 11 of the top 20 predictions, respectively, are confirmed by the latest databases. The excellent performance of SGJMDA suggests that it could be a valuable and promising tool for inferring disease-associated microbes.
Affiliation(s)
- Hailin Chen
- School of Information and Software Engineering, East China Jiaotong University, Nanchang 330013, China
| | - Kuan Chen
- School of Information and Software Engineering, East China Jiaotong University, Nanchang 330013, China
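The last step in the SGJMDA abstract above, scoring pairs by the linear correlation coefficient of their embeddings, can be sketched as follows. This is a minimal NumPy illustration that assumes "linear correlation coefficient" means the Pearson correlation between a microbe embedding and a disease embedding of the same dimension; the function name is hypothetical, and rows with zero variance are assumed not to occur.

```python
import numpy as np

def correlation_scores(microbe_emb, disease_emb):
    """Pearson correlation between every microbe row and every disease row.

    Both inputs are (n, d) arrays sharing the embedding dimension d.
    Returns an (n_microbes, n_diseases) matrix with values in [-1, 1].
    """
    m = np.asarray(microbe_emb, dtype=float)
    d = np.asarray(disease_emb, dtype=float)
    # Centre each embedding, then scale it to unit length; the dot product of
    # two such vectors is exactly their Pearson correlation.
    m = m - m.mean(axis=1, keepdims=True)
    d = d - d.mean(axis=1, keepdims=True)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return m @ d.T
```

Ranking the entries of one column then gives the candidate microbes for a single disease, which is how a top-20 list such as the one in the case studies could be read off.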
|
15
|
Shi K, Huang K, Li L, Liu Q, Zhang Y, Zheng H. Predicting microbe-disease association based on graph autoencoder and inductive matrix completion with multi-similarities fusion. Front Microbiol 2024; 15:1438942. [PMID: 39355422 PMCID: PMC11443509 DOI: 10.3389/fmicb.2024.1438942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 08/02/2024] [Indexed: 10/03/2024] Open
Abstract
Background Clinical studies have demonstrated that microbes play a crucial role in human health and disease. The identification of microbe-disease interactions can provide insights into pathogenesis and promote the diagnosis, treatment, and prevention of disease. Although a large number of computational methods have been designed to screen novel microbe-disease associations, accurate and efficient methods are still lacking due to data inconsistency, underutilization of prior information, and limited model performance. Methods In this study, we proposed an improved deep learning-based framework, named GIMMDA, to identify latent microbe-disease associations, which is based on a graph autoencoder and inductive matrix completion. By co-training the information from the microbe and disease spaces, the new representations of microbes and diseases are used to reconstruct microbe-disease associations in an end-to-end framework. In particular, a similarity fusion strategy is applied to improve prediction performance. Results The experimental results show that the performance of GIMMDA is competitive with that of existing state-of-the-art methods on 3 datasets (i.e., HMDAD, Disbiome, and multiMDA). In particular, it performs best, with an area under the receiver operating characteristic curve (AUC) of 0.9735, 0.9156, and 0.9396 on these 3 datasets, respectively. The results also confirm that different similarity fusions can improve the prediction performance. Furthermore, case studies on two diseases, i.e., asthma and obesity, validate the effectiveness and reliability of our proposed model. Conclusion The proposed GIMMDA model shows a strong capability in predicting microbe-disease associations. We expect that GIMMDA will help identify potential microbe-related diseases in the future.
Affiliation(s)
- Kai Shi
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent Systems, Guilin University of Technology, Guilin, China
| | - Kai Huang
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
| | - Lin Li
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
| | - Qiaohui Liu
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
| | - Yi Zhang
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
| | - Huilin Zheng
- College of Computer Science and Engineering, Guilin University of Technology, Guilin, China
|
16
|
Chen Z, Zhang L, Li J, Chen H. Microbe-disease associations prediction by graph regularized non-negative matrix factorization with $L_{2,1}$ norm regularization terms. J Cell Mol Med 2024; 28:e18553. [PMID: 39239860 PMCID: PMC11377990 DOI: 10.1111/jcmm.18553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 06/19/2024] [Accepted: 07/09/2024] [Indexed: 09/07/2024] Open
Abstract
Microbes are involved in a wide range of biological processes and are closely associated with disease. Inferring potential disease-associated microbes as biomarkers or drug targets may help prevent, diagnose and treat complex human diseases. However, biological experiments are time-consuming and expensive. In this study, we introduced a new method called iPALM-GLMF, which modelled microbe-disease association prediction as a problem of non-negative matrix factorization with graph dual regularization terms and $L_{2,1}$ norm regularization terms. The graph dual regularization terms were used to capture potential features in the microbe and disease space, and the $L_{2,1}$ norm regularization terms were used to ensure the sparsity of the feature matrices obtained from the non-negative matrix factorization and to improve the interpretability. To solve the model, iPALM-GLMF used a non-negative double singular value decomposition to initialize the matrix factorization and adopted an inertial Proximal Alternating Linear Minimization iterative process to obtain the final matrix factorization results. As a result, iPALM-GLMF performed better than other existing methods in leave-one-out cross-validation and fivefold cross-validation. In addition, case studies of different diseases demonstrated that iPALM-GLMF could effectively predict potential microbial-disease associations. iPALM-GLMF is publicly available at https://github.com/LiangzheZhang/iPALM-GLMF.
Affiliation(s)
- Ziwei Chen
- School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
| | - Liangzhe Zhang
- School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
| | - Jingyi Li
- School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
| | - Hang Chen
- School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
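The $L_{2,1}$ norm used as a sparsity regularizer in the iPALM-GLMF abstract above can be illustrated with a short sketch. It assumes the common row-wise convention, $\|X\|_{2,1} = \sum_i \sqrt{\sum_j X_{ij}^2}$; some papers sum over columns instead, and the abstract does not say which convention is used. The function name is hypothetical.

```python
import numpy as np

def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows of a matrix (row-wise L2,1 norm)."""
    matrix = np.asarray(matrix, dtype=float)
    return float(np.sqrt((matrix ** 2).sum(axis=1)).sum())

# Rows have Euclidean norms 5, 0 and 13, so the L2,1 norm is 18.
example = np.array([[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]])
print(l21_norm(example))
```

Because the penalty adds up whole-row norms, minimizing it tends to drive entire rows of a factor matrix to zero, which is the row-sparsity effect the abstract relies on for interpretability.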
|
17
|
Zhu H, Hao H, Yu L. Identification of microbe-disease signed associations via multi-scale variational graph autoencoder based on signed message propagation. BMC Biol 2024; 22:172. [PMID: 39148051 PMCID: PMC11328394 DOI: 10.1186/s12915-024-01968-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Accepted: 08/01/2024] [Indexed: 08/17/2024] Open
Abstract
BACKGROUND Plenty of clinical and biomedical research has unequivocally highlighted the tremendous significance of the human microbiome in relation to human health. Identifying microbes associated with diseases is crucial for early disease diagnosis and advancing precision medicine. RESULTS Considering that the information about changes in microbial quantities under fine-grained disease states helps to enhance a comprehensive understanding of the overall data distribution, this study introduces MSignVGAE, a framework for predicting microbe-disease sign associations using signed message propagation. MSignVGAE employs a graph variational autoencoder to model noisy signed association data and extends the multi-scale concept to enhance representation capabilities. A novel strategy for propagating signed message in signed networks addresses heterogeneity and consistency among nodes connected by signed edges. Additionally, we utilize the idea of denoising autoencoder to handle the noise in similarity feature information, which helps overcome biases in the fused similarity data. MSignVGAE represents microbe-disease associations as a heterogeneous graph using similarity information as node features. The multi-class classifier XGBoost is utilized to predict sign associations between diseases and microbes. CONCLUSIONS MSignVGAE achieves AUROC and AUPR values of 0.9742 and 0.9601, respectively. Case studies on three diseases demonstrate that MSignVGAE can effectively capture a comprehensive distribution of associations by leveraging signed information.
Affiliation(s)
- Huan Zhu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Hongxia Hao
- School of Computer Science and Technology, Xidian University, Xi'an, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China.
|
18
|
Jacob T, Sindhu S, Hasan A, Malik MZ, Arefanian H, Al-Rashed F, Nizam R, Kochumon S, Thomas R, Bahman F, Shenouda S, Wilson A, Akther N, Al-Roub A, Abukhalaf N, Albeloushi S, Abu-Farha M, Al Madhoun A, Alzaid F, Thanaraj TA, Koistinen HA, Tuomilehto J, Al-Mulla F, Ahmad R. Soybean oil-based HFD induces gut dysbiosis that leads to steatosis, hepatic inflammation and insulin resistance in mice. Front Microbiol 2024; 15:1407258. [PMID: 39165573 PMCID: PMC11334085 DOI: 10.3389/fmicb.2024.1407258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 07/09/2024] [Indexed: 08/22/2024] Open
Abstract
High-fat diets (HFDs) shape the gut microbiome and promote obesity, inflammation, and liver steatosis. Fish and soybean are part of a healthy diet; however, the impact of these fats, in the absence of sucrose, on gut microbial dysbiosis and its association with liver steatosis remains unclear. Here, we investigated the effect of sucrose-free soybean oil- and fish oil-based HFDs (SF-Soy-HFD and SF-Fish-HFD, respectively) on gut dysbiosis, obesity, steatosis, hepatic inflammation, and insulin resistance. C57BL/6 mice were fed these HFDs for 24 weeks. Both diets had comparable effects on liver and total body weights. However, 16S rRNA sequencing of the gut content revealed induction of gut dysbiosis at different taxonomic levels. The microbial communities were clearly separated, showing differential dysbiosis between the two HFDs. Compared with the SF-Fish-HFD control group, the SF-Soy-HFD group had an increased abundance of Bacteroidetes, Firmicutes, and Deferribacteres, but a lower abundance of Verrucomicrobia. The Clostridia/Bacteroidia (C/B) ratio was higher in the SF-Soy-HFD group (3.11) than in the SF-Fish-HFD group (2.5). Conversely, the Verrucomicrobiaceae/S24_7 (also known as Muribaculaceae family) ratio was lower in the SF-Soy-HFD group (0.02) than in the SF-Fish-HFD group (0.75). The SF-Soy-HFD group had a positive association with S24_7, Clostridiales, Allobaculum, Coriobacteriaceae, Adlercreutzia, Christensenellaceae, Lactococcus, and Oscillospira, but was related to a lower abundance of Akkermansia, which maintains gut barrier integrity. The gut microbiota in the SF-Soy-HFD group had predicted associations with host genes related to fatty liver and inflammatory pathways. Mice fed the SF-Soy-HFD developed liver steatosis and showed increased transcript levels of genes associated with de novo lipogenesis (Acaca, Fasn, Scd1, Elovl6) and cholesterol synthesis (Hmgcr) pathways compared to those in the SF-Fish-HFD group. No differences were observed in the expression of fat uptake genes (Cd36 and Fabp1). The expression of the fat efflux gene (Mttp) was reduced in the SF-Soy-HFD group. Moreover, hepatic inflammation markers (Tnfa and Il1b) were notably expressed in SF-Soy-HFD-fed mice. In conclusion, SF-Soy-HFD feeding induced gut dysbiosis in mice, leading to steatosis, hepatic inflammation, and impaired glucose homeostasis.
Affiliation(s)
- Texy Jacob
- Dasman Diabetes Institute, Dasman, Kuwait
| | - Amal Hasan
- Dasman Diabetes Institute, Dasman, Kuwait
| | - Fawaz Alzaid
- Dasman Diabetes Institute, Dasman, Kuwait
- INSERM UMR-S1151, CNRS UMR-S8253, Institut Necker Enfants Malades, Université Paris Cité, Paris, France
| | - Heikki A Koistinen
- Department of Medicine, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Minerva Foundation Institute for Medical Research, Helsinki, Finland
| | - Jaakko Tuomilehto
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
|
19
|
Chen J, Zhu Y, Yuan Q. Predicting potential microbe-disease associations based on dual branch graph convolutional network. J Cell Mol Med 2024; 28:e18571. [PMID: 39086148 PMCID: PMC11291560 DOI: 10.1111/jcmm.18571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 06/15/2024] [Accepted: 06/27/2024] [Indexed: 08/02/2024] Open
Abstract
Studying the association between microbes and diseases not only aids in the prevention and diagnosis of diseases, but also provides crucial theoretical support for new drug development and personalized treatment. Due to the time-consuming and costly nature of laboratory-based biological tests to confirm the relationship between microbes and diseases, there is an urgent need for innovative computational frameworks to anticipate new associations between microbes and diseases. Here, we propose a novel computational approach based on a dual branch graph convolutional network (GCN) module, abbreviated as DBGCNMDA, for identifying microbe-disease associations. First, DBGCNMDA calculates the similarity matrix of diseases and microbes by integrating functional similarity and Gaussian association spectrum kernel (GAPK) similarity. Then, semantic information from different biological networks is extracted by two GCN modules from different perspectives. Finally, the scores of microbe-disease associations are predicted based on the extracted features. The main innovation of this method lies in the use of two types of information for microbe/disease similarity assessment. Additionally, we extend the disease nodes to address the issue of insufficient features due to low data dimensionality. We optimize the connectivity between the homogeneous entities using random walk with restart (RWR), and then use the optimized similarity matrix as the initial feature matrix. In terms of network understanding, we design a dual branch GCN module, namely GlobalGCN and LocalGCN, to fine-tune node representations by introducing side information, including homologous neighbour nodes. We evaluate the accuracy of the DBGCNMDA model using the five-fold cross-validation (5-fold-CV) technique. The results show that the area under the receiver operating characteristic curve (AUC) and area under the precision versus recall curve (AUPR) of the DBGCNMDA model in the 5-fold-CV are 0.9559 and 0.9630, respectively. The results from the case studies using published experimental data confirm a significant number of predicted associations, indicating that DBGCNMDA is an effective tool for predicting potential microbe-disease associations.
Affiliation(s)
- Jing Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Yongjun Zhu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
| | - Qun Yuan
- Department of Respiratory Medicine, The Affiliated Suzhou Hospital of Nanjing University Medical School, Suzhou, China
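The random walk with restart (RWR) step that the DBGCNMDA abstract above applies to its similarity matrices can be sketched as follows. This is a generic RWR iteration, not the authors' code: the function name, the restart probability of 0.5, the column normalization, and the convergence tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def random_walk_with_restart(similarity, restart_prob=0.5, tol=1e-10, max_iter=1000):
    """Run RWR from every node of a similarity graph at once.

    similarity: (n, n) non-negative matrix. Column j of the result is the
    stationary visiting distribution of a walker that restarts at node j.
    """
    similarity = np.asarray(similarity, dtype=float)
    col_sums = similarity.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0            # leave isolated nodes as zero columns
    transition = similarity / col_sums       # column-stochastic transition matrix
    restart = np.eye(similarity.shape[0])    # one restart vector per start node
    state = restart.copy()
    for _ in range(max_iter):
        new_state = (1 - restart_prob) * (transition @ state) + restart_prob * restart
        if np.abs(new_state - state).max() < tol:
            return new_state
        state = new_state
    return state
```

The returned matrix smooths the original similarities over multi-step paths, so two entities with no direct similarity can still receive a non-zero score when they share close neighbours; a matrix of this kind is what the abstract describes using as the initial feature matrix.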
|
20
|
Zhang C, Zhang Z, Zhang F, Zeng B, Liu X, Wang L. A computational model for potential microbe-disease association detection based on improved graph convolutional networks and multi-channel autoencoders. Front Microbiol 2024; 15:1435408. [PMID: 39144226 PMCID: PMC11322764 DOI: 10.3389/fmicb.2024.1435408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Accepted: 07/05/2024] [Indexed: 08/16/2024] Open
Abstract
Introduction Accumulating evidence shows that human health and disease are closely related to the microbes in the human body. Methods In this manuscript, a new computational model based on graph attention networks and sparse autoencoders, called GCANCAE, was proposed for inferring possible microbe-disease associations. In GCANCAE, we first constructed a heterogeneous network by combining known microbe-disease relationships, disease similarity, and microbial similarity. Then, we adopted the improved GCN and the CSAE to extract neighbor relations in the adjacency matrix and novel feature representations in heterogeneous networks. After that, in order to estimate the likelihood of a potential microbe associated with a disease, we integrated these two types of representations to create unique eigenmatrices for diseases and microbes, respectively, and obtained predicted scores for potential microbe-disease associations by calculating the inner product of these two types of eigenmatrices. Results and discussion Based on the baseline databases such as the HMDAD and the Disbiome, intensive experiments were conducted to evaluate the prediction ability of GCANCAE, and the experimental results demonstrated that GCANCAE achieved better performance than state-of-the-art competitive methods under the frameworks of both 2-fold and 5-fold CV. Furthermore, case studies of three categories of common diseases, such as asthma, irritable bowel syndrome (IBS), and type 2 diabetes (T2D), confirmed the efficiency of GCANCAE.
Affiliation(s)
| | - Zhen Zhang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
| | - Xin Liu
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
| | - Lei Wang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
|
21
|
Kochumon S, Malik MZ, Sindhu S, Arefanian H, Jacob T, Bahman F, Nizam R, Hasan A, Thomas R, Al-Rashed F, Shenouda S, Wilson A, Albeloushi S, Almansour N, Alhamar G, Al Madhoun A, Alzaid F, Thanaraj TA, Koistinen HA, Tuomilehto J, Al-Mulla F, Ahmad R. Gut Dysbiosis Shaped by Cocoa Butter-Based Sucrose-Free HFD Leads to Steatohepatitis, and Insulin Resistance in Mice. Nutrients 2024; 16:1929. [PMID: 38931284 PMCID: PMC11207001 DOI: 10.3390/nu16121929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 06/05/2024] [Accepted: 06/07/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND High-fat diets cause gut dysbiosis and promote triglyceride accumulation, obesity, gut permeability changes, inflammation, and insulin resistance. Both cocoa butter and fish oil are considered to be a part of healthy diets. However, their differential effects on gut microbiome perturbations in mice fed high concentrations of these fats, in the absence of sucrose, remain to be elucidated. The aim of the study was to test whether sucrose-free cocoa butter-based high-fat diet (C-HFD) feeding in mice leads to gut dysbiosis that associates with a pathologic phenotype marked by hepatic steatosis, low-grade inflammation, perturbed glucose homeostasis, and insulin resistance, compared with control mice fed the fish oil-based high-fat diet (F-HFD). RESULTS C57BL/6 mice (5-6 mice/group) were fed two types of high-fat diets (C-HFD and F-HFD) for 24 weeks. No significant difference was found in the liver weight or total body weight between the two groups. The 16S rRNA sequencing of gut bacterial samples displayed gut dysbiosis in the C-HFD group, with differentially altered microbial diversity or relative abundances. Bacteroidetes, Firmicutes, and Proteobacteria were highly abundant in the C-HFD group, while Verrucomicrobia, Saccharibacteria (TM7), Actinobacteria, and Tenericutes were more abundant in the F-HFD group. Other taxa in the C-HFD group included Bacteroides, Odoribacter, Sutterella, Firmicutes bacterium (AF12), Anaeroplasma, Roseburia, and Parabacteroides distasonis. An increased Firmicutes/Bacteroidetes (F/B) ratio in the C-HFD group, compared with the F-HFD group, indicated gut dysbiosis. These gut bacterial changes in the C-HFD group had predicted associations with fatty liver disease and with lipogenic, inflammatory, glucose metabolic, and insulin signaling pathways. Consistent with its microbiome shift, the C-HFD group showed hepatic inflammation and steatosis, high fasting blood glucose, insulin resistance, increased hepatic de novo lipogenesis (Acetyl-CoA carboxylase 1 (Acaca), Fatty acid synthase (Fasn), Stearoyl-CoA desaturase-1 (Scd1), Elongation of long-chain fatty acids family member 6 (Elovl6), and Peroxisome proliferator-activated receptor-gamma (Pparg)) and cholesterol synthesis (β-hydroxy β-methylglutaryl-CoA reductase (Hmgcr)). Non-significant differences were observed regarding fatty acid uptake (Cluster of differentiation 36 (CD36) and Fatty acid binding protein-1 (Fabp1)) and efflux (ATP-binding cassette G1 (Abcg1) and Microsomal TG transfer protein (Mttp)) in the C-HFD group, compared with the F-HFD group. The C-HFD group also displayed increased gene expression of inflammatory markers including Tumor necrosis factor alpha (Tnfa), C-C motif chemokine ligand 2 (Ccl2), and Interleukin-12 (Il12), as well as a tendency for liver fibrosis. CONCLUSION These findings suggest that sucrose-free C-HFD feeding in mice induces gut dysbiosis which associates with liver inflammation, steatosis, glucose intolerance and insulin resistance.
Affiliation(s)
- Shihab Kochumon
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Md. Zubbair Malik
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Sardar Sindhu
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Hossein Arefanian
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Texy Jacob
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Fatemah Bahman
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Rasheeba Nizam
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Amal Hasan
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Reeby Thomas
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Fatema Al-Rashed
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Steve Shenouda
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Ajit Wilson
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Shaima Albeloushi
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Nourah Almansour
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Ghadeer Alhamar
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Ashraf Al Madhoun
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Fawaz Alzaid
- Dasman Diabetes Institute, Dasman 15462, Kuwait
- Université Paris Cité, INSERM UMR-S1151, CNRS UMR-S8253, Institut Necker Enfants Malades, F-75015 Paris, France
| | - Thangavel Alphonse Thanaraj
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Heikki A. Koistinen
- Department of Medicine, University of Helsinki and Helsinki University Hospital, 00029 Helsinki, Finland
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, P.O. Box 30, 00271 Helsinki, Finland
- Minerva Foundation Institute for Medical Research, 00290 Helsinki, Finland
| | - Jaakko Tuomilehto
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, P.O. Box 30, 00271 Helsinki, Finland
- Department of Public Health, University of Helsinki, 00014 Helsinki, Finland
| | - Fahd Al-Mulla
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| | - Rasheed Ahmad
- Dasman Diabetes Institute, Dasman 15462, Kuwait
| |
|
22
|
Chen R, Xie G, Lin Z, Gu G, Yu Y, Yu J, Liu Z. Predicting Microbe-Disease Associations Based on a Linear Neighborhood Label Propagation Method with Multi-order Similarity Fusion Learning. Interdiscip Sci 2024; 16:345-360. [PMID: 38436840 DOI: 10.1007/s12539-024-00607-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 01/04/2024] [Accepted: 01/05/2024] [Indexed: 03/05/2024]
Abstract
Computational approaches employed for predicting potential microbe-disease associations often rely on similarity information between microbes and diseases. Therefore, it is important to obtain reliable similarity information by integrating multiple types of similarity information. However, existing similarity fusion methods do not consider multi-order fusion of similarity networks. To address this problem, a novel method of linear neighborhood label propagation with multi-order similarity fusion learning (MOSFL-LNP) is proposed to predict potential microbe-disease associations. Multi-order fusion learning comprises two parts: low-order global learning and high-order feature learning. Low-order global learning is used to obtain common latent features from multiple similarity sources. High-order feature learning relies on the interactions between neighboring nodes to identify high-order similarities and learn deeper interactive network structures. Coefficients are assigned to different high-order feature learning modules to balance the similarities learned from different orders and enhance the robustness of the fusion network. Overall, by combining low-order global learning with high-order feature learning, multi-order fusion learning can capture both the shared and unique features of different similarity networks, leading to more accurate predictions of microbe-disease associations. In comparison to six other advanced methods, MOSFL-LNP exhibits superior prediction performance in the leave-one-out cross-validation and 5-fold validation frameworks. In the case study, the predicted 10 microbes associated with asthma and type 1 diabetes have an accuracy rate of up to 90% and 100%, respectively.
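The label-propagation step named in this abstract can be illustrated with a minimal sketch that iterates F ← αWF + (1 − α)Y over a microbe similarity graph. This is not the MOSFL-LNP implementation: the toy similarity matrix, the row normalization used here in place of learned linear-neighborhood weights, and the names propagate_labels, alpha, and iterations are all assumptions made for illustration.

```python
def propagate_labels(similarity, labels, alpha=0.5, iterations=100):
    """Spread known association labels over a similarity graph.

    similarity: n x n microbe similarity matrix (hypothetical input).
    labels:     n x m binary microbe-disease association matrix.
    Returns an n x m matrix of propagated association scores.
    """
    n, m = len(labels), len(labels[0])

    # Assumption: row-normalized similarities (self-loops removed) stand in
    # for the reconstruction weights that a true LNP method would learn.
    weights = []
    for i, row in enumerate(similarity):
        off_diagonal = [0.0 if i == j else value for j, value in enumerate(row)]
        total = sum(off_diagonal)
        weights.append([v / total if total > 0 else 0.0 for v in off_diagonal])

    scores = [list(row) for row in labels]
    for _ in range(iterations):
        # Each node absorbs a fraction alpha of its neighbors' scores and
        # retains a fraction (1 - alpha) of its own initial labels.
        scores = [
            [
                alpha * sum(weights[i][k] * scores[k][j] for k in range(n))
                + (1 - alpha) * labels[i][j]
                for j in range(m)
            ]
            for i in range(n)
        ]
    return scores


# Toy data: microbes 0 and 1 are similar; microbe 1 has no known associations.
toy_similarity = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
toy_labels = [[1, 0], [0, 0], [0, 1]]
toy_scores = propagate_labels(toy_similarity, toy_labels)
```

In the toy data, the unlabeled microbe 1 receives a higher score for disease 0 than for disease 1, because it is most similar to microbe 0.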
Affiliation(s)
- Ruibin Chen
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Guobo Xie
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Zhiyi Lin
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China.
| | - Guosheng Gu
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China.
| | - Yi Yu
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Junrui Yu
- School of Computer, Guangdong University of Technology, Guangzhou, 510000, China
| | - Zhenguo Liu
- Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510080, China.
| |
|
23
|
Chen Z, Zhang L, Li J, Fu M. MLFLHMDA: predicting human microbe-disease association based on multi-view latent feature learning. Front Microbiol 2024; 15:1353278. [PMID: 38371933 PMCID: PMC10869561 DOI: 10.3389/fmicb.2024.1353278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Accepted: 01/17/2024] [Indexed: 02/20/2024] Open
Abstract
Introduction A growing body of research indicates that microorganisms play a crucial role in human health. Imbalances in microbial communities are closely linked to human diseases, and identifying potential relationships between microbes and diseases can help elucidate the pathogenesis of diseases. However, traditional methods based on biological or clinical experiments are costly, so the use of computational models to predict potential microbe-disease associations is of great importance. Methods In this paper, we present a novel computational model called MLFLHMDA, which is based on a Multi-View Latent Feature Learning approach to predict Human potential Microbe-Disease Associations. Specifically, we compute Gaussian interaction profile kernel similarity between diseases and microbes based on the known microbe-disease associations from the Human Microbe-Disease Association Database and perform a preprocessing step on the resulting microbe-disease association matrix, namely, weighting K nearest known neighbors (WKNKN) to reduce the sparsity of the microbe-disease association matrix. To obtain unobserved associations in the microbe and disease views, we extract different latent features based on the geometrical structure of microbes and diseases, and project multi-modal latent features into a common subspace. Next, we introduce graph regularization to preserve the local manifold structure of Gaussian interaction profile kernel similarity and add $L_{p,q}$-norms to the projection matrix to ensure the interpretability and sparsity of the model. Results The AUC values for global leave-one-out cross-validation and 5-fold cross validation implemented by MLFLHMDA are 0.9165 and 0.8942 ± 0.0041, respectively, which perform better than other existing methods. In addition, case studies of different diseases have demonstrated the superiority of the predictive power of MLFLHMDA.
The source code of our model and the data are available on https://github.com/LiangzheZhang/MLFLHMDA_master.
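For readers unfamiliar with the Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract, the following is a minimal sketch of the commonly used formulation, K(i, j) = exp(−γ‖a_i − a_j‖²) with γ set to a bandwidth parameter divided by the mean squared norm of the interaction profiles. It is not taken from the MLFLHMDA repository; the function name gip_kernel, the parameter gamma_prime, and the toy association matrix are assumptions for illustration.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows.

    profiles: list of binary interaction profiles, one row per entity
              (for example, one row per microbe, one column per disease).
    gamma_prime: bandwidth parameter (assumed default of 1.0).
    """
    n = len(profiles)
    squared_norms = [sum(value * value for value in row) for row in profiles]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("association matrix contains no known associations")

    # Normalize the bandwidth by the average number of associations per row.
    gamma = gamma_prime / mean_squared_norm

    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(profiles[i], profiles[j])
            )
            similarity[i][j] = math.exp(-gamma * squared_distance)
    return similarity


# Hypothetical 3 microbes x 4 diseases association matrix.
associations = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]
microbe_similarity = gip_kernel(associations)
# Disease similarity uses the transposed matrix (one row per disease).
disease_similarity = gip_kernel([list(column) for column in zip(*associations)])
```

Microbes that share known disease associations end up with a similarity close to 1, while microbes with disjoint profiles approach 0.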
|
24
|
Zhu H, Hao H, Yu L. Identifying disease-related microbes based on multi-scale variational graph autoencoder embedding Wasserstein distance. BMC Biol 2023; 21:294. [PMID: 38115088 PMCID: PMC10731776 DOI: 10.1186/s12915-023-01796-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 10/07/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND Extensive clinical and biomedical research has demonstrated that microbes are crucial to human health. Identifying associations between microbes and diseases can not only reveal potential disease mechanisms, but also facilitate early diagnosis and promote precision medicine. Because of data perturbation and unsatisfactory latent representations, there is significant room for improvement. RESULTS In this work, we proposed a novel framework, Multi-scale Variational Graph AutoEncoder embedding Wasserstein distance (MVGAEW), to predict disease-related microbes, which had the ability to resist data perturbation and effectively generate latent representations for both microbes and diseases from the perspective of distribution. First, we calculated multiple similarities and integrated them through similarity network fusion. Subsequently, we obtained node latent representations with an improved variational graph autoencoder. Ultimately, an XGBoost classifier was employed to predict potential disease-related microbes. We also introduced multi-order node embedding reconstruction to enhance the representation capacity, and performed ablation studies to evaluate the contribution of each section of our model. Moreover, we conducted experiments on common drugs and case studies, including Alzheimer's disease, Crohn's disease, and colorectal neoplasms, to validate the effectiveness of our framework. CONCLUSIONS Significantly, our model outperformed current state-of-the-art methods, exhibiting a great improvement on the HMDAD database.
Affiliation(s)
- Huan Zhu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Hongxia Hao
- School of Computer Science and Technology, Xidian University, Xi'an, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China.
| |
|
25
|
Lu S, Liang Y, Li L, Miao R, Liao S, Zou Y, Yang C, Ouyang D. Predicting potential microbe-disease associations based on auto-encoder and graph convolution network. BMC Bioinformatics 2023; 24:476. [PMID: 38097930 PMCID: PMC10722760 DOI: 10.1186/s12859-023-05611-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/11/2023] [Indexed: 12/17/2023] Open
Abstract
A growing body of research has consistently demonstrated the intricate correlation between the human microbiome and human well-being. Microbes can impact the efficacy and toxicity of drugs through various pathways, as well as influence the occurrence and metastasis of tumors. In clinical practice, it is crucial to elucidate the association between microbes and diseases. Although traditional biological experiments accurately identify this association, they are time-consuming, expensive, and susceptible to experimental conditions. Consequently, conducting extensive biological experiments to screen potential microbe-disease associations becomes challenging. Computational methods can address these problems, but previous methods make poor use of node features, and their prediction accuracy needs to be improved. To address this issue, we propose the DAEGCNDF model for predicting potential associations between microbes and diseases. Our model calculates four similarity features for each microbe and disease. These features are fused to obtain a comprehensive feature matrix representing microbes and diseases. Our model first uses the graph convolutional network module to extract low-rank features with graph information of microbes and diseases, and then uses a deep sparse Auto-Encoder to extract high-rank features of microbe-disease pairs, after which the low-rank and high-rank features are spliced to improve the utilization of node features. Finally, Deep Forest is used for microbe-disease potential relationship prediction. The experimental results show that combining low-rank and high-rank features helps to improve the model performance and that Deep Forest has better classification performance than the baseline model.
Affiliation(s)
- Shanghui Lu
- Faculty of Innovation Enginee, Macau University of Science and Technology, Avenida Wai Long, Taipa, 999078, Macao, Macao Special Administrative Region of China, China
- School of Mathematics and Physics, Hechi University, No. 42, Longjiang, Hechi, 546300, Guangxi, China
| | - Yong Liang
- Faculty of Innovation Enginee, Macau University of Science and Technology, Avenida Wai Long, Taipa, 999078, Macao, Macao Special Administrative Region of China, China.
- Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China.
| | - Le Li
- Faculty of Innovation Enginee, Macau University of Science and Technology, Avenida Wai Long, Taipa, 999078, Macao, Macao Special Administrative Region of China, China
| | - Rui Miao
- Basic Teaching Department, Zhuhai Campus of Zunyi Medical University, Zhuhai, 519041, Guangdong, China
| | - Shuilin Liao
- Faculty of Innovation Enginee, Macau University of Science and Technology, Avenida Wai Long, Taipa, 999078, Macao, Macao Special Administrative Region of China, China
| | - Yongfu Zou
- School of Mathematics and Physics, Hechi University, No. 42, Longjiang, Hechi, 546300, Guangxi, China
| | - Chengjun Yang
- School of Artificial Intelligence and Manufacturing, Hechi University, No. 42, Longjiang, Hechi, 546300, Guangxi, China
| | - Dong Ouyang
- School of Biomedical Engineering, Guangdong Medical University, No. 1, Xincheng, Zhanjiang, 523808, Guangdong, China
| |
|
26
|
Santangelo B, Bada M, Hunter L, Lozupone C. Hypothesizing mechanistic links between microbes and disease using knowledge graphs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.01.569645. [PMID: 38106100 PMCID: PMC10723325 DOI: 10.1101/2023.12.01.569645] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2023]
Abstract
Knowledge graphs have found broad biomedical applications, providing useful representations of complex knowledge. Although plentiful evidence exists linking the gut microbiome to disease, mechanistic understanding of those relationships remains generally elusive. Here we demonstrate the potential of knowledge graphs to hypothesize plausible mechanistic accounts of host-microbe interactions in disease. To do so, we constructed a knowledge graph of linked microbes, genes and metabolites called MGMLink. Using a semantically constrained shortest path search through the graph and a novel path prioritization methodology based on cosine similarity, we show that this knowledge supports inference of mechanistic hypotheses that explain observed relationships between microbes and disease phenotypes. We discuss specific applications of this methodology in inflammatory bowel disease and Parkinson's disease. This approach enables mechanistic hypotheses surrounding the complex interactions between gut microbes and disease to be generated in a scalable and comprehensive manner.
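The two ideas named in this abstract, a shortest-path search restricted to permitted relation types and a cosine-similarity ranking of the resulting paths, can be sketched as follows. This is only an illustration under stated assumptions, not the MGMLink method: the triples, relation names, node embeddings, and the scoring rule (mean cosine similarity between each intermediate node and the target) are all hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def constrained_shortest_paths(triples, source, target, allowed):
    """Return all shortest directed paths that use only allowed relations."""
    adjacency = {}
    for head, relation, tail in triples:
        if relation in allowed:
            adjacency.setdefault(head, []).append(tail)

    # Breadth-first search that records every parent lying on a shortest path.
    distance, parents, queue = {source: 0}, {source: []}, deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                parents[neighbor] = [node]
                queue.append(neighbor)
            elif distance[neighbor] == distance[node] + 1:
                parents[neighbor].append(node)
    if target not in distance:
        return []

    def unwind(node):
        if node == source:
            return [[source]]
        return [path + [node] for parent in parents[node] for path in unwind(parent)]

    return unwind(target)


def rank_paths(paths, embeddings, target):
    """Order paths by mean cosine similarity of intermediate nodes to the target."""
    def score(path):
        inner = path[1:-1]
        if not inner:
            return 1.0
        return sum(cosine(embeddings[n], embeddings[target]) for n in inner) / len(inner)

    return sorted(paths, key=score, reverse=True)


# Hypothetical toy graph and embeddings.
toy_triples = [
    ("microbe_A", "produces", "metabolite_X"),
    ("metabolite_X", "associated_with", "disease_D"),
    ("microbe_A", "interacts_with", "gene_G"),
    ("gene_G", "associated_with", "disease_D"),
    ("microbe_A", "mentioned_with", "disease_D"),  # relation not permitted below
]
toy_allowed = {"produces", "interacts_with", "associated_with"}
toy_embeddings = {
    "microbe_A": [1.0, 1.0],
    "metabolite_X": [1.0, 0.0],
    "gene_G": [0.6, 0.8],
    "disease_D": [0.0, 1.0],
}
toy_paths = constrained_shortest_paths(toy_triples, "microbe_A", "disease_D", toy_allowed)
toy_ranked = rank_paths(toy_paths, toy_embeddings, "disease_D")
```

Excluding the "mentioned_with" relation forces the search through a gene or metabolite, which is the kind of intermediate that makes a path readable as a mechanistic hypothesis.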
|
27
|
Sánchez-Valle J, Valencia A. Molecular bases of comorbidities: present and future perspectives. Trends Genet 2023; 39:773-786. [PMID: 37482451 DOI: 10.1016/j.tig.2023.06.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2023] [Revised: 06/12/2023] [Accepted: 06/12/2023] [Indexed: 07/25/2023]
Abstract
Co-occurrence of diseases decreases patient quality of life, complicates treatment choices, and increases mortality. Analyses of electronic health records present a complex scenario of comorbidity relationships that vary by age, sex, and cohort under study. The study of similarities between diseases using 'omics data, such as genes altered in diseases, gene expression, proteome, and microbiome, is fundamental to uncovering the origin of, and potential treatment for, comorbidities. Recent studies have produced a first generation of genetic interpretations for as much as 46% of the comorbidities described in large cohorts. Integrating different sources of molecular information and using artificial intelligence (AI) methods are promising approaches for the study of comorbidities. They may help to improve the treatment of comorbidities, including the potential repositioning of drugs.
Affiliation(s)
- Jon Sánchez-Valle
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain.
| | - Alfonso Valencia
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, 08034, Spain; ICREA, Barcelona, 08010, Spain.
| |
|
28
|
Shen Y, Gao Y, Shi J, Huang Z, Dai R, Fu Y, Zhou Y, Kong W, Cui Q. MicroRNA-disease Network Analysis Repurposes Methotrexate for the Treatment of Abdominal Aortic Aneurysm in Mice. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1030-1042. [PMID: 36030000 PMCID: PMC10928436 DOI: 10.1016/j.gpb.2022.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Revised: 07/15/2022] [Accepted: 08/19/2022] [Indexed: 06/15/2023]
Abstract
Abdominal aortic aneurysm (AAA) is a permanent dilatation of the abdominal aorta and is highly lethal. The main purpose of the current study is to search for noninvasive medical therapies for AAA, for which there is currently no effective drug therapy. Network medicine represents a cutting-edge technology, as analysis and modeling of disease networks can provide critical clues regarding the etiology of specific diseases and therapeutics that may be effective. Here, we proposed a novel algorithm to quantify disease relations based on a large accumulated microRNA-disease association dataset and then built a disease network covering 15 disease classes and 304 diseases. Analysis revealed some patterns for these diseases. For instance, diseases tended to be clustered and coherent in the network. Surprisingly, we found that AAA showed the strongest similarity with rheumatoid arthritis and systemic lupus erythematosus, both of which are autoimmune diseases, suggesting that AAA could be a type of autoimmune disease in etiology. Based on this observation, we further hypothesized that drugs for autoimmune diseases could be repurposed for the prevention and therapy of AAA. Finally, animal experiments confirmed that methotrexate, a drug for autoimmune diseases, was able to alleviate the formation and development of AAA.
Affiliation(s)
- Yicong Shen
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
| | - Yuanxu Gao
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China; State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Macao Special Administrative Region 999078, China; Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Jiangcheng Shi
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China; Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Zhou Huang
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China; Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Rongbo Dai
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
| | - Yi Fu
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
| | - Yuan Zhou
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China; Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Wei Kong
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China.
| | - Qinghua Cui
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China; Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China.
| |
|
29
|
Hu X, Liu D, Zhang J, Fan Y, Ouyang T, Luo Y, Zhang Y, Deng L. A comprehensive review and evaluation of graph neural networks for non-coding RNA and complex disease associations. Brief Bioinform 2023; 24:bbad410. [PMID: 37985451 DOI: 10.1093/bib/bbad410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/01/2023] [Revised: 10/07/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
Non-coding RNAs (ncRNAs) play a critical role in the occurrence and development of numerous human diseases. Consequently, studying the associations between ncRNAs and diseases has garnered significant attention from researchers in recent years. Various computational methods have been proposed to explore ncRNA-disease relationships, with Graph Neural Network (GNN) emerging as a state-of-the-art approach for ncRNA-disease association prediction. In this survey, we present a comprehensive review of GNN-based models for ncRNA-disease associations. Firstly, we provide a detailed introduction to ncRNAs and GNNs. Next, we delve into the motivations behind adopting GNNs for predicting ncRNA-disease associations, focusing on data structure, high-order connectivity in graphs and sparse supervision signals. Subsequently, we analyze the challenges associated with using GNNs in predicting ncRNA-disease associations, covering graph construction, feature propagation and aggregation, and model optimization. We then present a detailed summary and performance evaluation of existing GNN-based models in the context of ncRNA-disease associations. Lastly, we explore potential future research directions in this rapidly evolving field. This survey serves as a valuable resource for researchers interested in leveraging GNNs to uncover the complex relationships between ncRNAs and diseases.
Affiliation(s)
- Xiaowen Hu
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Dayun Liu
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Jiaxuan Zhang
- Department of Electrical and Computer Engineering, University of California, San Diego, 92093 CA, USA
| | - Yanhao Fan
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Tianxiang Ouyang
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Yue Luo
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Yuanpeng Zhang
- School of Software, Xinjiang University, 830046 Urumqi, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| |
|
30
|
Peng L, Huang L, Tian G, Wu Y, Li G, Cao J, Wang P, Li Z, Duan L. Predicting potential microbe-disease associations with graph attention autoencoder, positive-unlabeled learning, and deep neural network. Front Microbiol 2023; 14:1244527. [PMID: 37789848 PMCID: PMC10543759 DOI: 10.3389/fmicb.2023.1244527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 08/16/2023] [Indexed: 10/05/2023] Open
Abstract
Background Microbes have dense linkages with human diseases. Balanced microorganisms protect the human body against physiological disorders, while unbalanced ones may cause diseases. Thus, identification of potential associations between microbes and diseases can contribute to the diagnosis and therapy of various complex diseases. Biological experiments for microbe-disease association (MDA) prediction are expensive, time-consuming, and labor-intensive. Methods We developed a computational MDA prediction method called GPUDMDA by combining graph attention autoencoder, positive-unlabeled learning, and deep neural network. First, GPUDMDA computes disease similarity and microbe similarity matrices by integrating their functional similarity and Gaussian association profile kernel similarity, respectively. Next, it learns the feature representation of each microbe-disease pair using graph attention autoencoder based on the obtained disease similarity and microbe similarity matrices. Third, it selects a few reliable negative MDAs based on positive-unlabeled learning. Finally, it takes the learned MDA features and the selected negative MDAs as inputs to a deep neural network designed to predict potential MDAs. Results GPUDMDA was compared with four state-of-the-art MDA identification models (i.e., MNNMDA, GATMDA, LRLSHMDA, and NTSHMDA) on the HMDAD and Disbiome databases under five-fold cross validations on microbes, diseases, and microbe-disease pairs. Under the three five-fold cross validations, GPUDMDA computed the best AUCs of 0.7121, 0.9454, and 0.9501 on the HMDAD database and 0.8372, 0.8908, and 0.8948 on the Disbiome database, respectively, outperforming the other four MDA prediction methods. Asthma is the most common chronic respiratory condition and affects ~339 million people worldwide. Inflammatory bowel disease is a class of chronic intestinal diseases that occurs globally and affects the gut, the gastrointestinal tract, and extraintestinal organs of patients. In particular, inflammatory bowel disease severely affects the growth and development of children. We used the proposed GPUDMDA method and found that Enterobacter hormaechei had potential associations with both asthma and inflammatory bowel disease, which need further biological experimental validation. Conclusion The proposed GPUDMDA demonstrated powerful MDA prediction ability. We anticipate that GPUDMDA will help screen therapeutic clues for microbe-related diseases.
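Since several abstracts in this list report AUC values averaged over cross-validation folds, a minimal sketch of how such a figure can be computed may be useful. It uses the rank-based definition of AUC (the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half). The function names and the toy fold data are assumptions for illustration and do not come from GPUDMDA.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def mean_auc(folds):
    """Average AUC over folds, each given as a (scores, labels) pair."""
    values = [roc_auc(scores, labels) for scores, labels in folds]
    return sum(values) / len(values)


# Hypothetical held-out predictions from two cross-validation folds.
toy_folds = [
    ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    ([0.7, 0.2, 0.6, 0.4], [1, 0, 1, 0]),
]
toy_mean_auc = mean_auc(toy_folds)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; a sort-based computation would be preferable for large association matrices.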
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
| | - Liangliang Huang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Geng Tian
- Geneis (Beijing) Co. Ltd., Beijing, China
| | - Yan Wu
- Geneis (Beijing) Co. Ltd., Beijing, China
| | - Guang Li
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
| | - Jianying Cao
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
| | - Peng Wang
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
| | - Zejun Li
- School of Computer Science, Hunan Institute of Technology, Hengyang, China
| | - Lian Duan
- Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
- Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
- National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
- Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
| |
|
31
|
Peng W, Liu M, Dai W, Chen T, Fu Y, Pan Y. Multi-View Feature Aggregation for Predicting Microbe-Disease Association. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2748-2758. [PMID: 34871177 DOI: 10.1109/tcbb.2021.3132611] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]
Abstract
Microbes play a crucial role in human health and disease. Figuring out the relationship between microbes and diseases leads to significant potential applications in disease treatments. There is an urgent need to devise robust and effective computational methods for identifying disease-related microbes. This work proposes a Multi-View Feature Aggregation (MVFA) scheme that integrates linear and nonlinear features to identify disease-related microbes. We introduce a non-negative matrix tri-factorization (NMTF) model to extract linear features for diseases and microbes. Then we learn another type of linear feature by utilizing a bi-random walk model. The nonlinear feature is obtained by inputting the two kinds of linear features into a capsule neural network. These three types of features describe the associations between diseases and microbes from different views. Finally, considering the complementarity of these features, we leverage a logistic regression model to combine the NMTF model predictions, bi-random walk model predictions, and the capsule neural network predictions to obtain the final microbe-disease pair scores. We apply our method to predict human microbe-disease associations on two datasets. Experimental results show that our multi-view model outperforms the state-of-the-art models in recovering missing microbe-disease associations and predicting associations for new microbes. The ablation study shows that aggregating multi-view linear and nonlinear features can improve the prediction performance. Case studies on two diseases, i.e., type 1 diabetes and liver cirrhosis, further validate the effectiveness of our method.
|
32
|
Li J, Wei C, Zhou T, Mo C, Wang G, He F, Wang P, Qin L, Peng F. A display and analysis platform for gut microbiomes of minority people and phenotypic data in China. Sci Rep 2023; 13:14247. [PMID: 37648696 PMCID: PMC10469205 DOI: 10.1038/s41598-023-36754-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Accepted: 06/09/2023] [Indexed: 09/01/2023] Open
Abstract
The minority people panmicrobial community database (MPPCD website: http://mppmcdb.cloudna.cn/ ) is the first microbe-disease association database of Chinese ethnic minorities. To research the relationships between intestinal microbes and diseases/health in the ethnic minorities, we collected the microbes of the Han people for comparison. Based on the data, such as age, among the different ethnic groups of the different regions of Sichuan Province, MPPCD not only provided the gut microbial composition but also presented the relative abundance value at the phylum, class, order, family and genus levels in different groups. In addition, differential analysis was performed in different microbes in the two different groups, which contributed to exploring the difference in intestinal microbe structures between the two groups. Meanwhile, a series of related factors, including age, sex, body mass index, ethnicity, physical condition, and living altitude, were included in the MPPCD, with special focus on living altitude. To date, this is the first intestinal microbe database to introduce altitude features. In conclusion, we hope that MPPCD will serve as a fundamental research support for the relationship between human gut microbes and host health and disease, especially in ethnic minorities.
Affiliation(s)
- Jun Li
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China.
| | - Chunxue Wei
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Ting Zhou
- Department of Gastroenterology, The Sixth People's Hospital of Chengdu, Chengdu, Sichuan, China
| | - Chunfen Mo
- Department of Immunology, School of Basic Medical Sciences, Chengdu Medical College, Chengdu, Sichuan, China
| | - Guanjun Wang
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Feng He
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Pengyu Wang
- College of Pharmacy, Chengdu Medical College, Chengdu, Sichuan, China
| | - Ling Qin
- Department of Gastroenterology, The First Affiliated Hospital of Chengdu Medical College, 278# Bao Guang Road, Xindu District, Chengdu, 610000, Sichuan, People's Republic of China
| | - Fujun Peng
- Institute of Basic Medicine, Weifang Medical University, 7166# Baotong West Road, Weifang, 261053, Shandong, People's Republic of China.
|
33
|
Wang CY, Kuang X, Wang QQ, Zhang GQ, Cheng ZS, Deng ZX, Guo FB. GMMAD: a comprehensive database of human gut microbial metabolite associations with diseases. BMC Genomics 2023; 24:482. [PMID: 37620754 PMCID: PMC10464125 DOI: 10.1186/s12864-023-09599-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Accepted: 08/17/2023] [Indexed: 08/26/2023] Open
Abstract
BACKGROUND: The natural products, metabolites, of gut microbes are crucial factors affecting diseases. Comprehensive identification and annotation of relationships among disease, metabolites, and microbes can provide efficient and targeted solutions towards understanding the mechanism of complex disease and development of new markers and drugs. RESULTS: We developed Gut Microbial Metabolite Association with Disease (GMMAD), a manually curated database of associations among human diseases, gut microbes, and metabolites of gut microbes. Here, this initial release (i) contains 3,836 disease-microbe associations and 879,263 microbe-metabolite associations, which were extracted from the literature and available resources and then manually curated; (ii) defines an association strength score and a confidence score. With these two scores, GMMAD predicted 220,690 disease-metabolite associations, where the metabolites all belong to the gut microbes. We think that the positive effective (with both scores higher than suggested thresholds) associations will help identify disease markers and understand the pathogenic mechanism from the perspective of gut microbes. The negative effective associations would be taken as biomarkers and have the potential as drug candidates. Evidence from the literature supported our proposal and was consistent with experiments; (iii) provides a user-friendly web interface that allows users to browse, search, and download information on associations among diseases, metabolites, and microbes. The resource is freely available at http://guolab.whu.edu.cn/GMMAD. CONCLUSIONS: As a unique online resource for gut microbial metabolite-disease associations, GMMAD is helpful for researchers to explore disease-metabolite-microbe mechanisms and screen drug and marker candidates for different diseases.
Affiliation(s)
- Cheng-Yu Wang
- Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
| | - Xia Kuang
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
| | - Qiao-Qiao Wang
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
| | - Gu-Qin Zhang
- Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Zhen-Shun Cheng
- Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Zi-Xin Deng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
| | - Feng-Biao Guo
- Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China.
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan, China.
|
34
|
Wang L, Wang Y, Xuan C, Zhang B, Wu H, Gao J. Predicting potential microbe-disease associations based on multi-source features and deep learning. Brief Bioinform 2023; 24:bbad255. [PMID: 37406190 DOI: 10.1093/bib/bbad255] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Revised: 05/30/2023] [Accepted: 06/20/2023] [Indexed: 07/07/2023] Open
Abstract
Studies have confirmed that the occurrence of many complex diseases in the human body is closely related to the microbial community, and microbes can affect tumorigenesis and metastasis by regulating the tumor microenvironment. However, there are still large gaps in the clinical observation of the microbiota in disease. Although biological experiments are accurate in identifying disease-associated microbes, they are also time-consuming and expensive. The computational models for effective identification of diseases related microbes can shorten this process, and reduce capital and time costs. Based on this, in the paper, a model named DSAE_RF is presented to predict latent microbe-disease associations by combining multi-source features and deep learning. DSAE_RF calculates four similarities between microbes and diseases, which are then used as feature vectors for the disease-microbe pairs. Later, reliable negative samples are screened by k-means clustering, and a deep sparse autoencoder neural network is further used to extract effective features of the disease-microbe pairs. In this foundation, a random forest classifier is presented to predict the associations between microbes and diseases. To assess the performance of the model in this paper, 10-fold cross-validation is implemented on the same dataset. As a result, the AUC and AUPR of the model are 0.9448 and 0.9431, respectively. Furthermore, we also conduct a variety of experiments, including comparison of negative sample selection methods, comparison with different models and classifiers, Kolmogorov-Smirnov test and t-test, ablation experiments, robustness analysis, and case studies on Covid-19 and colorectal cancer. The results fully demonstrate the reliability and availability of our model.
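The AUC figure reported in this abstract can be read as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. The sketch below computes AUC directly from that definition. It is a generic illustration, not code from the paper; the function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy example: two known associations, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; rank-based formulas or library routines are the usual choice for large evaluation sets.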
Affiliation(s)
- Liugen Wang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Yan Wang
- School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Chenxu Xuan
- School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Bai Zhang
- School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Hanwen Wu
- School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
| | - Jie Gao
- School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
|
35
|
Karkera N, Acharya S, Palaniappan SK. Leveraging pre-trained language models for mining microbiome-disease relationships. BMC Bioinformatics 2023; 24:290. [PMID: 37468830 PMCID: PMC10357883 DOI: 10.1186/s12859-023-05411-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2023] [Accepted: 07/13/2023] [Indexed: 07/21/2023] Open
Abstract
BACKGROUND: The growing recognition of the microbiome's impact on human health and well-being has prompted extensive research into discovering the links between microbiome dysbiosis and disease (healthy) states. However, this valuable information is scattered in unstructured form within biomedical literature. The structured extraction and qualification of microbe-disease interactions are important. In parallel, recent advancements in deep-learning-based natural language processing algorithms have revolutionized language-related tasks such as ours. This study aims to leverage state-of-the-art deep-learning language models to extract microbe-disease relationships from biomedical literature. RESULTS: In this study, we first evaluate multiple pre-trained large language models within a zero-shot or few-shot learning context. In this setting, the models performed poorly out of the box, emphasizing the need for domain-specific fine-tuning of these language models. Subsequently, we fine-tune multiple language models (specifically, GPT-3, BioGPT, BioMedLM, BERT, BioMegatron, PubMedBERT, BioClinicalBERT, and BioLinkBERT) using labeled training data and evaluate their performance. Our experimental results demonstrate the state-of-the-art performance of these fine-tuned models (specifically GPT-3, BioMedLM, and BioLinkBERT), achieving an average F1 score, precision, and recall of over [Formula: see text] compared to the previous best of 0.74. CONCLUSION: Overall, this study establishes that pre-trained language models excel as transfer learners when fine-tuned with domain and problem-specific data, enabling them to achieve state-of-the-art results even with limited training data for extracting microbiome-disease interactions from scientific publications.
Affiliation(s)
| | - Sathwik Acharya
- The Systems Biology Institute, Tokyo, Japan
- PES University, Bengaluru, India
| | - Sucheendra K Palaniappan
- The Systems Biology Institute, Tokyo, Japan.
- Iom Bioworks Pvt Ltd., Bengaluru, India.
- SBX Corporation, Tokyo, Japan.
|
36
|
Wang F, Yang H, Wu Y, Peng L, Li X. SAELGMDA: Identifying human microbe-disease associations based on sparse autoencoder and LightGBM. Front Microbiol 2023; 14:1207209. [PMID: 37415823 PMCID: PMC10320730 DOI: 10.3389/fmicb.2023.1207209] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Accepted: 05/18/2023] [Indexed: 07/08/2023] Open
Abstract
Introduction: Identification of complex associations between diseases and microbes is important to understand the pathogenesis of diseases and design therapeutic strategies. Biomedical experiment-based Microbe-Disease Association (MDA) detection methods are expensive, time-consuming, and laborious. Methods: Here, we developed a computational method called SAELGMDA for potential MDA prediction. First, microbe similarity and disease similarity are computed by integrating their functional similarity and Gaussian interaction profile kernel similarity. Second, one microbe-disease pair is presented as a feature vector by combining the microbe and disease similarity matrices. Next, the obtained feature vectors are mapped to a low-dimensional space based on a Sparse AutoEncoder. Finally, unknown microbe-disease pairs are classified based on Light Gradient Boosting Machine. Results: The proposed SAELGMDA method was compared with four state-of-the-art MDA methods (MNNMDA, GATMDA, NTSHMDA, and LRLSHMDA) under five-fold cross validations on diseases, microbes, and microbe-disease pairs on the HMDAD and Disbiome databases. The results show that SAELGMDA computed the best accuracy, Matthews correlation coefficient, AUC, and AUPR under the majority of conditions, outperforming the other four MDA prediction models. In particular, SAELGMDA obtained the best AUCs of 0.8358 and 0.9301 under cross validation on diseases, 0.9838 and 0.9293 under cross validation on microbes, and 0.9857 and 0.9358 under cross validation on microbe-disease pairs on the HMDAD and Disbiome databases. Colorectal cancer, inflammatory bowel disease, and lung cancer are diseases that severely threaten human health. We used the proposed SAELGMDA method to find possible microbes for the three diseases. The results demonstrate that there are potential associations between Clostridium coccoides and colorectal cancer and one between Sphingomonadaceae and inflammatory bowel disease. In addition, Veillonella may associate with autism. The inferred MDAs need further validation. Conclusion: We anticipate that the proposed SAELGMDA method contributes to the identification of new MDAs.
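The Gaussian interaction profile (GIP) kernel similarity named in this abstract is commonly defined as K(i, j) = exp(-γ ||IP(i) - IP(j)||²), where IP(i) is row i of the binary association matrix and γ is a bandwidth normalised by the mean squared norm of all profiles. The sketch below follows that common definition. Whether SAELGMDA uses exactly this normalisation is an assumption, and the function name `gip_kernel`, the parameter `gamma_prime`, and the toy matrix are hypothetical.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of an association matrix.

    profiles: list of equal-length binary vectors, one per microbe (or disease).
    gamma_prime: bandwidth parameter, often set to 1 (an assumption here).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Hypothetical toy association matrix: 3 microbes (rows) x 3 diseases (columns).
toy_assoc = [[1, 0, 1],
             [1, 0, 0],
             [0, 1, 0]]
microbe_sim = gip_kernel(toy_assoc)
```

Disease similarity follows from the same function applied to the transposed matrix. In methods of this kind the kernel is typically blended with a functional similarity measure, which is not shown here.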
Affiliation(s)
- Feixiang Wang
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Huandong Yang
- Department of Gastrointestinal Surgery, Yidu Central Hospital of Weifang, Weifang, China
| | - Yan Wu
- Geneis (Beijing) Co., Ltd., Beijing, China
| | - Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
| | - Xiaoling Li
- The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China
- The Second Department of Oncology, Heilongjiang Second Cancer Hospital, Harbin, China
|
37
|
Shen K, Din AU, Sinha B, Zhou Y, Qian F, Shen B. Translational informatics for human microbiota: data resources, models and applications. Brief Bioinform 2023; 24:7152256. [PMID: 37141135 DOI: 10.1093/bib/bbad168] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2022] [Revised: 04/07/2023] [Accepted: 04/11/2023] [Indexed: 05/05/2023] Open
Abstract
With the rapid development of human intestinal microbiology and diverse microbiome-related studies and investigations, a large amount of data have been generated and accumulated. Meanwhile, different computational and bioinformatics models have been developed for pattern recognition and knowledge discovery using these data. Given the heterogeneity of these resources and models, we aimed to provide a landscape of the data resources, a comparison of the computational models and a summary of the translational informatics applied to microbiota data. We first review the existing databases, knowledge bases, knowledge graphs and standardizations of microbiome data. Then, the high-throughput sequencing techniques for the microbiome and the informatics tools for their analyses are compared. Finally, translational informatics for the microbiome, including biomarker discovery, personalized treatment and smart healthcare for complex diseases, are discussed.
Affiliation(s)
- Ke Shen
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
| | - Ahmad Ud Din
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
| | - Baivab Sinha
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
| | - Yi Zhou
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
| | - Fuliang Qian
- Center for Systems Biology, Suzhou Medical College of Soochow University, Suzhou 215123, China
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Suzhou 215123, China
| | - Bairong Shen
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
|
38
|
Shokri Garjan H, Omidi Y, Poursheikhali Asghari M, Ferdousi R. In-silico computational approaches to study microbiota impacts on diseases and pharmacotherapy. Gut Pathog 2023; 15:10. [PMID: 36882861 PMCID: PMC9990230 DOI: 10.1186/s13099-023-00535-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/09/2023] [Accepted: 02/21/2023] [Indexed: 03/09/2023] Open
Abstract
Microorganisms have been linked to a variety of critical human disease, thanks to advances in sequencing technology and microbiology. The growing recognition of human microbe-disease relationships provides crucial insights into the underlying disease process from the perspective of pathogens, which is extremely useful for pathogenesis research, early diagnosis, and precision medicine and therapy. Microbe-based analysis in terms of diseases and related drug discovery can predict new connections/mechanisms and provide new concepts. These phenomena have been studied via various in-silico computational approaches. This review aims to elaborate on the computational works conducted on the microbe-disease and microbe-drug topics, discuss the computational model approaches used for predicting associations and provide comprehensive information on the related databases. Finally, we discussed potential prospects and obstacles in this field of study, while also outlining some recommendations for further enhancing predictive capabilities.
Affiliation(s)
- Hassan Shokri Garjan
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Yadollah Omidi
- Department of Pharmaceutical Sciences, Nova Southeastern University, College of Pharmacy, Fort Lauderdale, FL, USA
| | | | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran.
|
39
|
Jiang C, Tang M, Jin S, Huang W, Liu X. KGNMDA: A Knowledge Graph Neural Network Method for Predicting Microbe-Disease Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1147-1155. [PMID: 35724280 DOI: 10.1109/tcbb.2022.3184362] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Accumulated studies discovered that various microbes in human bodies were closely related to complex human diseases and could provide new insight into drug development. Multiple computational methods were constructed to predict microbes that were potentially associated with diseases. However, most previous methods were based on single characteristics of microbes or diseases, that lacked important biological information related to microorganisms or diseases. Therefore, we constructed a knowledge graph centered on microorganisms and diseases from several existed databases to provide knowledgeable information for microbes and diseases. Then, we adopted a graph neural network method to learn representations of microbes and diseases from the constructed knowledge graph. After that, we introduced the Gaussian kernel similarity features of microbes and diseases to generate final representations of microbes and diseases. At last, we proposed a score function on final representations of microbes and diseases to predict scores of microbe-disease associations. Comprehensive experiments on the Human Microbe-Disease Association Database (HMDAD) dataset had demonstrated that our approach outperformed baseline methods. Furthermore, we implemented case studies on two important diseases (asthma and inflammatory bowel disease), the result demonstrated that our proposed model was effective in revealing the relationship between diseases and microbes. The source code of our model and the data were available on https://github.com/ChangzhiJiang/KGNMDA_master.
|
40
|
Shi K, Li L, Wang Z, Chen H, Chen Z, Fang S. Identifying microbe-disease association based on graph convolutional attention network: Case study of liver cirrhosis and epilepsy. Front Neurosci 2023; 16:1124315. [PMID: 36741060 PMCID: PMC9892757 DOI: 10.3389/fnins.2022.1124315] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Accepted: 12/31/2022] [Indexed: 01/20/2023] Open
Abstract
The interactions between the microbiota and the human host can affect the physiological functions of organs (such as the brain, liver, gut, etc.). Accumulating investigations indicate that the imbalance of microbial community is closely related to the occurrence and development of diseases. Thus, the identification of potential links between microbes and diseases can provide insight into the pathogenesis of diseases. In this study, we propose a deep learning framework (MDAGCAN) based on graph convolutional attention network to identify potential microbe-disease associations. In MDAGCAN, we first construct a heterogeneous network consisting of the known microbe-disease associations and multi-similarity fusion networks of microbes and diseases. Then, the node embeddings considering the neighbor information of the heterogeneous network are learned by applying graph convolutional layers and graph attention layers. Finally, a bilinear decoder using node embedding representations reconstructs the unknown microbe-disease association. Experiments show that our method achieves reliable performance with average AUCs of 0.9778 and 0.9454 ± 0.0038 in the frameworks of Leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, we apply MDAGCAN to predict latent microbes for two high-risk human diseases, i.e., liver cirrhosis and epilepsy, and results illustrate that 16 and 17 out of the top 20 predicted microbes are verified by published literatures, respectively. In conclusion, our method displays effective and reliable prediction performance and can be expected to predict unknown microbe-disease associations facilitating disease diagnosis and prevention.
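A bilinear decoder of the kind mentioned in this abstract scores a microbe-disease pair from the two node embeddings and a learned weight matrix. The sketch below shows the usual form, sigmoid(z_m^T W z_d). The sigmoid output, the function name `bilinear_score`, and all numeric values are assumptions for illustration; the abstract does not specify these details of MDAGCAN.

```python
import math

def bilinear_score(z_microbe, weight, z_disease):
    """Association probability sigmoid(z_m^T W z_d) for one pair.

    z_microbe: embedding of length p; z_disease: embedding of length q;
    weight: p x q matrix given as a list of rows (learned in a real model).
    """
    # W z_d: one entry per microbe embedding dimension.
    projected = [sum(w * d for w, d in zip(row, z_disease)) for row in weight]
    logit = sum(m * v for m, v in zip(z_microbe, projected))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical 2-dimensional embeddings and weight matrix.
toy_microbe = [1.0, 0.5]
toy_disease = [1.0, 1.0]
toy_weight = [[0.5, -1.0],
              [2.0, 0.0]]
toy_score = bilinear_score(toy_microbe, toy_weight, toy_disease)  # logit = 0.5
```

Applying the function to every microbe-disease pair reconstructs a full score matrix, from which unobserved associations can be ranked.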
Affiliation(s)
- Kai Shi
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China
| | - Lin Li
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
| | - Zhengfeng Wang
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
| | - Huazhou Chen
- College of Science, Guilin University of Technology, Guilin, China
| | - Zilin Chen
- Department of Developmental and Behavioural Pediatric Department & Department of Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Shuanfeng Fang
- Department of Children Health Care, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
|
41
|
Liu H, Bing P, Zhang M, Tian G, Ma J, Li H, Bao M, He K, He J, He B, Yang J. MNNMDA: Predicting human microbe-disease association via a method to minimize matrix nuclear norm. Comput Struct Biotechnol J 2023; 21:1414-1423. [PMID: 36824227 PMCID: PMC9941872 DOI: 10.1016/j.csbj.2022.12.053] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Revised: 12/29/2022] [Accepted: 12/30/2022] [Indexed: 01/03/2023] Open
Abstract
Identifying the potential associations between microbes and diseases is the first step for revealing the pathological mechanisms of microbe-associated diseases. However, traditional culture-based microbial experiments are expensive and time-consuming. Thus, it is critical to prioritize disease-associated microbes by computational methods for further experimental validation. In this study, we proposed a novel method called MNNMDA, to predict microbe-disease associations (MDAs) by applying a Matrix Nuclear Norm method to known microbe and disease data. Specifically, we first calculated Gaussian interaction profile kernel similarity and functional similarity for diseases and microbes. Then we constructed a heterogeneous information network by combining the integrated disease similarity network, the integrated microbe similarity network and the known microbe-disease bipartite network. Finally, we formulated the microbe-disease association prediction problem as a low-rank matrix completion problem, which was solved by minimizing the nuclear norm of a matrix with a few regularization terms. We tested the performances of MNNMDA in three datasets including HMDAD, Disbiome, and Combined Data with small, medium and large sizes respectively. We also compared MNNMDA with 5 state-of-the-art methods including KATZHMDA, LRLSHMDA, NTSHMDA, GATMDA, and KGNMDA, respectively. MNNMDA achieved area under the ROC curves (AUROC) of 0.9536 and 0.9364 respectively on HMDAD and Disbiome, better than the AUCs of compared methods under the 5-fold cross-validation for all microbe-disease associations. It also obtained a relatively good performance with AUROC 0.8858 in the combined data. In addition, MNNMDA was also better than other methods in area under precision and recall curve (AUPR) under the 5-fold cross-validation for all associations, and in both AUROC and AUPR under the 5-fold cross-validation for diseases and the 5-fold cross-validation for microbes. Finally, the case studies on colon cancer and inflammatory bowel disease (IBD) also validated the effectiveness of MNNMDA. In conclusion, MNNMDA is an effective method in predicting microbe-disease associations. Availability: The code and data for this paper are freely available on GitHub at https://github.com/Haiyan-Liu666/MNNMDA.
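For orientation, the low-rank matrix completion problem referred to in this abstract is usually written in the generic form below. The symbols are assumptions rather than the paper's notation: M is the partially observed matrix, Ω the set of observed entries, P_Ω the projection that keeps entries in Ω and zeroes the rest, and σ_i(X) the singular values of X. The specific regularization terms used by MNNMDA are not given in the abstract and are not reproduced here.

```latex
\min_{X \in \mathbb{R}^{m \times n}} \; \lVert X \rVert_{*}
\quad \text{subject to} \quad
P_{\Omega}(X) = P_{\Omega}(M),
\qquad
\lVert X \rVert_{*} = \sum_{i=1}^{\min(m,n)} \sigma_i(X)
```

The nuclear norm is the standard convex surrogate for matrix rank, which is why minimizing it favours low-rank completions of the association matrix.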
Affiliation(s)
- Haiyan Liu
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,College of Information Engineering, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China
| | - Pingping Bing
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China
| | - Meijun Zhang
- Geneis Beijing Co., Ltd., Beijing 100102, PR China
| | - Geng Tian
- Geneis Beijing Co., Ltd., Beijing 100102, PR China
| | - Jun Ma
- College of Information Engineering, Changsha Medical University, Changsha 410219, PR China
| | - Haigang Li
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China,School of pharmacy, Changsha Medical University, Changsha 410219, PR China
| | - Meihua Bao
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China,School of pharmacy, Changsha Medical University, Changsha 410219, PR China
| | - Kunhui He
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China,School of pharmacy, Changsha Medical University, Changsha 410219, PR China
| | - Jianjun He
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China,School of pharmacy, Changsha Medical University, Changsha 410219, PR China,Corresponding authors at: Academician Workstation, Changsha Medical University, Changsha 410219, PR China.
| | - Binsheng He
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China,School of pharmacy, Changsha Medical University, Changsha 410219, PR China,Corresponding authors at: Academician Workstation, Changsha Medical University, Changsha 410219, PR China.
| | - Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China,Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China,Geneis Beijing Co., Ltd., Beijing 100102, PR China,School of pharmacy, Changsha Medical University, Changsha 410219, PR China,Corresponding authors at: Academician Workstation, Changsha Medical University, Changsha 410219, PR China.
| |
|
42
|
Hu W, Yang X, Wang L, Zhu X. MADGAN: A microbe-disease association prediction model based on generative adversarial networks. Front Microbiol 2023; 14:1159076. [PMID: 37032881 PMCID: PMC10076708 DOI: 10.3389/fmicb.2023.1159076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 03/02/2023] [Indexed: 04/11/2023] Open
Abstract
Research has demonstrated that microorganisms are indispensable for nutrient transport, growth, and development in the human body, and that disorder and imbalance of the microbiota may lead to disease. It is therefore crucial to study the relationships between microbes and diseases. In this manuscript, we propose a novel prediction model named MADGAN to infer potential microbe-disease associations by combining biological information on microbes and diseases with generative adversarial networks. To our knowledge, this is the first attempt to use a generative adversarial network for this task. In MADGAN, we first constructed different features for microbes and diseases based on multiple similarity metrics. We then adopted a graph convolutional network (GCN) to derive further features for microbes and diseases automatically. Finally, we trained MADGAN to identify latent microbe-disease associations through the adversarial game between the generation network and the decision network. In particular, to prevent over-smoothing during model training, we introduced a cross-level weight distribution structure, based on the idea of residual networks, to increase the depth of the network. To validate the performance of MADGAN, we conducted comprehensive experiments and case studies on the HMDAD and Disbiome databases. The experimental results demonstrated that MADGAN not only achieved satisfactory prediction performance but also outperformed existing state-of-the-art prediction models.
Affiliation(s)
- Weixin Hu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
| | - Xiaoyu Yang
- Institute of Bioinformatics Complex Network Big Data, Changsha University, Changsha, China
| | - Lei Wang
- Institute of Bioinformatics Complex Network Big Data, Changsha University, Changsha, China
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China
- *Correspondence: Lei Wang,
| | - Xianyou Zhu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Xianyou Zhu,
| |
|
43
|
Yang X, Xu W, Leng D, Wen Y, Wu L, Li R, Huang J, Bo X, He S. Exploring novel disease-disease associations based on multi-view fusion network. Comput Struct Biotechnol J 2023; 21:1807-1819. [PMID: 36923471 PMCID: PMC10009443 DOI: 10.1016/j.csbj.2023.02.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/02/2022] [Revised: 02/02/2023] [Accepted: 02/22/2023] [Indexed: 03/06/2023] Open
Abstract
Established taxonomy systems based on disease symptoms and tissue characteristics have provided an important basis for physicians to correctly identify diseases and treat them successfully. However, these classifications tend to be based on phenotypic observations and lack a molecular biological foundation. There is therefore an urgent need to integrate multi-dimensional molecular biological information or multi-omics data to redefine disease classification and provide a powerful perspective for understanding the molecular structure of diseases. To this end, we offer a flexible disease classification that integrates the biological process, gene expression, and symptom phenotype of diseases, and propose a disease-disease association network based on multi-view fusion. We applied the fusion approach to 223 diseases and divided them into 24 disease clusters. The contributions of internal and external edges of the disease clusters were analyzed. The results of the fusion model were compared with Medical Subject Headings, a traditional and commonly used disease taxonomy. A comparison of model performance shows that our approach performs better than other integration methods. The obtained clusters also provided more interesting and novel disease-disease associations. This multi-view human disease association network describes relationships between diseases at multiple molecular levels, thus breaking through the limitation of disease classification systems based on tissues and organs. This approach extends the existing disease taxonomy and motivates clinicians and researchers to reposition their understanding of diseases and explore diagnosis and therapy strategies. Availability of data and materials: The preprocessed dataset and source code supporting the conclusions of this article are available at the GitHub repository https://github.com/yangxiaoxi89/mvHDN.
Affiliation(s)
- Xiaoxi Yang
- Clinical Medicine Institute, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China.,Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Wenjian Xu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China.,Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China.,MOE Key Laboratory of Major Diseases in Children, Beijing 100045, China.,Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute, Beijing 100045, China
| | - Dongjin Leng
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Lianlian Wu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Ruijiang Li
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Jian Huang
- Clinical Medicine Institute, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
| | - Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| | - Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
| |
|
44
|
Liu JX, Yin MM, Gao YL, Shang J, Zheng CH. MSF-LRR: Multi-Similarity Information Fusion Through Low-Rank Representation to Predict Disease-Associated Microbes. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:534-543. [PMID: 35085090 DOI: 10.1109/tcbb.2022.3146176] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
An increase in microbial activity has been shown to be intimately connected with the pathogenesis of diseases. Considering the expense of traditional verification methods, researchers are working to develop high-efficiency methods for detecting potential disease-related microbes. In this article, a new prediction method, MSF-LRR, is established, which uses Low-Rank Representation (LRR) to perform multi-similarity information fusion to predict disease-related microbes. Because most existing methods use only one class of similarity, three classes of microbe and disease similarity are added. LRR is then used to obtain low-rank structural similarity information. Additionally, the method adaptively extracts the local low-rank structure of the data from a global perspective, to make the information used for the prediction more effective. Finally, a neighbor-based prediction method that utilizes the concept of collaborative filtering is applied to predict unknown microbe-disease pairs. The AUC value of MSF-LRR is superior to that of other existing algorithms under 5-fold cross-validation. Furthermore, in case studies, excluding originally known associations, 16 and 19 of the top 20 microbes associated with Bacterial Vaginosis and Irritable Bowel Syndrome, respectively, have been confirmed by the recent literature. In summary, MSF-LRR is a good predictor of potential microbe-disease associations and can contribute to drug discovery and biological research.
|
45
|
Gong H, You X, Jin M, Meng Y, Zhang H, Yang S, Xu J. Graph neural network and multi-data heterogeneous networks for microbe-disease prediction. Front Microbiol 2022; 13:1077111. [PMID: 36620040 PMCID: PMC9814480 DOI: 10.3389/fmicb.2022.1077111] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
Research on microbe association networks is of great significance for understanding the pathogenic mechanisms of microbes and promoting the application of microbes in precision medicine. In this paper, we studied the prediction of microbe-disease associations based on a multi-data biological network and a graph neural network algorithm. The HMDAD database provided a dataset that included 39 diseases, 292 microbes, and 450 known microbe-disease associations. We constructed a Microbe-Disease Heterogeneous Network from the microbe similarity network, the disease similarity network, and the known microbe-disease associations. We then integrated the network into a graph convolutional neural network algorithm and developed the GCNN4Micro-Dis model to predict microbe-disease associations. Finally, the performance of the GCNN4Micro-Dis model was evaluated via 5-fold cross-validation, in which all known microbe-disease associations were randomly divided into five groups. The average AUC value and standard deviation were 0.8954 ± 0.0030. Our model had good predictive power and can help identify new microbe-disease associations. In addition, we compared GCNN4Micro-Dis with three advanced methods for predicting microbe-disease associations: KATZHMDA, BiRWHMDA, and LRLSHMDA. Our method had better prediction performance than the other three methods. Furthermore, we selected breast cancer as a case study and found the top 12 microbes related to breast cancer from the intestinal flora of patients, which further verified the model's accuracy.
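The heterogeneous network described in this abstract can be illustrated with a minimal sketch. It assumes the common block layout in which the microbe similarity matrix and the disease similarity matrix sit on the diagonal and the association matrix fills the off-diagonal blocks; the abstract does not state the exact construction, so the function name `build_heterogeneous_network`, the identity placeholders for the similarity matrices, and the randomly placed associations are all hypothetical. Only the sizes (292 microbes, 39 diseases, 450 associations, five folds) come from the abstract.

```python
import numpy as np


def build_heterogeneous_network(microbe_sim, disease_sim, assoc):
    """Stack similarity and association matrices into one adjacency matrix.

    Assumed layout (not confirmed by the abstract):
        [[microbe_sim, assoc      ],
         [assoc.T,     disease_sim]]
    """
    n_m, n_d = assoc.shape
    assert microbe_sim.shape == (n_m, n_m)
    assert disease_sim.shape == (n_d, n_d)
    top = np.hstack([microbe_sim, assoc])
    bottom = np.hstack([assoc.T, disease_sim])
    return np.vstack([top, bottom])


rng = np.random.default_rng(0)
n_microbes, n_diseases, n_known = 292, 39, 450  # sizes quoted in the abstract

# Synthetic stand-in for the HMDAD association matrix: 450 random known pairs.
assoc = np.zeros((n_microbes, n_diseases))
known = rng.choice(n_microbes * n_diseases, size=n_known, replace=False)
assoc.flat[known] = 1.0

# Placeholder similarities (identity); real similarity matrices would go here.
microbe_sim = np.eye(n_microbes)
disease_sim = np.eye(n_diseases)

network = build_heterogeneous_network(microbe_sim, disease_sim, assoc)

# Randomly divide the known associations into five groups for cross-validation.
folds = np.array_split(rng.permutation(known), 5)
```

In a cross-validation run, the associations in one fold would be zeroed out of `assoc` before the network is rebuilt, so that the held-out pairs never leak into training.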
Affiliation(s)
- Houwu Gong
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Academy of Military Sciences, Beijing, China
| | - Xiong You
- Center of Rehabilitation Diagnosis and Treatment, Hunan Provincial Rehabilitation Hospital, Changsha, China
| | - Min Jin
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; *Correspondence: Min Jin
| | - Yajie Meng
- School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
| | - Hanxue Zhang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Shuaishuai Yang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Junlin Xu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; *Correspondence: Junlin Xu
| |
|
46
|
Minadakis G, Tomazou M, Dietis N, Spyrou GM. Vir2Drug: a drug repurposing framework based on protein similarities between pathogens. Brief Bioinform 2022; 24:6895455. [PMID: 36513376 PMCID: PMC9851336 DOI: 10.1093/bib/bbac536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 10/25/2022] [Accepted: 11/08/2022] [Indexed: 12/15/2022] Open
Abstract
We start from the assumption that similarities between pathogens, at both the pathogen protein and the host protein level, may provide an appropriate framework for identifying and ranking candidate drugs to be used against a specific pathogen. Vir2Drug is a drug repurposing tool that uses network-based approaches to identify and rank candidate drugs for a specific pathogen, combining information obtained from: (a) ranked pathogen-to-pathogen networks based on protein similarities between pathogens, (b) taxonomy distance between pathogens and (c) drugs targeting specific pathogen and host proteins. The underlying pathogen networks are used to screen drugs by means of specific methodologies that account for either the host or the pathogen protein targets. Vir2Drug is a useful and informative tool for drug repurposing against known or unknown pathogens, especially in periods when the urgent need for repurposed drugs plays a significant role in handling viral outbreaks until a vaccine becomes available. The web tool is available at: https://bioinformatics.cing.ac.cy/vir2drug, https://vir2drug.cing-big.hpcf.cyi.ac.cy.
Affiliation(s)
- George Minadakis
- Corresponding author: George Minadakis, Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, PO Box 23462, 1683 Nicosia, Cyprus. Tel.: +357-22-392852; Fax: +357-22-358238; E-mail:
| | - Marios Tomazou
- Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- PO Box 23462, 1683 Nicosia, Cyprus,The Cyprus School of Molecular Medicine, 6 Iroon Avenue, 2371 Ayios Dometios, PO Box 23462, 1683 Nicosia, Cyprus
| | - Nikolas Dietis
- Medical School, University of Cyprus, Nicosia 1678, Cyprus
| | - George M Spyrou
- Bioinformatics Department, The Cyprus Institute of Neurology & Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- PO Box 23462, 1683 Nicosia, Cyprus,The Cyprus School of Molecular Medicine, 6 Iroon Avenue, 2371 Ayios Dometios, PO Box 23462, 1683 Nicosia, Cyprus
| |
|
47
|
Liu D, Liu J, Luo Y, He Q, Deng L. MGATMDA: Predicting Microbe-Disease Associations via Multi-Component Graph Attention Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3578-3585. [PMID: 34587092 DOI: 10.1109/tcbb.2021.3116318] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/13/2023]
Abstract
Microbes are parasitic in various human body organs and play significant roles in a wide range of diseases. Identifying microbe-disease associations is conducive to the identification of potential drug targets. Considering the high cost and risk of biological experiments, developing computational approaches to explore the relationship between microbes and diseases is an alternative choice. However, most existing methods are based on unreliable or noisy similarity measures, which can reduce prediction accuracy. In addition, making predictions on large-scale datasets remains a great challenge for most previous methods. In this work, we develop a multi-component Graph Attention Network (GAT) based framework, termed MGATMDA, for predicting microbe-disease associations. MGATMDA is built on a bipartite graph of microbes and diseases. It contains three essential parts: decomposer, combiner, and predictor. The decomposer first decomposes the edges in the bipartite graph to identify the latent components using a node-level attention mechanism. The combiner then recombines these latent components automatically, using a component-level attention mechanism, to obtain a unified embedding for prediction. Finally, a fully connected network is used to predict unknown microbe-disease associations. Experimental results showed that our proposed method outperformed eight state-of-the-art methods. Case studies for two common diseases further demonstrated the effectiveness of MGATMDA in predicting potential microbe-disease associations. The code is available on GitHub at https://github.com/dayunliu/MGATMDA.
|
48
|
Hua M, Yu S, Liu T, Yang X, Wang H. MVGCNMDA: Multi-view Graph Augmentation Convolutional Network for Uncovering Disease-Related Microbes. Interdiscip Sci 2022; 14:669-682. [PMID: 35428964 DOI: 10.1007/s12539-022-00514-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/23/2021] [Revised: 03/06/2022] [Accepted: 03/13/2022] [Indexed: 06/14/2023]
Abstract
MOTIVATION: Exploring the interrelationships between microbes and disease can help microbiologists make decisions and plan treatments. Predicting new microbe-disease associations currently relies on biological experiments and domain knowledge, which is time-consuming and inefficient. Automated algorithms are used to uncover the intrinsic link between microbes and disease. However, due to data noise and inadequate understanding of the relevant biology, the efficient prediction of microbe-disease associations remains challenging. This study develops a multi-view graph augmentation convolutional network (MVGCNMDA) to predict potential disease-associated microbes. METHODS: First, we use two data augmentation methods, edge perturbation and node dropping, to remove data noise in the preprocessing stage. Second, we calculate Gaussian interaction profile kernel similarity and cosine similarity, so that the Graph Convolutional Network (GCN) can make full use of multi-view features. Then, the multi-view features are fed into the multi-attention block to learn the weights of different features adaptively. Finally, the embedding results are obtained using a Convolutional Neural Network (CNN) combiner, and matrix completion is used to predict the relationship between potential microbes and diseases. RESULTS: We test our model on the Human Microbe-Disease Association Database (HMDAD), Disbiome, and the Combined Dataset (Peryton and MicroPhenoDB). The area under the PR curve (AUPR), area under the ROC curve (AUC), F1 score, and recall are calculated to evaluate the performance of the developed MVGCNMDA. The AUPR is 0.9440, the AUC is 0.9428, the F1 score is 0.9383, and the recall is 0.8858. The experiments show that our model can accurately predict potential microbe-disease associations compared with state-of-the-art works under global Leave-One-Out Cross-Validation (LOOCV) and fivefold Cross-Validation (fivefold CV).
To further verify the effectiveness of the proposed graph data augmentation, we designed five different settings in the ablation study. Furthermore, we present two case studies that validate the prediction of the potential association between microbes and diseases by MVGCNMDA.
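The two similarity measures named in the METHODS part can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the widely used form of the Gaussian interaction profile (GIP) kernel, exp(-γ‖p_i − p_j‖²) with γ set to 1 divided by the mean squared norm of the profiles, and the function names and the toy association matrix are hypothetical.

```python
import numpy as np


def gip_kernel_similarity(profiles):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is the binary association profile of one microbe (or disease).
    The bandwidth is normalised by the mean squared profile norm, which is
    the usual convention; the paper's exact setting is not given here.
    """
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        return np.eye(len(profiles))  # no associations at all
    gamma = 1.0 / mean_sq
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-gamma * sq_dists)


def cosine_similarity(profiles):
    """Cosine similarity between rows; all-zero rows get similarity 0."""
    norms = np.linalg.norm(profiles, axis=1)
    safe = np.where(norms == 0, 1.0, norms)
    unit = profiles / safe[:, None]
    return unit @ unit.T


# Toy microbe-by-disease association matrix (3 microbes, 3 diseases).
toy_assoc = np.array([[1.0, 0.0, 1.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])

microbe_gip = gip_kernel_similarity(toy_assoc)
microbe_cos = cosine_similarity(toy_assoc)
disease_gip = gip_kernel_similarity(toy_assoc.T)  # columns are disease profiles
```

Microbes 0 and 1 share the same profile, so both measures give them similarity 1, while microbe 2 shares no disease with them and gets cosine similarity 0 and a small but nonzero GIP value. That difference is one reason the two views can carry complementary information.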
Affiliation(s)
- Meifang Hua
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Shengpeng Yu
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Tianyu Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Xue Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Hong Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China.
| |
|
49
|
Yang M, Huang ZA, Gu W, Han K, Pan W, Yang X, Zhu Z. Prediction of biomarker-disease associations based on graph attention network and text representation. Brief Bioinform 2022; 23:6651308. [PMID: 35901464 DOI: 10.1093/bib/bbac298] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/03/2022] [Revised: 06/28/2022] [Accepted: 06/30/2022] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION: The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can greatly expedite the identification of candidate biomarkers. RESULTS: Here, we present a novel computational model named GTGenie for predicting biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is used to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn the text-based representation of biomarker-disease relations from the biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries and reconstruct the relation matrix, from which the unknown biomarker-disease associations are predicted. Experimental results on the HMDD, HMDAD and LncRNADisease data sets showed that GTGenie achieves prediction performance competitive with other state-of-the-art methods. AVAILABILITY: The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
| | - Zhi-An Huang
- Center for Computer Science and Information Technology, City University of Hong Kong Dongguan Research Institute, Dongguan, China
| | - Wenhao Gu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China.,GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Kun Han
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Wenying Pan
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Xiao Yang
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
| | - Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
| |
|
50
|
Wang Y, Lei X, Lu C, Pan Y. Predicting Microbe-Disease Association Based on Multiple Similarities and LINE Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2399-2408. [PMID: 34014827 DOI: 10.1109/tcbb.2021.3082183] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
Numerous microbes have been found to have vital impacts on human health by affecting biological processes. Exploring potential associations between microbes and diseases will therefore promote the understanding and diagnosis of diseases. In this study, we present a novel computational model, named MSLINE, to infer potential microbe-disease associations by integrating Multiple Similarities and Large-scale Information Network Embedding (LINE) based on known associations. Specifically, starting from the known microbe-disease associations in the Human Microbe-Disease Association Database, we first expand the set of known associations by collecting proven associations from the existing literature. We then construct a microbe-disease heterogeneous network (MDHN) by integrating known associations and multiple similarities (including Gaussian interaction profile kernel similarity, microbe function similarity, disease semantic similarity and disease-symptom similarity). After that, we apply random walk and the LINE algorithm to the MDHN to learn its structure information. Finally, we score the microbe-disease associations according to the structure information of every node. In leave-one-out cross-validation and 5-fold cross-validation, MSLINE outperforms other existing methods. Moreover, case studies of different diseases showed that MSLINE could predict potential microbe-disease associations efficiently.
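The random-walk step on a heterogeneous network can be sketched as below. This is only an illustration of a weighted random walk over a small adjacency matrix; it does not reproduce LINE or the MSLINE scoring, and the function name `random_walk`, the toy network, and the walk length are hypothetical choices.

```python
import numpy as np


def random_walk(adjacency, start, length, rng):
    """Sample one walk of up to `length` nodes on a weighted graph.

    The next node is drawn with probability proportional to the edge
    weight from the current node; the walk stops early at a dead end.
    """
    walk = [start]
    for _ in range(length - 1):
        weights = adjacency[walk[-1]]
        total = weights.sum()
        if total == 0:
            break  # isolated node: nothing to move to
        walk.append(int(rng.choice(len(weights), p=weights / total)))
    return walk


# Toy heterogeneous network: nodes 0-1 are microbes, nodes 2-3 are diseases.
# Within-type weights stand in for similarities, cross-type 1.0 entries for
# known associations.
toy_network = np.array([[0.0, 0.5, 1.0, 0.0],
                        [0.5, 0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0, 0.3],
                        [0.0, 1.0, 0.3, 0.0]])

rng = np.random.default_rng(42)
walks = [random_walk(toy_network, start=node, length=5, rng=rng)
         for node in range(len(toy_network))]
```

Node sequences of this kind are a typical input for embedding methods, since nodes that co-occur in many walks end up close together in the learned representation.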
|