1
Gu Y, Zheng S, Zhang B, Kang H, Jiang R, Li J. Deep multiple instance learning on heterogeneous graph for drug-disease association prediction. Comput Biol Med 2025; 184:109403. [PMID: 39577348 DOI: 10.1016/j.compbiomed.2024.109403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 11/05/2024] [Accepted: 11/08/2024] [Indexed: 11/24/2024]
Abstract
Drug repositioning offers promising prospects for accelerating drug discovery by identifying potential drug-disease associations (DDAs) for existing drugs and diseases. Previous methods have generated meta-path-augmented node or graph embeddings for DDA prediction in drug-disease heterogeneous networks. However, these approaches rarely develop end-to-end frameworks for path instance-level representation learning as well as the further feature selection and aggregation. By leveraging the abundant topological information in path instances, more fine-grained and interpretable predictions can be achieved. To this end, we introduce deep multiple instance learning into drug repositioning by proposing a novel method called MilGNet. MilGNet employs a heterogeneous graph neural network (HGNN)-based encoder to learn drug and disease node embeddings. Treating each drug-disease pair as a bag, we designed a special quadruplet meta-path form and implemented a pseudo meta-path generator in MilGNet to obtain multiple meta-path instances based on network topology. Additionally, a bidirectional instance encoder enhances the representation of meta-path instances. Finally, MilGNet utilizes a multi-scale interpretable predictor to aggregate bag embeddings with an attention mechanism, providing predictions at both the bag and instance levels for accurate and explainable predictions. Comprehensive experiments on five benchmarks demonstrate that MilGNet significantly outperforms ten advanced methods. Notably, three case studies on one drug (Methotrexate) and two diseases (Renal Failure and Mismatch Repair Cancer Syndrome) highlight MilGNet's potential for discovering new indications, therapies, and generating rational meta-path instances to investigate possible treatment mechanisms. The source code is available at https://github.com/gu-yaowen/MilGNet.
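The attention-based aggregation of meta-path instances into a bag embedding can be illustrated with a minimal sketch. This is not MilGNet's actual predictor (see the linked repository for that); it assumes a simplified scoring rule in which each instance is scored by a dot product with a weight vector `w`, and the names `softmax`, `attention_pool`, `instances` and `w` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(instances, w):
    """Aggregate instance embeddings into one bag embedding.

    Each instance gets a scalar score w . h (a simplifying assumption),
    the scores are normalized with softmax, and the bag embedding is the
    attention-weighted sum of the instances.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in instances]
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights

# Toy bag of three instance embeddings, invented for illustration only.
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, 0.0]
bag, weights = attention_pool(instances, w)
```

The returned `weights` play the role of instance-level importance scores, which is what makes this style of pooling interpretable at both the bag and the instance level.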
Affiliation(s)
- Yaowen Gu
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China; Department of Chemistry, New York University, NY, 10027, USA.
- Si Zheng
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Bowen Zhang
- Beijing StoneWise Technology Co Ltd., Beijing, 100080, China
- Hongyu Kang
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, 100084, China
- Jiao Li
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College (CAMS&PUMC), Beijing, 100020, China.
2
Selote R, Makhijani R. A knowledge graph approach to drug repurposing for Alzheimer's, Parkinson's and Glioma using drug-disease-gene associations. Comput Biol Chem 2024; 115:108302. [PMID: 39693851 DOI: 10.1016/j.compbiolchem.2024.108302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 11/06/2024] [Accepted: 11/26/2024] [Indexed: 12/20/2024]
Abstract
Drug repurposing makes it possible to find new uses for previously developed drugs rather than developing new drugs from scratch. During the pandemic in particular, drug repurposing attracted much attention as a way to find new applications for previously approved drugs. In this research, we present a novel method for drug repurposing based on feature learning from a drug-disease-gene network, with the aim of finding drug candidates that can be repurposed for neurodegenerative diseases and glioma. We collected association data between drugs, diseases and genes from public resources and primarily examined the data related to Alzheimer's disease, Parkinson's disease and glioma. We created a knowledge graph using neo4j by integrating all these datasets and applied a scalable feature learning algorithm, node2vec, to create node embeddings. These embeddings were then used to predict unknown associations between diseases and their candidate drugs by computing the cosine similarity between disease and drug node embeddings. We obtained a definitive set of candidate drugs for repurposing. These results were validated against the literature, and the CodReS online tool was used to rank the candidate drugs. Additionally, we verified the status of the candidate drugs in pharmaceutical knowledge databases to confirm their significance.
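The cosine-similarity step described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the embeddings are toy vectors standing in for node2vec output, and the names `cosine_similarity`, `rank_drugs`, `drug_a`, `drug_b` and `drug_c` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (an assumption of this sketch)
    return dot / (norm_u * norm_v)

def rank_drugs(disease_vec, drug_vecs):
    """Return (drug, score) pairs sorted by descending cosine similarity."""
    scores = [(name, cosine_similarity(disease_vec, vec))
              for name, vec in drug_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy embeddings, invented for illustration only.
disease = [1.0, 0.0, 1.0]
drugs = {
    "drug_a": [1.0, 0.0, 1.0],
    "drug_b": [0.0, 1.0, 0.0],
    "drug_c": [1.0, 1.0, 0.0],
}
ranking = rank_drugs(disease, drugs)
```

In practice the vectors would come from a trained embedding model, and drugs already known to treat the disease would be filtered out before candidates are reported.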
Affiliation(s)
- Ruchira Selote
- Department of Computer Science and Engineering, Indian Institute of Information Technology, Nagpur, India.
- Richa Makhijani
- Department of Computer Science and Engineering, Indian Institute of Information Technology, Nagpur, India.
3
Sun Z, Song K. GEMimp: An Accurate and Robust Imputation Method for Microbiome Data Using Graph Embedding Neural Network. J Mol Biol 2024; 436:168841. [PMID: 39490678 DOI: 10.1016/j.jmb.2024.168841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 10/23/2024] [Accepted: 10/23/2024] [Indexed: 11/05/2024]
Abstract
Microbiome research has increasingly underscored the profound link between microbial compositions and human health, with numerous studies establishing a strong correlation between microbiome characteristics and various diseases. However, the analysis of microbiome data is frequently compromised by inherent sparsity issues, characterized by a substantial presence of observed zeros. These zeros not only skew the abundance distribution of microbial species but also undermine the reliability of scientific conclusions drawn from such data. Addressing this challenge, we introduce GEMimp, an innovative imputation method designed to infuse robustness into microbiome data analysis. GEMimp leverages the node2vec algorithm, which incorporates both Breadth-First Search (BFS) and Depth-First Search (DFS) strategies in its random-walk sampling process. This approach enables GEMimp to learn nuanced, low-dimensional representations of each taxonomic unit, facilitating the reconstruction of their similarity networks with unprecedented accuracy. Our comparative analysis pits GEMimp against state-of-the-art imputation methods including SAVER, MAGIC and mbImpute. The results unequivocally demonstrate that GEMimp outperforms its counterparts by achieving the highest Pearson correlation coefficient when compared to the original raw dataset. Furthermore, GEMimp shows notable proficiency in identifying significant taxa, enhancing the detection of disease-related taxa and effectively mitigating the impact of sparsity on both simulated and real-world datasets, such as those pertaining to Type 2 Diabetes (T2D) and Colorectal Cancer (CRC). These findings collectively highlight the strong effectiveness of GEMimp, allowing for better analysis of microbial data. By alleviating sparsity issues, it could greatly facilitate downstream analyses and microbiology research more broadly.
Affiliation(s)
- Ziwei Sun
- School of Mathematics and Statistics, Qingdao University, Qingdao, China.
- Kai Song
- School of Mathematics and Statistics, Qingdao University, Qingdao, China.
4
Cui H, Duan M, Bi H, Li X, Hou X, Zhang Y. Heterogeneous graph contrastive learning with gradient balance for drug repositioning. Brief Bioinform 2024; 26:bbae650. [PMID: 39692448 DOI: 10.1093/bib/bbae650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 11/02/2024] [Accepted: 11/29/2024] [Indexed: 12/19/2024] Open
Abstract
Drug repositioning, which involves identifying new therapeutic indications for approved drugs, is pivotal in accelerating drug discovery. Recently, to mitigate the effect of label sparsity on inferring potential drug-disease associations (DDAs), graph contrastive learning (GCL) has emerged as a promising paradigm that supplements high-quality self-supervised signals through auxiliary tasks and then transfers shareable knowledge to the main task, i.e. DDA prediction. However, existing approaches still encounter two limitations. The first is how to generate augmented views that fully capture higher-order interaction semantics. The second is the optimization imbalance between the auxiliary and main tasks. In this paper, we propose a novel heterogeneous Graph Contrastive learning method with Gradient Balance for DDA prediction, namely GCGB. To handle the first challenge, a fusion view is introduced to integrate both the semantic views (drug and disease similarity networks) and the interaction view (heterogeneous biomedical network). Next, inter-view contrastive learning auxiliary tasks are designed to contrast the fusion view with the semantic and interaction views, respectively. For the second challenge, we adaptively adjust the gradient of the GCL auxiliary tasks in terms of gradient direction and magnitude to better guide parameter updates toward the main task. Extensive experiments conducted on three benchmarks under 10-fold cross-validation demonstrate the model's effectiveness.
Affiliation(s)
- Hai Cui
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Meiyu Duan
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Haijia Bi
- College of Computer Science and Technology, Jilin University, No.2699 Qianjin Street, Changchun 130012, Jilin, China
- Xiaobo Li
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Xiaodi Hou
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
- Yijia Zhang
- Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, Dalian 116026, Liaoning, China
5
Muniyappan S, Rayan AXA, Varrieth GT. DRADTiP: Drug repurposing for aging disease through drug-target interaction prediction. Comput Biol Med 2024; 182:109145. [PMID: 39305733 DOI: 10.1016/j.compbiomed.2024.109145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 09/08/2024] [Accepted: 09/08/2024] [Indexed: 11/14/2024]
Abstract
MOTIVATION: The greatest risk factor for many non-communicable diseases is aging. Studies on model organisms have demonstrated that genetic and chemical perturbations can extend lifespan and improve overall health. However, finding longevity-enhancing medications and their related targets is difficult. METHOD: In this work, we designed a novel drug repurposing model by identifying the interaction between aging-related genes or targets and drugs similar to aging disease. Each disease is associated with specific genetic factors that contribute to its occurrence. These factors include gene expression, pathway, miRNA, and the degree of genes in the protein-protein interaction network. In this paper, we aim to find drugs that prolong the human life span, together with their aging-related targets, using the above-mentioned factors. In addition, the contribution or importance of each factor may vary among drugs and targets. Therefore, we designed a novel multi-layer random walk-based network representation learning model, including node and edge weights, to learn the features of drugs and targets respectively. RESULT: The performance of the proposed model is demonstrated using k-fold cross-validation (k = 5). The model achieved better performance, with scores of 0.93 and 0.91 for precision and recall respectively. The drugs identified by the system are evaluated to be potential candidates for aging since the degree of interaction between the potential drugs and their gene sets is high. In addition, the genes that interact with the drugs produce the same biological functions. Hence, the human life span will be prolonged.
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Chennai, Tamil Nadu, India.
6
Chen L, Zhao X. PCDA-HNMP: Predicting circRNA-disease association using heterogeneous network and meta-path. Mathematical Biosciences and Engineering: MBE 2023; 20:20553-20575. [PMID: 38124565 DOI: 10.3934/mbe.2023909] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/23/2023]
Abstract
A growing number of experimental studies have shown that circular RNAs (circRNAs) play important regulatory roles in human diseases through interactions with related microRNAs (miRNAs). CircRNAs have become new potential disease biomarkers and therapeutic targets. Predicting circRNA-disease associations (CDAs) is of great significance for exploring the pathogenesis of complex diseases, which can improve disease diagnosis and promote targeted therapy. However, determining CDAs through traditional clinical trials is usually time-consuming and expensive. Computational methods are now alternative ways to predict CDAs. In this study, a new computational method, named PCDA-HNMP, was designed. To obtain informative features of circRNAs and diseases, a heterogeneous network was first constructed, which defined circRNAs, mRNAs, miRNAs and diseases as nodes and associations between them as edges. Then, a deep analysis was conducted on the heterogeneous network by extracting meta-paths connecting to circRNAs (diseases), thereby mining hidden associations between various circRNAs (diseases). These associations constituted the meta-path-induced networks for circRNAs and diseases. The features of circRNAs and diseases were derived from the aforementioned networks via mashup. On the other hand, miRNA-disease associations (mDAs) were employed to improve the model's performance. miRNA features were obtained from the meta-path-induced networks on miRNAs and circRNAs, which were constructed from the meta-paths connecting miRNAs and circRNAs in the heterogeneous network. A concatenation operation was adopted to build the features of CDAs and mDAs. Such representations of CDAs and mDAs were fed into XGBoost to set up the model. The five-fold cross-validation yielded an area under the curve (AUC) of 0.9846, which was better than those of some existing state-of-the-art methods.
The use of mDAs enhanced the model's performance, and the importance analysis on meta-path-induced networks showed that networks produced by the meta-paths containing validated CDAs provided the most important contributions.
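One common way to build a meta-path-induced network is to multiply the adjacency matrices along the meta-path. The sketch below illustrates this for a circRNA-miRNA-circRNA meta-path; it is a generic illustration under that assumption, not the PCDA-HNMP procedure itself, and the toy matrix and the names `matmul`, `transpose`, `circ_mirna` and `circ_circ` are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]

# Toy circRNA-miRNA association matrix, invented for illustration only.
# Rows are circRNAs c0..c2, columns are miRNAs m0..m1.
circ_mirna = [
    [1, 0],
    [1, 1],
    [0, 1],
]

# Entry (i, j) counts the circRNA-miRNA-circRNA path instances between
# circRNA i and circRNA j, i.e. the number of miRNAs they share.
circ_circ = matmul(circ_mirna, transpose(circ_mirna))
```

Here `circ_circ[0][1]` is 1 because c0 and c1 share miRNA m0, while `circ_circ[0][2]` is 0 because c0 and c2 share none. The diagonal counts each circRNA's own miRNA partners and is usually zeroed before the matrix is used as a network.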
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xiaoyu Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
7
Muniyappan S, Rayan AXA, Varrieth GT. EGeRepDR: An enhanced genetic-based representation learning for drug repurposing using multiple biomedical sources. J Biomed Inform 2023; 147:104528. [PMID: 37858852 DOI: 10.1016/j.jbi.2023.104528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 09/11/2023] [Accepted: 10/16/2023] [Indexed: 10/21/2023]
Abstract
MOTIVATION: Drug repurposing (DR) is a prominent approach for identifying novel therapeutic indications for available drugs and discovering novel drugs for previously untreatable diseases. DR has attracted major attention in the pharmaceutical industry because of the high cost and long time needed to bring new drugs to market through traditional drug development. The DR task depends largely on genetic information, since drugs revert the altered gene expression (GE) of diseases to normal. Many existing studies have not considered the importance of genetic information in predicting potential candidates. METHOD: We propose a novel multimodal framework that utilizes genetic aspects of drugs and diseases, such as genes, pathways, and gene signatures or expression, to enhance the performance of DR using various data sources. Firstly, a heterogeneous biological network (HBN) is constructed with three types of nodes, namely drug, disease, and gene, and four types of edges: similarity (drug, gene, and disease), drug-gene, gene-disease, and drug-disease. Next, a modified graph auto-encoder (GAE*) model is applied to learn the representation of drug and disease nodes using the topological structure and edge information. Secondly, the HBN is enhanced with information extracted from the biomedical literature and ontology, using a novel semi-supervised pattern embedding-based bootstrapping model and novel DR perspective representation learning respectively, to improve the prediction performance. Finally, the proposed system uses a neural network model to generate the probability score of drug-disease pairs. RESULTS: We demonstrate the efficiency of the proposed model on various datasets and achieved outstanding performance in 5-fold cross-validation (AUC = 0.99, AUPR = 0.98). Further, we validated the top-ranked potential candidates using pathway analysis and showed that the known and predicted candidates share common genes in the pathways.
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Chennai, Tamil Nadu, India.
8
Chen L, Chen K, Zhou B. Inferring drug-disease associations by a deep analysis on drug and disease networks. Mathematical Biosciences and Engineering: MBE 2023; 20:14136-14157. [PMID: 37679129 DOI: 10.3934/mbe.2023632] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 09/09/2023]
Abstract
Drugs, which treat various diseases, are essential for human health. However, developing new drugs is quite laborious, time-consuming, and expensive. Although investments in drug development have greatly increased over the years, the number of drug approvals each year remains quite low. Drug repositioning is deemed an effective means to accelerate drug development because it can discover novel effects of existing drugs. Numerous computational methods have been proposed for drug repositioning, some of which were designed as binary classifiers that predict drug-disease associations (DDAs). Negative sample selection is a common weakness of such methods. In this study, a novel reliable negative sample selection scheme, named RNSS, is presented, which can screen out reliable pairs of drugs and diseases with low probabilities of being actual DDAs. This scheme considers information from the k-neighbors of one drug in a drug network, including their associations with diseases and with the drug. Then, a scoring system was set up to evaluate pairs of drugs and diseases. To test the utility of the RNSS, three classic classification algorithms (random forest, Bayes network and nearest neighbor algorithm) were employed to build classifiers using negative samples selected by the RNSS. The cross-validation results suggested that such classifiers provided nearly perfect performance and were significantly superior to those using some traditional and previous negative sample selection schemes.
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Kaiyu Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Bo Zhou
- Shanghai University of Medicine & Health Sciences, Shanghai 201318, China