1
Mishra S, Singh G, Bhattacharya M. Tissue specific tumor-gene link prediction through sampling based GNN using a heterogeneous network. Med Biol Eng Comput 2024; 62:2499-2510. [PMID: 38635004 DOI: 10.1007/s11517-024-03087-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
A tissue sample is a valuable resource for understanding a patient's symptoms and health status in relation to tumor growth. Recent research seeks to establish a connection between tissue-specific tumor samples and genetic markers (genes). This breakthrough has paved the way for personalized cancer therapies. With this motivation, the proposed model constructs a heterogeneous network based on tumor sample-gene relation data and gene-gene interaction data. This network also incorporates tissue-specific gene expression and primary site-based gene counts as features, enabling tissue-specific predictions. Graph neural networks (GNNs) have proven effective in modeling complex interactions and predicting links within this network. The proposed model has successfully predicted tumor-gene associations by leveraging sampling-based GNNs and link layer embedding. The model's performance metrics, such as AUC-ROC scores, reached approximately 94%, demonstrating the potential of this heterogeneous network in predicting tissue-specific tumor sample-gene links. This paper's findings highlight the importance of tissue-specific associations in cancer research.
Affiliation(s)
- Surabhi Mishra
  - Department of Information Technology, ABV-Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, Madhya Pradesh, India
- Gurjot Singh
  - Department of Information Technology, ABV-Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, Madhya Pradesh, India
- Mahua Bhattacharya
  - Department of Information Technology, ABV-Indian Institute of Information Technology and Management, Morena Road, Gwalior, 474015, Madhya Pradesh, India
2
Hu X, Sun Z, Nian Y, Wang Y, Dang Y, Li F, Feng J, Yu E, Tao C. Self-Explainable Graph Neural Network for Alzheimer Disease and Related Dementias Risk Prediction: Algorithm Development and Validation Study. JMIR Aging 2024; 7:e54748. [PMID: 38976869 PMCID: PMC11263893 DOI: 10.2196/54748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/31/2024] [Accepted: 06/02/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes. OBJECTIVE The study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. Addressing the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction. METHODS We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method for estimating ADRD likelihood. This self-explainable method can provide a feature-importance explanation in the context of ADRD risk prediction, leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's efficiency. Random forest (RF) and light gradient boosting machine (LGBM) were used as baselines. By using this method, we further clarify the key relationships for ADRD risk prediction. RESULTS In scenario 1, the VGNN model showed area under the receiver operating characteristic curve (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set, respectively. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, AUROC scores of 0.7001 and 0.7187 were obtained, exceeding the baseline models by 10.1% and 8.5%, respectively. These results clearly demonstrate the significant superiority of the graph-based approach over the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression. CONCLUSIONS Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.
Affiliation(s)
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Zenan Sun
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yi Nian
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yichen Wang
  - Division of Hospital Medicine at Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, United States
- Yifang Dang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Fang Li
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Jingna Feng
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Evan Yu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
3
Zhang D, Wang Z, Zhao D, Li J. DRGATAN: Directed relation graph attention aware network for asymmetric drug-drug interaction prediction. iScience 2024; 27:109943. [PMID: 38868194 PMCID: PMC11167430 DOI: 10.1016/j.isci.2024.109943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/21/2024] [Accepted: 05/06/2024] [Indexed: 06/14/2024] Open
Abstract
In scenarios involving the treatment of complex or coexisting diseases with multiple drugs, the potential for severe adverse drug reactions in patients necessitates the identification of potential drug-drug interactions (DDIs). Most existing computational methods have not taken into account the asymmetry and relation types of drug interactions caused by the relation information between drugs, which may lead to missing information in embedded learning. Therefore, this paper proposes a directed relation graph attention aware network (DRGATAN) to predict asymmetric drug interactions. DRGATAN leverages an encoder to learn multi-relational role embeddings of drugs across different types of relations. The experimental results show that DRGATAN's performance is superior to recognized advanced methods. The visualization demonstrates the effect of utilizing asymmetric information, and the case analysis validates the reliability of the proposed method. This study provides guidance for predicting asymmetric drug interactions.
Affiliation(s)
- Dehai Zhang
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
- Zhengwu Wang
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
- Di Zhao
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
- Jin Li
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
4
Guo D, Wang Y, Chen J, Liu X. Integration of multi-omics data for survival prediction of lung adenocarcinoma. Computer Methods and Programs in Biomedicine 2024; 250:108192. [PMID: 38701699 DOI: 10.1016/j.cmpb.2024.108192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 04/08/2024] [Accepted: 04/20/2024] [Indexed: 05/05/2024]
Abstract
BACKGROUND AND OBJECTIVE The morbidity of lung adenocarcinoma (LUAD) has been increasing year by year and the prognosis is poor. This has prompted researchers to study the survival of LUAD patients to ensure that patients can be cured in time or survive after appropriate treatment. There is still no fully valid model that can be applied to clinical practice. METHODS We introduced struc2vec-based multi-omics data integration (SBMOI), which integrates gene expression, somatic mutations and clinical data to construct mutation gene vectors representing LUAD patient features. Based on the patient features, the random survival forest (RSF) model was used to predict the long- and short-term survival of LUAD patients. To further demonstrate the superiority of SBMOI, we replaced the scale-free gene co-expression network (FCN) with a protein-protein interaction (PPI) network and with a significant co-expression network (SCN) to compare accuracy in predicting LUAD patient survival under the same conditions. RESULTS Our results suggested that, compared with the SCN and the PPI network, the FCN-based SBMOI combined with the RSF model had better performance in long- and short-term survival prediction tasks for LUAD patients. The AUCs for 1-year, 5-year, and 10-year survival in the validation dataset were 0.791, 0.825, and 0.917, respectively. CONCLUSIONS This study provided a powerful network-based method for multi-omics data integration. SBMOI combined with RSF successfully predicted long- and short-term survival of LUAD patients, with especially high accuracy on long-term survival. In addition, the SBMOI algorithm has the potential to be combined with other machine learning models to complete clustering or stratification tasks, and to be applied to other diseases.
Affiliation(s)
- Dingjie Guo
  - Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Yixian Wang
  - Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
- Jing Chen
  - Academy for Advanced Interdisciplinary Studies, Northeast Normal University, Changchun, 130024, China
- Xin Liu
  - Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, Jilin, China
5
Wang S, Liu T, Ren C, Zhao Y, Qiao S, Zhang Y, Pang S. Heterogeneous graph inference with range constrainted L2,1-collaborative matrix factorization for small molecule-miRNA association prediction. Comput Biol Chem 2024; 110:108078. [PMID: 38677013 DOI: 10.1016/j.compbiolchem.2024.108078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/03/2024] [Accepted: 04/16/2024] [Indexed: 04/29/2024]
Abstract
MicroRNAs (miRNAs) play a vital role in regulating gene expression and various biological processes. As a result, they have been identified as effective targets for small molecule (SM) drugs in disease treatment. Heterogeneous graph inference is a classical approach for predicting SM-miRNA associations, with commendable convergence accuracy and speed. However, most existing methods do not adequately address the inherent sparsity of SM-miRNA association networks, and imprecise SM/miRNA similarity metrics reduce the accuracy of predicting SM-miRNA associations. In this research, we proposed a heterogeneous graph inference with range constrained L2,1-collaborative matrix factorization (HGIRCLMF) method to predict potential SM-miRNA associations. First, we computed the multi-source similarities of SMs and miRNAs and integrated this similarity information into a comprehensive SM/miRNA similarity. This step improved the accuracy of SM and miRNA similarity, ensuring reliability for the subsequent inference on the heterogeneous graph. Second, we used a range constrained L2,1-collaborative matrix factorization (RCLMF) model to pre-populate the missing values of the SM-miRNA association matrix. In this step, we developed a novel matrix decomposition method that enhances the robustness and formative nature of SM-miRNA edges between SM networks and miRNA networks. Next, we built an SM-miRNA heterogeneous network from the processed biological information. Finally, HGIRCLMF used this network data to infer scores for unknown association pairs. We implemented four cross-validation experiments on two distinct datasets, and HGIRCLMF achieved the highest areas under the curve, surpassing six state-of-the-art computational approaches. Furthermore, we performed three case studies to validate the predictive power of our method in practical application.
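For readers unfamiliar with the notation, the L2,1 norm named in this method is commonly defined as the sum of the Euclidean norms of a matrix's rows. Some papers sum over columns instead, and the abstract does not say which convention HGIRCLMF adopts, so the row-wise form below is an assumption; the symbols X, n, and m are illustrative and do not come from the paper.

```latex
\[
\lVert X \rVert_{2,1} \;=\; \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{m} x_{ij}^{2}},
\qquad X \in \mathbb{R}^{n \times m}
\]
```

Used as a penalty, this norm tends to shrink whole rows toward zero rather than individual entries, which is one common reason it appears in factorization models that aim for robustness.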
Affiliation(s)
- Shudong Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Tiyao Liu
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Chuanru Ren
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Yawu Zhao
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Sibo Qiao
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
- Shanchen Pang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
6
Liu W, Teng Z, Li Z, Chen J. CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Interdiscip Sci 2024:10.1007/s12539-024-00633-y. [PMID: 38778003 DOI: 10.1007/s12539-024-00633-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 04/07/2024] [Accepted: 04/09/2024] [Indexed: 05/25/2024]
Abstract
Gene regulatory network (GRN) inference based on single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization are unsatisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data (CVGAE). CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and observed topology into a low-dimensional vector space. The trained vectors are used to calculate the distance between genes and, from that, to predict interactions between genes. In the overall framework, FastICA is used to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multiple stacked GraphSAGE layers as an encoder and an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets containing four related ground-truth networks, and the results show that CVGAE achieves better performance than comparative methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
Affiliation(s)
- Wei Liu
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zhijie Teng
  - School of Computer Science, Xiangtan University, Xiangtan, 411105, China
- Zejun Li
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 412002, China
- Jing Chen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
7
Labarga A, Martínez-Gonzalez J, Barajas M. Integrative Multi-Omics Analysis for Etiology Classification and Biomarker Discovery in Stroke: Advancing towards Precision Medicine. Biology 2024; 13:338. [PMID: 38785820 PMCID: PMC11149453 DOI: 10.3390/biology13050338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 05/02/2024] [Accepted: 05/06/2024] [Indexed: 05/25/2024]
Abstract
Recent advancements in high-throughput omics technologies have opened new avenues for investigating stroke at the molecular level and elucidating the intricate interactions among various molecular components. We present a novel approach for multi-omics data integration on knowledge graphs and have applied it to a stroke etiology classification task of 30 stroke patients through the integrative analysis of DNA methylation and mRNA, miRNA, and circRNA. This approach has demonstrated promising performance as compared to other existing single technology approaches.
Affiliation(s)
- Alberto Labarga
  - Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
- Miguel Barajas
  - Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
8
Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024; 16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024] Open
Abstract
Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretability. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer's disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer's disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings show moderate positive rates, high recall rates, and interpretability in gene-disease association studies.
Affiliation(s)
- Xinzhi Yao
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Sizhuo Ouyang
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Yulong Lian
  - College of Science, Huazhong Agricultural University, Wuhan, China
- Qianqian Peng
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Xionghui Zhou
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feier Huang
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehai Hu
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Feng Shi
  - College of Science, Huazhong Agricultural University, Wuhan, China
- Jingbo Xia
  - College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
9
Zhang Y, Deng Z, Xu X, Feng Y, Junliang S. Application of Artificial Intelligence in Drug-Drug Interactions Prediction: A Review. J Chem Inf Model 2024; 64:2158-2173. [PMID: 37458400 DOI: 10.1021/acs.jcim.3c00582] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/09/2024]
Abstract
Drug-drug interactions (DDIs) are a critical aspect of drug research; they can have adverse effects on patients and lead to serious consequences. Predicting these events accurately can significantly improve clinicians' ability to make better decisions and establish optimal treatment regimens. However, manually detecting these interactions is time-consuming and labor-intensive. Utilizing the advancements in Artificial Intelligence (AI) is essential for achieving accurate forecasts of DDIs. In this review, DDI prediction tasks are classified into three types: undirected DDI prediction, DDI event prediction, and asymmetric DDI prediction. The paper then reviews the progress of AI for each of these three prediction tasks and summarizes the data sets and representative methods used in each. In this review, we aim to provide a comprehensive overview of drug interaction prediction. The first section introduces commonly used databases and presents an overview of current research advancements and techniques across the three domains of DDI prediction. Additionally, we introduce classical machine learning techniques for predicting undirected drug interactions and provide a timeline of the progress in predicting drug interaction events. Finally, we discuss the difficulties and prospects of AI approaches for predicting DDIs, emphasizing their potential for improving clinical decision-making and patient outcomes.
Affiliation(s)
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Zengqian Deng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Xiaoyu Xu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Yinfei Feng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Shang Junliang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276800, China
10
Yan X, Gu C, Feng Y, Han J. Predicting Drug-drug Interaction with Graph Mutual Interaction Attention Mechanism. Methods 2024; 223:16-25. [PMID: 38262485 DOI: 10.1016/j.ymeth.2024.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/04/2024] [Accepted: 01/19/2024] [Indexed: 01/25/2024] Open
Abstract
Effective representation of molecules is a crucial step in AI-driven drug design and drug discovery, especially for drug-drug interaction (DDI) prediction. Previous work usually models drug information from a drug-related knowledge graph or from single drug molecules, but the interaction information between the molecular substructures of a drug pair is seldom considered, and the influence of bond information on atom node representation is often ignored, leading to insufficient drug representation. Moreover, key molecular substructures contribute significantly to DDI prediction results. Therefore, in this work, we propose a novel Graph learning framework with a Mutual Interaction Attention mechanism (called GMIA) to predict DDIs by effectively representing the drug molecules. Specifically, we build a node-edge message communication encoder to aggregate atom node and incoming edge information for atom node representation, and we design a mutual interaction attention decoder to capture the mutual interaction context between the molecular graphs of drug pairs. GMIA bridges the gap between the two encoders for the single drug molecules through an attention mechanism. We also design a co-attention matrix to analyze the significance of different-size substructures obtained from the encoder-decoder layer and to provide interpretability. In comparison with other recent state-of-the-art methods, GMIA achieves the best results in terms of area under the precision-recall curve (AUPR), area under the ROC curve (AUC), and F1 score on two datasets of different scales. The case study indicates that GMIA can detect the key substructures for potential DDIs, demonstrating its enhanced performance and interpretability.
Affiliation(s)
- Xiaoying Yan
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
- Chi Gu
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
- Yuehua Feng
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
- Jiaxin Han
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
11
Zhang H, Jiao J, Zhao T, Zhao E, Li L, Li G, Zhang B, Qin QM. GERWR: Identifying the Key Pathogenicity-Associated sRNAs of Magnaporthe Oryzae Infection in Rice Based on Graph Embedding and Random Walk With Restart. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:227-239. [PMID: 38153818 DOI: 10.1109/tcbb.2023.3348080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2023]
Abstract
Rice blast, caused by Magnaporthe oryzae (M. oryzae), is a destructive rice disease that reduces rice yield by 10% to 30% annually. It also affects other cereal crops such as barley, wheat, rye, millet, sorghum, and maize. Small RNAs (sRNAs) play an essential regulatory role in fungus-plant interaction during the fungal invasion, but studies on pathogenic sRNAs during the fungal invasion of plants based on multi-omics data integration are rare. This paper proposes a novel approach called Graph Embedding combined with Random Walk with Restart (GERWR) to identify pathogenic sRNAs based on multi-omics data integration during M. oryzae invasion. By constructing a multi-omics network (MRMO), we identified 29 pathogenic sRNAs of rice blast fungus. Further analysis revealed that these sRNAs regulate rice genes in a many-to-many relationship, playing a significant regulatory role in the pathogenesis of rice blast disease. This paper explores the pathogenic factors of rice blast disease from the perspective of multi-omics data analysis, revealing the inherent connection between pathogenic factors of different omics. It has essential scientific significance for studying the pathogenic mechanism of rice blast fungus, the rice blast fungus-rice model system, and the pathogen-host interaction in related fields.
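The random walk with restart named in this abstract can be sketched in a few lines. This is a generic textbook version, not the GERWR implementation: the four-node path graph, the seed node, the restart probability of 0.5, and the function name `random_walk_with_restart` are all assumptions chosen for illustration.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W @ p + r * p0 until the L1 change is below tol."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # leave isolated nodes as all-zero columns
    W = adj / col_sums                 # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds) # walker restarts uniformly over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: a path 0 - 1 - 2 - 3, with node 0 as the only seed.
toy_adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(toy_adj, seeds=[0], restart=0.5)
print(scores)
```

Nodes closer to the seed receive higher steady-state scores, which is the property that makes the method useful for ranking candidates by proximity to known entities in a network.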
12
Pan D, Lu P, Wu Y, Kang L, Huang F, Lin K, Yang F. Prediction of multiple types of drug interactions based on multi-scale fusion and dual-view fusion. Front Pharmacol 2024; 15:1354540. [PMID: 38434701 PMCID: PMC10904638 DOI: 10.3389/fphar.2024.1354540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 01/30/2024] [Indexed: 03/05/2024] Open
Abstract
Potential drug-drug interactions (DDI) can lead to adverse drug reactions (ADR), and DDI prediction can help pharmacy researchers detect harmful DDI early. However, existing DDI prediction methods fall short in fully capturing drug information. They typically employ a single-view input, focusing solely on drug features or drug networks. Moreover, they rely exclusively on the final model layer for predictions, overlooking the nuanced information present across various network layers. To address these limitations, we propose a multi-scale dual-view fusion (MSDF) method for DDI prediction. More specifically, MSDF first constructs two views, topological and feature views of drugs, as model inputs. Then a graph convolutional neural network is used to extract the feature representations from each view. On top of that, a multi-scale fusion module integrates information across different graph convolutional layers to create comprehensive drug embeddings. The embeddings from the two views are summed as the final representation for classification. Experiments on two real-world datasets demonstrate that MSDF achieves higher accuracy than state-of-the-art methods, as the dual-view, multi-scale approach better captures drug characteristics.
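The pipeline described in this abstract (two views, graph convolution per view, fusion across layers, then a sum of the two view embeddings) can be illustrated with a small NumPy sketch. It is not the MSDF code. Several details are assumptions made only so the example runs: the symmetric-normalised GCN propagation rule, averaging as the fusion across layers, a cosine-similarity graph as the feature view, random untrained weights, and all function and variable names.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def multi_scale_embed(adj, features, weights):
    """Run stacked GCN layers and average the outputs of every layer."""
    a_norm = normalize_adjacency(adj)
    h = features
    layer_outputs = []
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)   # one graph convolution + ReLU
        layer_outputs.append(h)
    return np.mean(layer_outputs, axis=0)     # assumed fusion: mean over layers

rng = np.random.default_rng(0)
n_drugs, in_dim, hidden = 5, 4, 3
features = rng.normal(size=(n_drugs, in_dim))  # hypothetical drug features

# Topological view: a hypothetical known-interaction graph between drugs.
topo_adj = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
], dtype=float)

# Feature view (assumed): link drugs whose feature vectors have positive cosine similarity.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
feat_adj = (unit @ unit.T > 0).astype(float)
np.fill_diagonal(feat_adj, 0.0)

def make_weights():
    return [rng.normal(size=(in_dim, hidden)), rng.normal(size=(hidden, hidden))]

# Final representation: sum of the two view embeddings, as stated in the abstract.
final_embedding = (multi_scale_embed(topo_adj, features, make_weights())
                   + multi_scale_embed(feat_adj, features, make_weights()))
print(final_embedding.shape)
```

Keeping every layer the same width is what lets the layer outputs be averaged directly; a concatenation-based fusion would lift that restriction at the cost of a wider embedding.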
Collapse
Affiliation(s)
- Dawei Pan
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| | - Ping Lu
- School of Economics and Management, Xiamen University of Technology, Xiamen, China
| | - Yunbing Wu
- College of Computer and Big Data, Fuzhou University, Fuzhou, China
| | - Liping Kang
- Pasteur Institute, Soochow University, Suzhou, China
| | - Fengxin Huang
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| | - Kaibiao Lin
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| | - Fan Yang
- Shenzhen Research Institute of Xiamen University, Shenzhen, China
- Department of Automation, Xiamen University, Xiamen, China
| |
|
13
|
Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks, while necessary for identifying pathogenic genes for brain diseases, remains challenging. RESULTS In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis supported by bioinformatics revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. CONCLUSION Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
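The abstract names Kullback-Leibler divergence as the quantity being optimized but does not say which distributions are compared. As a reminder of the quantity itself, here is a minimal sketch for two discrete distributions; the example vectors are hypothetical and unrelated to the paper's data.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions over two outcomes.
p = [0.5, 0.5]
q = [0.25, 0.75]
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```

The divergence is zero only when the two distributions coincide, and it is not symmetric, so `forward` and `reverse` differ; which direction a fusion objective uses is a modelling choice.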
Affiliation(s)
- Ping Zhang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Weihan Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
| | - Weicheng Sun
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Jinsheng Xu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Hua Hu
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China.
| | - Leon Wong
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China.
| |
|
14
|
Castaneda EU, Baker EJ. KNeXT: a NetworkX-based topologically relevant KEGG parser. Front Genet 2024; 15:1292394. [PMID: 38415058 PMCID: PMC10896898 DOI: 10.3389/fgene.2024.1292394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/25/2024] [Indexed: 02/29/2024] Open
Abstract
Automating the recreation of gene and mixed gene-compound networks from Kyoto Encyclopedia of Genes and Genomes (KEGG) Markup Language (KGML) files is challenging because the data structure does not preserve the independent or loosely connected neighborhoods from which they were originally derived, referred to here as its topological environment. Identical accession numbers may overlap, causing neighborhoods to artificially collapse based on duplicated identifiers. This causes current parsers to create misleading or erroneous graphical representations when mixed gene networks are converted to gene-only networks. To overcome these challenges we created a Python-based KEGG NetworkX Topological (KNeXT) parser that allows users to accurately recapitulate genetic networks and mixed networks from KGML map data. The software, archived as a Python Package Index (PyPI) file to ensure broad application, is designed to ingest KGML files through built-in APIs and dynamically create high-fidelity topological representations. The utilization of NetworkX's framework to generate tab-separated files additionally ensures that KNeXT results may be imported into other graph frameworks and maintain programmatic access to the original x-y axis positions of each node in the KEGG pathway. KNeXT is a well-described Python 3 package that allows users to rapidly download and aggregate specific KGML files and recreate KEGG pathways based on a range of user-defined settings. KNeXT is platform-independent, distinctive, and it is not written on top of other Python parsers. Furthermore, KNeXT enables users to parse entire local folders or single files through command line scripts and convert the output into NCBI or UniProt IDs. KNeXT provides an ability for researchers to generate pathway visualizations while preserving the original context of a KEGG pathway. Source code is freely available at https://github.com/everest-castaneda/knext.
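The collapse problem described above can be reproduced with a few lines of standard-library Python. The sketch below is not KNeXT and does not use its API; the KGML fragment, the accession numbers, and the `unique_nodes` flag are hypothetical. It relies only on the `entry` (`id`, `name`) and `relation` (`entry1`, `entry2`) elements of KGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical KGML fragment: accession hsa:100 appears as two separate entries.
KGML = """<pathway name="path:hsa00000" org="hsa" number="00000">
  <entry id="1" name="hsa:100" type="gene"/>
  <entry id="2" name="hsa:200" type="gene"/>
  <entry id="3" name="hsa:100" type="gene"/>
  <entry id="4" name="hsa:300" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel"/>
  <relation entry1="3" entry2="4" type="PPrel"/>
</pathway>"""

def parse_edges(kgml_text, unique_nodes):
    """Return relation edges, labelling nodes by name or by name plus entry id."""
    root = ET.fromstring(kgml_text)
    names = {e.get("id"): e.get("name") for e in root.iter("entry")}

    def label(entry_id):
        name = names[entry_id]
        # Suffixing the entry id keeps duplicated accessions apart.
        return f"{name}-{entry_id}" if unique_nodes else name

    return [(label(r.get("entry1")), label(r.get("entry2")))
            for r in root.iter("relation")]

def count_components(edges):
    """Count connected components of the undirected graph given by edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, count = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return count

collapsed = parse_edges(KGML, unique_nodes=False)
preserved = parse_edges(KGML, unique_nodes=True)
```

Keying nodes by accession alone merges the two neighborhoods into one component, whereas keeping the entry id in the label leaves them as the two separate neighborhoods drawn in the pathway map.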
Affiliation(s)
- Everest Uriel Castaneda
- Department of Biology, Baylor University, Waco, TX, United States
- School of Engineering and Computer Science, Baylor University, Waco, TX, United States
| | - Erich J Baker
- Department of Mathematics and Computer Science, Belmont University, Nashville, TN, United States
| |
|
15
|
Zhang C, Zang T, Zhao T. KGE-UNIT: toward the unification of molecular interactions prediction based on knowledge graph and multi-task learning on drug discovery. Brief Bioinform 2024; 25:bbae043. [PMID: 38348746 PMCID: PMC10939374 DOI: 10.1093/bib/bbae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 12/29/2023] [Accepted: 01/23/2024] [Indexed: 02/15/2024] Open
Abstract
The prediction of molecular interactions is vital for drug discovery. Existing methods often focus on individual prediction tasks and overlook the relationships between them. Additionally, certain tasks encounter limitations due to insufficient data availability, resulting in limited performance. To overcome these limitations, we propose KGE-UNIT, a unified framework that combines knowledge graph embedding (KGE) and multi-task learning, for simultaneous prediction of drug-target interactions (DTIs) and drug-drug interactions (DDIs) and enhancing the performance of each task, even when data availability is limited. Via KGE, we extract heterogeneous features from the drug knowledge graph to enhance the structural features of drug and protein nodes, thereby improving the quality of features. Additionally, employing multi-task learning, we introduce an innovative predictor that comprises the task-aware Convolutional Neural Network-based (CNN-based) encoder and the task-aware attention decoder, which can better fuse multimodal features, capture the contextual interactions of molecular tasks and enhance task awareness, leading to improved performance. Experiments on two imbalanced datasets for DTIs and DDIs demonstrate the superiority of KGE-UNIT, achieving high areas under the receiver operating characteristic curve (AUROCs) (0.942, 0.987) and areas under the precision-recall curve (AUPRs) (0.930, 0.980) for DTIs and high AUROCs (0.975, 0.989) and AUPRs (0.966, 0.988) for DDIs. Notably, on the LUO dataset where the data were more limited, KGE-UNIT exhibited a more pronounced improvement, with increases of 4.32% in AUROC and 3.56% in AUPR for DTIs and 6.56% in AUROC and 8.17% in AUPR for DDIs. The scalability of KGE-UNIT is demonstrated through its extension to protein-protein interaction prediction; ablation studies and case studies further validate its effectiveness.
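The two metrics reported in this abstract can be computed from predicted scores and binary labels as sketched below. The abstract does not state how the authors computed them; here AUROC uses the pairwise-ranking (Mann-Whitney) formulation and AUPR is approximated by average precision, which is one common estimator. The labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of precision values taken at the rank of each positive example.

    Tied scores are not handled specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical predictions for five candidate interactions.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

On imbalanced interaction data the two metrics can diverge sharply, since AUROC is insensitive to the positive rate while precision-based summaries are not, which is presumably why both are reported.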
Affiliation(s)
- Chengcheng Zhang
- Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
| | - Tianyi Zang
- Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
| | - Tianyi Zhao
- School of Medicine and Health, Harbin Institute of Technology, Harbin, 150001, China
| |
|
16
|
Alvarez-Mamani E, Dechant R, Beltran-Castañón CA, Ibáñez AJ. Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 2024; 25:1. [PMID: 38166530 PMCID: PMC10763173 DOI: 10.1186/s12859-023-05612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024] Open
Abstract
Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although typically used in the context of guessing friendships in social media, several applications for graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments over recent years facilitate their application to biomedical data and thus may help advance biological discoveries. Therefore, in this review, we discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
Affiliation(s)
- Edwin Alvarez-Mamani
- Engineering Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
| | - Reinhard Dechant
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Calico Life Sciences, 1170 Veterans Blvd, San Francisco, CA, 94080, USA
| | | | - Alfredo J Ibáñez
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru.
- Science Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru.
| |
|
17
|
Liu Y, Sang G, Liu Z, Pan Y, Cheng J, Zhang Y. MPTN: A message-passing transformer network for drug repurposing from knowledge graph. Comput Biol Med 2024; 168:107800. [PMID: 38043469 DOI: 10.1016/j.compbiomed.2023.107800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/09/2023] [Accepted: 11/29/2023] [Indexed: 12/05/2023]
Abstract
Drug repurposing (DR) based on knowledge graphs (KGs), which uses knowledge graph reasoning models to predict new therapeutic pathways for existing drugs, is challenging. With the rapid development of computing technology and the growing availability of validated biomedical data, various knowledge graph-based methods have been widely used to analyze and process complex and novel data to discover new indications for given drugs. However, existing methods need to be improved in extracting semantic information from contextual triples of biomedical entities. In this study, we propose a message-passing transformer network named MPTN based on knowledge graphs for drug repurposing. Firstly, CompGCN is used as a precoder to jointly aggregate entity and relation embeddings. Then, to fully capture the semantic information of entity context triples, a message-propagating transformer module is designed. The module integrates the transformer into the message passing mechanism and incorporates the attention weights computed over entity context triples into the entity embedding to update it. Next, a residual connection is introduced to retain as much information as possible and improve prediction accuracy. Finally, MPTN utilizes the InteractE module as the decoder to obtain heterogeneous feature interactions in entity and relation representations and predict new pathways for drug treatment. Experiments on two datasets show that the model is superior to the existing knowledge graph embedding (KGE) learning methods.
Affiliation(s)
- Yuanxin Liu
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
| | - Guoming Sang
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
| | - Zhi Liu
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
| | - Yilin Pan
- School of Artificial Intelligence, Dalian Maritime University, Dalian, 116026, Liaoning, China
| | - Junkai Cheng
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China
| | - Yijia Zhang
- School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, Liaoning, China.
| |
|
18
|
Djeddi WE, Hermi K, Ben Yahia S, Diallo G. Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining. BMC Bioinformatics 2023; 24:488. [PMID: 38114937 PMCID: PMC10731821 DOI: 10.1186/s12859-023-05593-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This phenomenon highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery. RESULTS The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets. CONCLUSIONS The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. 
The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that the DTIOG can serve as a valuable tool for discovering new DTIs.
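The abstract mentions cosine similarity and Euclidean distance as similarity measures over embedding vectors. The sketch below shows both on hypothetical three-dimensional target embeddings; the target names, the vectors, and the `rank_by_cosine` helper are illustrative assumptions and do not reproduce DTIOG.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_cosine(query_name, embeddings):
    """Names of all other items, most similar to the query first."""
    query = embeddings[query_name]
    others = [name for name in embeddings if name != query_name]
    return sorted(others,
                  key=lambda name: cosine_similarity(query, embeddings[name]),
                  reverse=True)

# Hypothetical target embeddings (for example, pooled protein-language-model vectors).
target_embeddings = {
    "T1": [1.0, 0.0, 1.0],
    "T2": [0.9, 0.1, 1.1],
    "T3": [-1.0, 0.5, 0.0],
}
ranking = rank_by_cosine("T1", target_embeddings)
```

Cosine similarity ignores vector length while Euclidean distance does not, so the two measures can rank neighbours differently when embedding norms vary.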
Affiliation(s)
- Warith Eddine Djeddi
- LR11ES14, Faculty of Sciences of Tunis, University of Tunis El Manar, Campus Universitaire, 2092, Tunis, Tunisia.
- High Institute of Informatics in Kef, University of Jendouba, Saleh Ayech, 8189, Jendouba, Tunisia.
| | - Khalil Hermi
- High Institute of Informatics in Kef, University of Jendouba, Saleh Ayech, 8189, Jendouba, Tunisia
| | - Sadok Ben Yahia
- Department of Software Science, Tallinn University of Technology, Ehitajate tee-5, 12618, Tallinn, Estonia
- The Maersk Mc-Kinney Moller Institute, Southern Syddansk Universitet, Alsion 2, 6400, Sønderborg, Denmark
| | - Gayo Diallo
- Bordeaux Population Health Inserm 1219, University of Bordeaux, rue Léo Saignat, 33000, Bordeaux, France
| |
|
19
|
Brechtmann F, Bechtler T, Londhe S, Mertes C, Gagneur J. Evaluation of input data modality choices on functional gene embeddings. NAR Genom Bioinform 2023; 5:lqad095. [PMID: 37942285 PMCID: PMC10629286 DOI: 10.1093/nargab/lqad095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 09/07/2023] [Accepted: 09/28/2023] [Indexed: 11/10/2023] Open
Abstract
Functional gene embeddings, numerical vectors capturing gene function, provide a promising way to integrate functional gene information into machine learning models. These embeddings are learnt by applying self-supervised machine-learning algorithms on various data types including quantitative omics measurements, protein-protein interaction networks and literature. However, downstream evaluations comparing alternative data modalities used to construct functional gene embeddings have been lacking. Here we benchmarked functional gene embeddings obtained from various data modalities for predicting disease-gene lists, cancer drivers, phenotype-gene associations and scores from genome-wide association studies. Off-the-shelf predictors trained on precomputed embeddings matched or outperformed dedicated state-of-the-art predictors, demonstrating their high utility. Embeddings based on literature and protein-protein interactions inferred from low-throughput experiments outperformed embeddings derived from genome-wide experimental data (transcriptomics, deletion screens and protein sequence) when predicting curated gene lists. In contrast, they did not perform better when predicting genome-wide association signals and were biased towards highly-studied genes. These results indicate that embeddings derived from literature and low-throughput experiments appear favourable in many existing benchmarks because they are biased towards well-studied genes and should therefore be considered with caution. Altogether, our study and precomputed embeddings will facilitate the development of machine-learning models in genetics and related fields.
Affiliation(s)
- Felix Brechtmann
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Center for Machine Learning, Munich, Germany
| | - Thibault Bechtler
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Shubhankar Londhe
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Christian Mertes
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
| | - Julien Gagneur
- TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
| |
|
20
|
Jiang H, Chen P, Sun Z, Liang C, Xue R, Zhao L, Wang Q, Li X, Deng W, Gao Z, Huang F, Huang S, Zhang Y, Li T. Assisting schizophrenia diagnosis using clinical electroencephalography and interpretable graph neural networks: a real-world and cross-site study. Neuropsychopharmacology 2023; 48:1920-1930. [PMID: 37491671 PMCID: PMC10584957 DOI: 10.1038/s41386-023-01658-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 05/24/2023] [Accepted: 07/07/2023] [Indexed: 07/27/2023]
Abstract
Schizophrenia (SCZ) is a chronic and serious mental disorder with a high mortality rate. At present, there is a lack of objective, cost-effective and widely disseminated diagnostic tools to address this mental health crisis globally. Clinical electroencephalogram (EEG) is a noninvasive technique to measure brain activity with high temporal resolution, and accumulating evidence demonstrates that clinical EEG is capable of capturing abnormal SCZ neuropathology. Although EEG-based automated diagnostic tools have obtained impressive performance on individual datasets, the transportability of potential EEG biomarkers in cross-site real-world application is still an open question. To address the challenges of small sample sizes and population heterogeneity, we develop an advanced interpretable deep learning model using multimodal clinical EEG features and demographic information as inputs to graph neural networks, and further propose different transfer learning strategies to adapt to different clinical scenarios. Taking the discrimination of healthy controls (HC) and SCZ with 1030 participants as a use case, our model is trained on a small clinical dataset (N = 188, Chinese) and enhanced using a large-scale public dataset (N = 508, American) of adult participants. Cross-site validation from an independent dataset of adult participants (N = 157, Chinese) produced stable performance, with AUCs of 0.793-0.852 and accuracies of 0.786-0.858 for different SCZ prevalence, respectively. In addition, cross-site validation from another dataset of adolescent boys (N = 84, Russian) yielded an AUC of 0.702 and an accuracy of 0.690. Moreover, feature visualization further revealed that the ranking of feature importance varied significantly among different datasets, and that EEG theta and alpha band power appeared to be the most significant and translational biomarkers of SCZ pathology.
Overall, our promising results demonstrate the feasibility of SCZ discrimination using EEG biomarkers in multiple clinical settings.
Affiliation(s)
- Haiteng Jiang
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital and School of Brain Science and Brain Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou, 311121, China
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou, 310058, China
| | - Peiyin Chen
- Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Zhaohong Sun
- College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou, Zhejiang, China
| | - Chengqian Liang
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital and School of Brain Science and Brain Medicine, Zhejiang University School of Medicine, Hangzhou, China
| | - Rui Xue
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital and School of Brain Science and Brain Medicine, Zhejiang University School of Medicine, Hangzhou, China
| | - Liansheng Zhao
- Psychiatric Laboratory and Mental Health Center, West China Hospital of Sichuan University, Chengdu, Sichuan, China
| | - Qiang Wang
- Psychiatric Laboratory and Mental Health Center, West China Hospital of Sichuan University, Chengdu, Sichuan, China
| | - Xiaojing Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital and School of Brain Science and Brain Medicine, Zhejiang University School of Medicine, Hangzhou, China
| | - Wei Deng
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital and School of Brain Science and Brain Medicine, Zhejiang University School of Medicine, Hangzhou, China
| | - Zhongke Gao
- School of Electrical and Information Engineering, Tianjin University, Tianjin, China
| | - Fei Huang
- Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China
| | - Songfang Huang
- Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China
| | - Yaoyun Zhang
- Alibaba Damo Academy, 969 West Wen Yi Road, Yu Hang District, Hangzhou, Zhejiang, China.
| | - Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital and School of Brain Science and Brain Medicine, Zhejiang University School of Medicine, Hangzhou, China.
- Liangzhu Laboratory, MOE Frontier Science Center for Brain Science and Brain-machine Integration, State Key Laboratory of Brain-machine Intelligence, Zhejiang University, 1369 West Wenyi Road, Hangzhou, 311121, China.
- NHC and CAMS Key Laboratory of Medical Neurobiology, Zhejiang University, Hangzhou, 310058, China.
| |
|
21
|
Lecca P, Lecca M. Graph embedding and geometric deep learning relevance to network biology and structural chemistry. Front Artif Intell 2023; 6:1256352. [PMID: 38035201 PMCID: PMC10687447 DOI: 10.3389/frai.2023.1256352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/16/2023] [Indexed: 12/02/2023] Open
Abstract
Graphs have been used as a model of complex relationships among data in biological science since the advent of systems biology in the early 2000s. In particular, graph data analysis and graph data mining play an important role in biological interaction networks, where recent techniques of artificial intelligence, usually employed in other types of networks (e.g., social, citation, and trademark networks), aim to implement various data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computationally demanding, poorly parallelizable, and ultimately inapplicable, since a biological network of realistic size is a large system, which is characterised by a high density of interactions and often by non-linear dynamics and a non-Euclidean latent geometry. Currently, graph embedding emerges as the new learning paradigm that shifts the tasks of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space so that many graph mining and learning tasks can be more easily performed by employing efficient non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason for the flourishing of studies in this area and, in particular, of artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning.
Affiliation(s)
- Paola Lecca
- Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
| | - Michela Lecca
- Fondazione Bruno Kessler, Digital Industry Center, Technologies of Vision, Trento, Italy
| |
|
22
|
Yabuuchi H, Hayashi K, Shigemoto A, Fujiwara M, Nomura Y, Nakashima M, Ogusu T, Mori M, Tokumoto SI, Miyai K. In vitro and in silico prediction of antibacterial interaction between essential oils via graph embedding approach. Sci Rep 2023; 13:18947. [PMID: 37919469 PMCID: PMC10622510 DOI: 10.1038/s41598-023-46377-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 10/31/2023] [Indexed: 11/04/2023] Open
Abstract
Essential oils contain a variety of volatile metabolites, and are expected to be utilized in a wide range of fields such as antimicrobials, insect repellents and herbicides. However, it is difficult to foresee the effect of oil combinations because hundreds of compounds can be involved in synergistic and antagonistic interactions. In this research, a machine learning method was developed and evaluated to classify types of antibacterial interaction (synergistic, antagonistic, or none) between essential oils. Graph embedding was employed to capture structural features of the interaction network from literature data, and was found to improve in silico prediction performance for classifying synergistic interactions. Furthermore, an in vitro antibacterial assay against a standard strain of Staphylococcus aureus revealed that four essential oil pairs (Origanum compactum-Trachyspermum ammi, Cymbopogon citratus-Thujopsis dolabrata, Cinnamomum verum-Cymbopogon citratus and Trachyspermum ammi-Zingiber officinale) exhibited synergistic interaction as predicted. These results indicate that the graph embedding approach can efficiently find synergistic interactions between antibacterial essential oils.
Affiliation(s)
- Hiroaki Yabuuchi
- Department of Pharmaceutical Industry, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan.
- Kushimoto Branch, Shingu Health Center of Wakayama Prefecture, Wakayama, Japan.
| | - Kazuhito Hayashi
- Department of Pharmaceutical Industry, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
- Tanabe Health Center of Wakayama Prefecture, Wakayama, Japan
| | - Akihiko Shigemoto
- Department of Digital Manufacturing, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Makiko Fujiwara
- Department of Pharmaceutical Industry, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Yuhei Nomura
- Department of Digital Manufacturing, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Mayumi Nakashima
- Department of Digital Manufacturing, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Takeshi Ogusu
- Department of Pharmaceutical Industry, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Megumi Mori
- Department of Pharmaceutical Industry, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Shin-Ichi Tokumoto
- Department of Digital Manufacturing, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
| | - Kazuyuki Miyai
- Department of Pharmaceutical Industry, Industrial Technology Center of Wakayama Prefecture, Wakayama, Japan
23
Jin S, Hong Y, Zeng L, Jiang Y, Lin Y, Wei L, Yu Z, Zeng X, Liu X. A general hypergraph learning algorithm for drug multi-task predictions in micro-to-macro biomedical networks. PLoS Comput Biol 2023; 19:e1011597. [PMID: 37956212 PMCID: PMC10681315 DOI: 10.1371/journal.pcbi.1011597] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/06/2023] [Revised: 11/27/2023] [Accepted: 10/13/2023] [Indexed: 11/15/2023] Open
Abstract
The powerful combination of large-scale drug-related interaction networks and deep learning provides new opportunities for accelerating the process of drug discovery. However, chemical structures that play an important role in drug properties and high-order relations that involve a greater number of nodes are not tackled in current biomedical networks. In this study, we present a general hypergraph learning framework, which introduces Drug-Substructures relationship into Molecular interaction Networks to construct the micro-to-macro drug centric heterogeneous network (DSMN), and develop a multi-branch HyperGraph learning model, called HGDrug, for Drug multi-task predictions. HGDrug achieves highly accurate and robust predictions on 4 benchmark tasks (drug-drug, drug-target, drug-disease, and drug-side-effect interactions), outperforming 8 state-of-the-art task-specific models and 6 general-purpose conventional models. Experimental analysis verifies the effectiveness and rationality of the HGDrug model architecture as well as the multi-branch setup, and demonstrates that HGDrug is able to capture the relations between drugs associated with the same functional groups. In addition, our proposed drug-substructure interaction networks can help improve the performance of existing network models for drug-related prediction tasks.
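The notion of a high-order relation in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each substructure defines one hyperedge joining every drug that contains it; the drug and substructure names and the helper functions are hypothetical and are not taken from HGDrug.

```python
from collections import defaultdict


def build_hyperedges(drug_substructures):
    """Group drugs into hyperedges, one hyperedge per shared substructure.

    Assumption for this sketch: a substructure that occurs in only one
    drug links nothing and is dropped.
    """
    members = defaultdict(set)
    for drug, substructures in drug_substructures.items():
        for sub in substructures:
            members[sub].add(drug)
    return {sub: frozenset(drugs) for sub, drugs in members.items() if len(drugs) >= 2}


def incidence_matrix(drugs, hyperedges):
    """Return (matrix, hyperedge_names); matrix[i][j] is 1 if drug i is in hyperedge j."""
    names = sorted(hyperedges)
    matrix = [[1 if drug in hyperedges[name] else 0 for name in names] for drug in drugs]
    return matrix, names


# Toy, made-up input: drug -> set of substructures it contains.
toy = {
    "drugA": {"benzene", "amide"},
    "drugB": {"benzene"},
    "drugC": {"amide", "hydroxyl"},
    "drugD": {"benzene", "amide"},
}
edges = build_hyperedges(toy)
matrix, names = incidence_matrix(sorted(toy), edges)
```

Unlike an ordinary edge, the "benzene" hyperedge here joins three drugs at once, which is the kind of relation a pairwise graph cannot express directly.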
Affiliation(s)
- Shuting Jin
- School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
- School of Informatics, Xiamen University, Xiamen, China
- Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, China
| | - Yue Hong
- School of Informatics, Xiamen University, Xiamen, China
| | - Li Zeng
- Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, China
| | - Yinghui Jiang
- School of Informatics, Xiamen University, Xiamen, China
| | - Yuan Lin
- School of Economics, Innovation, and Technology, Kristiania University College, Bergen, Norway
| | - Leyi Wei
- School of Software, Shandong University, Shandong, China
| | - Zhuohang Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai, China
| | - Xiangxiang Zeng
- School of Information Science and Engineering, Hunan University, Hunan, China
| | - Xiangrong Liu
- School of Informatics, Xiamen University, Xiamen, China
- Zhejiang Lab, Hangzhou, China
24
Wang S, Wang F, Qiao S, Zhuang Y, Zhang K, Pang S, Nowak R, Lv Z. MSHGANMDA: Meta-Subgraphs Heterogeneous Graph Attention Network for miRNA-Disease Association Prediction. IEEE J Biomed Health Inform 2023; 27:4639-4648. [PMID: 35759606 DOI: 10.1109/jbhi.2022.3186534] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
MicroRNAs (miRNAs) influence several biological processes involved in human disease. Biological experiments for verifying the association between miRNA and disease are always costly in terms of both money and time. Although numerous biological experiments have identified multi-types of associations between miRNAs and diseases, existing computational methods are unable to sufficiently mine the knowledge in these associations to predict unknown associations. In this study, we innovatively propose a heterogeneous graph attention network model based on meta-subgraphs (MSHGANMDA) to predict the potential miRNA-disease associations. Firstly, we define five types of meta-subgraph from the known miRNA-disease associations. Then, we use meta-subgraph attention and meta-subgraph semantic attention to extract features of miRNA-disease pairs within and between these five meta-subgraphs, respectively. Finally, we apply a fully-connected layer (FCL) to predict the scores of unknown miRNA-disease associations and cross-entropy loss to train our model end-to-end. To evaluate the effectiveness of MSHGANMDA, we apply five-fold cross-validation to calculate the mean values of evaluation metrics Accuracy, Precision, Recall, and F1-score as 0.8595, 0.8601, 0.8596, and 0.8595, respectively. Experiments show that our model, which primarily utilizes multi-types of miRNA-disease association data, gets the greatest ROC-AUC value of 0.934 when compared to other state-of-the-art approaches. Furthermore, through case studies, we further confirm the effectiveness of MSHGANMDA in predicting unknown diseases.
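The evaluation protocol named in the abstract above (Accuracy, Precision, Recall and F1-score averaged over five folds) can be sketched as follows. This is a generic illustration of the standard binary definitions, not the authors' evaluation code; the function names are hypothetical, and the abstract does not say whether the reported values are binary or macro-averaged.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_over_folds(fold_results):
    """Average each metric over the per-fold result dictionaries."""
    return {
        key: sum(result[key] for result in fold_results) / len(fold_results)
        for key in fold_results[0]
    }
```

In a five-fold setting, `binary_metrics` would be called once per held-out fold and the five dictionaries passed to `mean_over_folds`.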
25
Shi W, Feng H, Li J, Liu T, Liu Z. DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding. Front Genet 2023; 14:1222346. [PMID: 37811150 PMCID: PMC10556742 DOI: 10.3389/fgene.2023.1222346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 09/11/2023] [Indexed: 10/10/2023] Open
Abstract
The study of comorbidity can provide new insights into the pathogenesis of the disease and has important economic significance in the clinical evaluation of treatment difficulty, medical expenses, length of stay, and prognosis of the disease. In this paper, we propose a disease association prediction model DapBCH, which constructs a cross-species biological network and applies heterogeneous graph embedding to predict disease association. First, we combine the human disease-gene network, mouse gene-phenotype network, human-mouse homologous gene network, and human protein-protein interaction network to reconstruct a heterogeneous biological network. Second, we apply heterogeneous graph embedding based on meta-path aggregation to generate the feature vector of disease nodes. Finally, we employ link prediction to obtain the similarity of disease pairs. The experimental results indicate that our model is highly competitive in predicting the disease association and is promising for finding potential disease associations.
Affiliation(s)
- Wanqi Shi
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Hailin Feng
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Jian Li
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Tongcun Liu
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Zhe Liu
- College of Media Engineering, Zhejiang University of Media and Communications, Hangzhou, Zhejiang, China
26
Yue Z, Xiang Y, Chen G, Wang X, Li K, Zhang Y. PredinID: Predicting Pathogenic Inframe Indels in Human Through Graph Convolution Neural Network With Graph Sampling Technique. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3226-3233. [PMID: 37040252 DOI: 10.1109/tcbb.2023.3266232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Inframe insertion/deletion (indel) variants may alter protein sequence and function, and are closely related to a wide variety of diseases. Although recent studies have paid attention to the associations between inframe indels and diseases, modeling indels in silico and interpreting their pathogenicity remain challenging, mainly due to the lack of experimental information and computational methodologies. In this article, we propose a novel computational method named PredinID (Predictor for inframe InDels) via graph convolutional network (GCN). PredinID leverages the k-nearest neighbor algorithm to construct the feature graph for aggregating more informative representations, treating pathogenic inframe indel prediction as a node classification task. An edge-based sampling strategy is designed for extracting information from both the potential connections of feature space and the topological structure of subgraphs. Evaluated by 5-fold cross-validation, the PredinID method achieves satisfactory performance and is superior to four classic machine learning algorithms and two GCN methods. Comprehensive experiments show that PredinID has superior performance when compared with the state-of-the-art methods on the independent test set. Moreover, we also implement a web server at http://predinid.bio.aielab.cc/, to facilitate the use of the model.
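The idea of building a feature graph with k-nearest neighbors, mentioned in the abstract above, can be sketched in a few lines. This is a generic illustration assuming Euclidean distance and an undirected graph; the abstract does not state the distance measure or the value of k, and the function name is hypothetical.

```python
import math


def knn_graph(features, k):
    """Return sorted undirected edges (i, j), i < j, linking each sample to its k nearest neighbors.

    Assumption for this sketch: Euclidean distance between feature vectors,
    with ties broken by the lower sample index.
    """
    n = len(features)
    edges = set()
    for i in range(n):
        neighbors = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbors[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy feature vectors forming two well-separated groups.
toy_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_edges = knn_graph(toy_features, k=1)
```

Each sample then becomes a node whose neighbors are similar samples, so a node classifier such as a GCN can aggregate information along the resulting edges.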
27
Pu Y, Beck D, Verspoor K. Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease. J Biomed Inform 2023; 145:104464. [PMID: 37541406 DOI: 10.1016/j.jbi.2023.104464] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/22/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/06/2023]
Abstract
OBJECTIVE We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. METHODS We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. RESULTS We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. 
A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. CONCLUSION Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. AVAILABILITY Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.
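The time-sliced evaluation and the role of the prediction window length described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: each edge carries the year it first appears in the literature, training uses edges up to a cutoff year, and the test set holds edges that first appear within the window after the cutoff. The function name, node labels and years are made up for the example and do not come from the paper.

```python
def time_slice(edges, cutoff_year, window):
    """Split (u, v, first_year) edges into a training set and a future test set.

    train: edges first seen in or before cutoff_year.
    test:  edges first seen in (cutoff_year, cutoff_year + window].
    """
    train = {(u, v) for u, v, year in edges if year <= cutoff_year}
    test = {
        (u, v)
        for u, v, year in edges
        if cutoff_year < year <= cutoff_year + window
    } - train
    return train, test


# Toy, made-up edges: (node, node, year of first co-occurrence).
toy_edges = [
    ("n1", "n2", 1995),
    ("n3", "n4", 2000),
    ("n1", "n3", 2003),
    ("n5", "n6", 2013),
]
train, short_test = time_slice(toy_edges, cutoff_year=2000, window=5)
_, long_test = time_slice(toy_edges, cutoff_year=2000, window=15)
```

With the same training graph, the longer window counts more future links as correct targets, which is one reason results can differ substantially between window lengths.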
Affiliation(s)
- Yiyuan Pu
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia.
| | - Daniel Beck
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia.
| | - Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia; School of Computing Technologies, RMIT University, Melbourne, Victoria, Australia.
28
Pan L, Xiao X, Liu S, Peng S. An Integration Framework of Secure Multiparty Computation and Deep Neural Network for Improving Drug-Drug Interaction Predictions. J Comput Biol 2023; 30:1034-1045. [PMID: 37707993 DOI: 10.1089/cmb.2023.0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023] Open
Abstract
Drug-drug interaction (DDI) is a key concern in drug development and pharmacovigilance. It is important to improve DDI predictions by integrating multisource data from various pharmaceutical companies. Unfortunately, data privacy and financial interest issues seriously affect interinstitutional collaborations for DDI predictions. We propose multiparty computation DDI (MPCDDI), a secure MPC-based deep learning framework for DDI predictions. MPCDDI leverages secret sharing technologies to incorporate the drug-related feature data from multiple institutions and develops a deep learning model for DDI predictions. In MPCDDI, all data transmission and deep learning operations are integrated into secure MPC frameworks to enable high-quality collaboration among pharmaceutical institutions without divulging private drug-related information. The results suggest that MPCDDI is superior to eight other baselines and achieves performance similar to that of the corresponding plaintext collaborations. More interestingly, MPCDDI significantly outperforms methods that use private data from a single institution. In summary, MPCDDI is an effective framework for promoting collaborative and privacy-preserving drug discovery.
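The abstract above refers to secret sharing without naming a scheme. As a hedged illustration only, the sketch below uses additive secret sharing over a prime field, one common building block of secure multiparty computation; the modulus, function names and values are assumptions for the example and are not taken from MPCDDI.

```python
import secrets

# Assumed modulus for this sketch: the Mersenne prime 2**61 - 1.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine all shares; fewer than all of them reveal nothing about the value."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Add two shared values locally: each party adds the two shares it holds."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Two institutions contribute private integers; three parties hold the shares.
x_shares = share(10, 3)
y_shares = share(32, 3)
total = reconstruct(add_shared(x_shares, y_shares))
```

Addition needs no communication between parties, whereas multiplications and nonlinear activations in a neural network require additional protocols that this sketch does not cover.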
Affiliation(s)
- Liang Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Xia Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | | | - Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
29
Zhang Y, Hu Y, Han N, Yang A, Liu X, Cai H. A survey of drug-target interaction and affinity prediction methods via graph neural networks. Comput Biol Med 2023; 163:107136. [PMID: 37329615 DOI: 10.1016/j.compbiomed.2023.107136] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/11/2023] [Revised: 05/29/2023] [Accepted: 06/04/2023] [Indexed: 06/19/2023]
Abstract
The tasks of drug-target interaction (DTI) and drug-target affinity (DTA) prediction play important roles in the field of drug discovery. However, biological experiment-based methods are time-consuming and expensive. Recently, computational-based approaches have accelerated the process of drug-target relationship prediction. Drug and target features are represented in structure-based, sequence-based, and graph-based ways. Although some achievements have been made regarding structure-based representations and sequence-based representations, the acquired feature information is not sufficiently rich. Molecular graph-based representations are some of the more popular approaches, and they have also generated a great deal of interest. In this article, we provide an overview of the DTI prediction and DTA prediction tasks based on graph neural networks (GNNs). We briefly discuss the molecular graphs of drugs, the primary sequences of target proteins, and the graph representations of target proteins. Meanwhile, we conducted experiments on various fundamental datasets to substantiate the plausibility of DTI and DTA utilizing graph neural networks.
Affiliation(s)
- Yue Zhang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China.
| | - Yuqing Hu
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
| | - Na Han
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
| | - Aqing Yang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
| | - Xiaoyong Liu
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
| | - Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
30
Wang Y, Worrell GA, Wang HL. It is the Frequency that Matters: Effects of Electromagnetic Fields on the Release and Content of Extracellular Vesicles. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.08.552505. [PMID: 37609326 PMCID: PMC10441284 DOI: 10.1101/2023.08.08.552505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Extracellular vesicles (EVs) are small membrane-bound structures that originate from various cell types and carry molecular cargo to influence the behavior of recipient cells. The use of EVs as biomarkers and delivery vehicles for diagnosis and treatment in a wide range of human disease is a rapidly growing field of research and clinical practice. Four years ago, we postulated the hypothesis that electromagnetic fields (EMF) will influence the release and content of EVs (1). Since then, we have optimized several technical aspects of our experimental setup. We used a bioreactor system that allows cells to grow in a three-dimensional environment mimicking in-vivo conditions. We designed a custom-made EMF stimulation device that encompasses the bioreactor and delivers uniform EMFs. We established a three-step EV purification protocol that enables high-density production of EVs. We then performed mass spectrometry-based proteomics analysis on EV-related proteins and used high-resolution nanoparticle flowcytometry for single-vesicle analysis. We demonstrate that electrical stimulations of current amplitudes at physiological level that are currently applied in therapeutic deep brain stimulation can modulate EV content in a frequency-dependent manner, which may have important implications for basic biology and medical applications. First, it raises intriguing questions about how the endogenous electrical activity of neuronal and other cellular assemblies influence the production and composition of EVs. Second, it reveals an additional underlying mechanism of how therapeutic electrical stimulations can modulate EVs and treat human brain disorders. Third, it provides a novel approach of utilizing electrical stimulations in generating specific EV cargos.
Affiliation(s)
- Yihua Wang
- Neurology Department, Mayo Clinic, Rochester, Minnesota
| | - Gregory A. Worrell
- Neurology Department, Mayo Clinic, Rochester, Minnesota
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, Minnesota
| | - Hai-Long Wang
- Neurology Department, Mayo Clinic, Rochester, Minnesota
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, Minnesota
31
Walke D, Micheel D, Schallert K, Muth T, Broneske D, Saake G, Heyer R. The importance of graph databases and graph learning for clinical applications. Database (Oxford) 2023; 2023:baad045. [PMID: 37428679 PMCID: PMC10332447 DOI: 10.1093/database/baad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 05/26/2023] [Accepted: 06/16/2023] [Indexed: 07/12/2023]
Abstract
The increasing amount and complexity of clinical data require an appropriate way of storing and analyzing those data. Traditional approaches use a tabular structure (relational databases) for storing data and thereby complicate storing and retrieving interlinked data from the clinical domain. Graph databases provide a great solution for this by storing data in a graph as nodes (vertices) that are connected by edges (links). The underlying graph structure can be used for the subsequent data analysis (graph learning). Graph learning consists of two parts: graph representation learning and graph analytics. Graph representation learning aims to reduce high-dimensional input graphs to low-dimensional representations. Then, graph analytics uses the obtained representations for analytical tasks like visualization, classification, link prediction and clustering which can be used to solve domain-specific problems. In this survey, we review current state-of-the-art graph database management systems, graph learning algorithms and a variety of graph applications in the clinical domain. Furthermore, we provide a comprehensive use case for a clearer understanding of complex graph learning algorithms.
Affiliation(s)
- Daniel Walke
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Database and Software Engineering Group, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
| | - Daniel Micheel
- Database and Software Engineering Group, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
| | - Kay Schallert
- Multidimensional Omics Analyses Group, Leibniz-Institut für Analytische Wissenschaften—ISAS—e.V., Bunsen-Kirchhoff-Straße 11, Dortmund 44139, Germany
| | - Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing (BAM), Unter den Eichen 87, Berlin 12205, Germany
| | - David Broneske
- Infrastructure and Methods, German Center for Higher Education Research and Science Studies (DZHW), Lange Laube 12, Hannover 30159, Germany
| | - Gunter Saake
- Database and Software Engineering Group, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
| | - Robert Heyer
- Multidimensional Omics Analyses Group, Leibniz-Institut für Analytische Wissenschaften—ISAS—e.V., Bunsen-Kirchhoff-Straße 11, Dortmund 44139, Germany
- Faculty of Technology, Bielefeld University, Universitätsstraße 25, Bielefeld 33615, Germany
32
Lin X, Dai L, Zhou Y, Yu ZG, Zhang W, Shi JY, Cao DS, Zeng L, Chen H, Song B, Yu PS, Zeng X. Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction. Brief Bioinform 2023:bbad235. [PMID: 37401373 DOI: 10.1093/bib/bbad235] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/27/2023] [Revised: 05/30/2023] [Accepted: 06/05/2023] [Indexed: 07/05/2023] Open
Abstract
Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug due to the presence of another drug in the human body, and play an essential role in drug discovery and clinical research. DDI prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply advanced AI and deep learning, developers and users face various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, natural language processing based and hybrid methods, providing an updated and accessible guide for the broad research and development community with differing domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.
Affiliation(s)
- Xuan Lin
- College of Computer Science, Xiangtan University, Xiangtan, China
| | - Lichang Dai
- College of Computer Science, Xiangtan University, Xiangtan, China
| | - Yafang Zhou
- College of Computer Science, Xiangtan University, Xiangtan, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
| | - Wen Zhang
- College of Informatics, Huazhong Agricultural University, China
| | - Jian-Yu Shi
- Northwestern Polytechnical University, Xian, China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
| | - Li Zeng
- AIDD department of Yuyao Biotech, Shanghai, China
| | - Haowen Chen
- College of Computer Science and Electronic Engineering, Hunan University, 410013 Changsha, P. R. China
| | - Bosheng Song
- College of Information Science and Engineering, Hunan University, Changsha, China
| | - Philip S Yu
- University of Illinois at Chicago (Wexler Chair in Information Technology)
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha, China
33
Deng H, Li Q, Liu Y, Zhu J. MTMG: A multi-task model with multi-granularity information for drug-drug interaction extraction. Heliyon 2023; 9:e16819. [PMID: 37484258 PMCID: PMC10360954 DOI: 10.1016/j.heliyon.2023.e16819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 05/29/2023] [Accepted: 05/30/2023] [Indexed: 07/25/2023] Open
Abstract
Drug-drug interaction (DDI) extraction involves identifying drug entities and interactions between drug pairs from the biomedical corpus. The discovery of potential DDIs aids in our understanding of the mechanisms underlying adverse reactions or combination therapy to improve patient safety. The manual extraction of DDIs is very time-consuming and expensive; therefore, computer-aided extraction of DDIs is vital. Many neural network-based methods have been proposed and achieved good efficiency in the extraction of DDIs over the years. However, most studies improved the performance of DDI extraction with various external drug features while directly using golden drug entities, leading to error propagation and low universality in practical application. In this paper, we propose a new multi-task framework called MTMG, which changes DDI extraction from a sentence-level classification task to a sequence labeling task named Drug-Specified Token Classification (DSTC). The proposed approach, MTMG, jointly trains DSTC with drug named entity recognition (DNER) and two sentence-level auxiliary tasks we designed. We aim to improve the performance of the entire DDI extraction pipeline by better using the correlation between entities and relationships and, to the extent possible, using the information of varying granularity implied in the dataset. Experimental results show that MTMG improves the accuracy of both DNER and DDI extraction and outperforms state-of-the-art techniques.
34
Amiri Souri E, Chenoweth A, Karagiannis SN, Tsoka S. Drug repurposing and prediction of multiple interaction types via graph embedding. BMC Bioinformatics 2023; 24:202. [PMID: 37193964 DOI: 10.1186/s12859-023-05317-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 04/30/2023] [Indexed: 05/18/2023] Open
Abstract
BACKGROUND Finding drugs that can interact with a specific target to induce a desired therapeutic outcome is a key deliverable in drug discovery for targeted treatment. Therefore, both identifying new drug-target links and delineating the type of drug interaction are important in drug repurposing studies. RESULTS A computational drug repurposing approach was proposed to predict novel drug-target interactions (DTIs), as well as to predict the type of interaction induced. The methodology is based on mining a heterogeneous graph that integrates drug-drug and protein-protein similarity networks, together with verified drug-disease and protein-disease associations. In order to extract appropriate features, the three-layer heterogeneous graph was mapped to low dimensional vectors using node embedding principles. The DTI prediction problem was formulated as a multi-label, multi-class classification task, aiming to determine drug modes of action. DTIs were defined by concatenating pairs of drug and target vectors extracted from graph embedding, which were used as input to classification via gradient boosted trees, where a model is trained to predict the type of interaction. After validating the prediction ability of DT2Vec+, a comprehensive analysis of all unknown DTIs was conducted to predict the degree and type of interaction. Finally, the model was applied to propose potential approved drugs to target cancer-specific biomarkers. CONCLUSION DT2Vec+ showed promising results in predicting the type of DTI, which was achieved via integrating and mapping triplet drug-target-disease association graphs into low-dimensional dense vectors. To our knowledge, this is the first approach that addresses prediction between drugs and targets across six interaction types.
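The pair-representation step described in the abstract above, where a drug-target pair is defined by concatenating the two node vectors, can be sketched as follows. The embeddings, identifiers and function name are hypothetical placeholders; how DT2Vec+ produces its embeddings and configures its gradient boosted trees is not specified in the abstract and is not shown here.

```python
def pair_features(drug_vectors, target_vectors, pairs):
    """Build one feature row per (drug, target) pair by concatenating their embeddings."""
    return [list(drug_vectors[d]) + list(target_vectors[t]) for d, t in pairs]


# Toy, made-up 2-dimensional embeddings standing in for learned node vectors.
drug_vectors = {"drug1": [0.1, 0.9], "drug2": [0.7, 0.2]}
target_vectors = {"targetA": [0.5, 0.5], "targetB": [0.0, 1.0]}

rows = pair_features(
    drug_vectors,
    target_vectors,
    [("drug1", "targetA"), ("drug2", "targetB")],
)
```

Each row has length equal to the drug dimension plus the target dimension, and rows like these would then be passed, together with interaction-type labels, to a classifier.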
Affiliation(s)
- E Amiri Souri
- Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK
| | - A Chenoweth
- St. John's Institute of Dermatology, School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, SE1 9RT, UK
- Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, Guy's Cancer Centre, King's College London, London, SE1 9RT, UK
| | - S N Karagiannis
- St. John's Institute of Dermatology, School of Basic and Medical Biosciences, Guy's Hospital, King's College London, London, SE1 9RT, UK
- Breast Cancer Now Research Unit, School of Cancer and Pharmaceutical Sciences, Guy's Cancer Centre, King's College London, London, SE1 9RT, UK
| | - S Tsoka
- Department of Informatics, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, Bush House, London, WC2B 4BG, UK.
| |
Collapse
|
35
|
Li J, Wang Y, Li Z, Lin H, Wu B. LM-DTI: a tool of predicting drug-target interactions using the node2vec and network path score methods. Front Genet 2023; 14:1181592. [PMID: 37229202 PMCID: PMC10203599 DOI: 10.3389/fgene.2023.1181592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 04/13/2023] [Indexed: 05/27/2023] Open
Abstract
Introduction: Drug-target interaction (DTI) prediction is a key step in drug function discovery and repositioning. The emergence of large-scale heterogeneous biological networks provides an opportunity to identify drug-related target genes, which has led to the development of several computational methods for DTI prediction. Methods: Considering the limitations of conventional computational methods, a novel tool named LM-DTI was proposed. It integrates information related to lncRNAs and miRNAs and adopts the graph embedding (node2vec) and network path score methods. First, LM-DTI constructed a heterogeneous information network containing eight networks composed of four types of nodes (drug, target, lncRNA, and miRNA). Next, the node2vec method was used to obtain feature vectors of drug and target nodes, and the path score vector of each drug-target pair was calculated using the DASPfind method. Finally, the feature vectors and path score vectors were merged and input into the XGBoost classifier to predict potential drug-target interactions. Results and Discussion: Ten-fold cross-validation was used to evaluate the classification accuracy of LM-DTI. The prediction performance of LM-DTI in AUPR reached 0.96, a significant improvement over conventional tools. The validity of LM-DTI was also verified by manually searching the literature and various databases. LM-DTI is scalable and computationally efficient, and thus represents a powerful drug repositioning tool that can be accessed for free at http://www.lirmed.com:5038/lm_dti.
Affiliation(s)
- Jianwei Li
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
- Yinfei Wang
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Zhiguang Li
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Hongxin Lin
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Baoqin Wu
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
36
Lu H, Uddin S. Disease Prediction Using Graph Machine Learning Based on Electronic Health Data: A Review of Approaches and Trends. Healthcare (Basel) 2023; 11:1031. [PMID: 37046958 PMCID: PMC10094099 DOI: 10.3390/healthcare11071031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 03/11/2023] [Accepted: 04/01/2023] [Indexed: 04/07/2023] Open
Abstract
Graph machine-learning (ML) methods have recently attracted great attention and have made significant progress in graph applications. To date, most graph ML approaches have been evaluated on social networks, but they have not been comprehensively reviewed in the health informatics domain. Herein, a review of graph ML methods and their applications in the disease prediction domain based on electronic health data is presented in this study from two levels: node classification and link prediction. Commonly used graph ML approaches for these two levels are shallow embedding and graph neural networks (GNN). This study performs comprehensive research to identify articles that applied or proposed graph ML models on disease prediction using electronic health data. We considered journals and conferences from four digital library databases (i.e., PubMed, Scopus, ACM digital library, and IEEEXplore). Based on the identified articles, we review the present status of and trends in graph ML approaches for disease prediction using electronic health data. Even though GNN-based models have achieved outstanding results compared with the traditional ML methods in a wide range of disease prediction tasks, they still confront interpretability and dynamic graph challenges. Though the disease prediction field using ML techniques is still emerging, GNN-based models have the potential to be an excellent approach for disease prediction, which can be used in medical diagnosis, treatment, and the prognosis of diseases.
Affiliation(s)
- Haohui Lu
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
- Shahadat Uddin
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
37
Liu S, Zhang Y, Cui Y, Qiu Y, Deng Y, Zhang Z, Zhang W. Enhancing Drug-Drug Interaction Prediction Using Deep Attention Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:976-985. [PMID: 35511833 DOI: 10.1109/tcbb.2022.3172421] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/04/2023]
Abstract
Drug-drug interactions are one of the main concerns in drug discovery. Accurate prediction of drug-drug interactions plays a key role in increasing the efficiency of drug research and safety when multiple drugs are co-prescribed. With various data sources that describe the relationships and properties between drugs, the comprehensive approach that integrates multiple data sources would be considerably effective in making high-accuracy prediction. In this paper, we propose a Deep Attention Neural Network based Drug-Drug Interaction prediction framework, abbreviated as DANN-DDI, to predict unobserved drug-drug interactions. First, we construct multiple drug feature networks and learn drug representations from these networks using the graph embedding method; then, we concatenate the learned drug embeddings and design an attention neural network to learn representations of drug-drug pairs; finally, we adopt a deep neural network to accurately predict drug-drug interactions. The experimental results demonstrate that our model DANN-DDI has improved prediction performance compared with state-of-the-art methods. Moreover, the proposed model can predict novel drug-drug interactions and drug-drug interaction-associated events.
38
MSEDDI: Multi-Scale Embedding for Predicting Drug-Drug Interaction Events. Int J Mol Sci 2023; 24:4500. [PMID: 36901929 PMCID: PMC10002564 DOI: 10.3390/ijms24054500] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/20/2023] [Revised: 02/18/2023] [Accepted: 02/22/2023] [Indexed: 03/02/2023] Open
Abstract
A norm in modern medicine is to prescribe polypharmacy to treat disease. The core concern with the co-administration of drugs is that it may produce adverse drug-drug interaction (DDI), which can cause unexpected bodily injury. Therefore, it is essential to identify potential DDI. Most existing methods in silico only judge whether two drugs interact, ignoring the importance of interaction events to study the mechanism implied in combination drugs. In this work, we propose a deep learning framework named MSEDDI that comprehensively considers multi-scale embedding representations of the drug for predicting drug-drug interaction events. In MSEDDI, we design three-channel networks to process biomedical network-based knowledge graph embedding, SMILES sequence-based notation embedding, and molecular graph-based chemical structure embedding, respectively. Finally, we fuse three heterogeneous features from channel outputs through a self-attention mechanism and feed them to the linear layer predictor. In the experimental section, we evaluate the performance of all methods on two different prediction tasks on two datasets. The results show that MSEDDI outperforms other state-of-the-art baselines. Moreover, we also reveal the stable performance of our model in a broader sample set via case studies.
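The fusion step mentioned in this abstract (combining three channel outputs through self-attention before a linear predictor) can be illustrated with a single-head scaled dot-product attention over the three channel vectors. This is a generic sketch, not the MSEDDI implementation: the projection matrices are random placeholders, the dimensions are arbitrary, and `fuse_channels` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def fuse_channels(channels, wq, wk, wv):
    """Single-head scaled dot-product self-attention over channel embeddings.

    channels: (n_channels, dim) matrix, one row per embedding channel.
    Returns the flattened attended features and the attention weights.
    """
    q, k, v = channels @ wq, channels @ wk, channels @ wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)
    attended = weights @ v
    return attended.reshape(-1), weights

# Placeholder inputs (assumption): three channels with 6-dimensional outputs.
rng = np.random.default_rng(0)
channels = rng.normal(size=(3, 6))
wq, wk, wv = (rng.normal(size=(6, 6)) for _ in range(3))
fused, weights = fuse_channels(channels, wq, wk, wv)
```

The flattened vector `fused` plays the role of the input to the final linear layer; in a trained model the projections would be learned rather than sampled.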
39
Shi K, Li L, Wang Z, Chen H, Chen Z, Fang S. Identifying microbe-disease association based on graph convolutional attention network: Case study of liver cirrhosis and epilepsy. Front Neurosci 2023; 16:1124315. [PMID: 36741060 PMCID: PMC9892757 DOI: 10.3389/fnins.2022.1124315] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/15/2022] [Accepted: 12/31/2022] [Indexed: 01/20/2023] Open
Abstract
The interactions between the microbiota and the human host can affect the physiological functions of organs (such as the brain, liver, gut, etc.). Accumulating investigations indicate that the imbalance of microbial community is closely related to the occurrence and development of diseases. Thus, the identification of potential links between microbes and diseases can provide insight into the pathogenesis of diseases. In this study, we propose a deep learning framework (MDAGCAN) based on graph convolutional attention network to identify potential microbe-disease associations. In MDAGCAN, we first construct a heterogeneous network consisting of the known microbe-disease associations and multi-similarity fusion networks of microbes and diseases. Then, the node embeddings considering the neighbor information of the heterogeneous network are learned by applying graph convolutional layers and graph attention layers. Finally, a bilinear decoder using node embedding representations reconstructs the unknown microbe-disease association. Experiments show that our method achieves reliable performance with average AUCs of 0.9778 and 0.9454 ± 0.0038 in the frameworks of Leave-one-out cross validation (LOOCV) and 5-fold cross validation (5-fold CV), respectively. Furthermore, we apply MDAGCAN to predict latent microbes for two high-risk human diseases, i.e., liver cirrhosis and epilepsy, and results illustrate that 16 and 17 out of the top 20 predicted microbes are verified by published literatures, respectively. In conclusion, our method displays effective and reliable prediction performance and can be expected to predict unknown microbe-disease associations facilitating disease diagnosis and prevention.
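The bilinear decoder named in this abstract can be sketched in a few lines. The version below assumes the common form score = sigmoid(z_m^T W z_d); the sigmoid activation, the random embeddings, and the name `bilinear_decode` are assumptions for illustration and are not confirmed by the abstract.

```python
import numpy as np

def bilinear_decode(z_microbe, z_disease, w):
    """Score every microbe-disease pair as sigmoid(z_m^T W z_d)."""
    logits = z_microbe @ w @ z_disease.T
    return 1.0 / (1.0 + np.exp(-logits))

# Placeholder node embeddings (assumption): 5 microbes, 3 diseases, 4 dimensions.
rng = np.random.default_rng(0)
z_m = rng.normal(size=(5, 4))
z_d = rng.normal(size=(3, 4))
w = rng.normal(size=(4, 4))  # stands in for the trainable bilinear weight matrix

scores = bilinear_decode(z_m, z_d, w)  # shape (5, 3), one score per pair
```

Entry `scores[i, j]` is the reconstructed association strength between microbe `i` and disease `j`; ranking a column gives candidate microbes for one disease.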
Affiliation(s)
- Kai Shi
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China
- *Correspondence: Kai Shi
- Lin Li
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Zhengfeng Wang
- College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Huazhou Chen
- College of Science, Guilin University of Technology, Guilin, China
- Zilin Chen
- Department of Developmental and Behavioural Pediatric Department & Department of Child Primary Care, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shuanfeng Fang
- Department of Children Health Care, Children’s Hospital Affiliated to Zhengzhou University, Zhengzhou, China
40
Temiz M, Bakir-Gungor B, Güner Şahan P, Coskun M. Topological feature generation for link prediction in biological networks. PeerJ 2023; 11:e15313. [PMID: 37187525 PMCID: PMC10178302 DOI: 10.7717/peerj.15313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 04/06/2023] [Indexed: 05/17/2023] Open
Abstract
Graph or network embedding is a powerful method for extracting missing or potential information from interactions between nodes in biological networks. Graph embedding methods learn representations of nodes and interactions in a graph with low-dimensional vectors, which facilitates research to predict potential interactions in networks. However, most graph embedding methods suffer from high computational costs in the form of high computational complexity of the embedding methods and learning times of the classifier, as well as the high dimensionality of complex biological networks. To address these challenges, in this study, we use the Chopper algorithm as an alternative approach to graph embedding, which accelerates the iterative processes and thus reduces the running time of the iterative algorithms for three different (nervous system, blood, heart) undirected protein-protein interaction (PPI) networks. Due to the high dimensionality of the matrix obtained after the embedding process, the data are transformed into a smaller representation by applying feature regularization techniques. We evaluated the performance of the proposed method by comparing it with state-of-the-art methods. Extensive experiments demonstrate that the proposed approach reduces the learning time of the classifier and performs better in link prediction. We have also shown that the proposed embedding method is faster than state-of-the-art methods on three different PPI datasets.
Affiliation(s)
- Mustafa Temiz
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Pınar Güner Şahan
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Mustafa Coskun
- Department of Artificial Intelligence and Big Data Engineering, Ankara University, Ankara, Turkey
41
Hong E, Jeon J, Kim HU. Recent development of machine learning models for the prediction of drug-drug interactions. KOREAN J CHEM ENG 2023; 40:276-285. [PMID: 36748027 PMCID: PMC9894510 DOI: 10.1007/s11814-023-1377-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/11/2022] [Revised: 12/09/2022] [Accepted: 12/16/2022] [Indexed: 02/05/2023]
Abstract
Polypharmacy, the co-administration of multiple drugs, has become an area of concern as the elderly population grows and unexpected infections, such as the COVID-19 pandemic, keep emerging. However, it is very costly and time-consuming to experimentally examine the pharmacological effects of polypharmacy. To address this challenge, machine learning models that predict drug-drug interactions (DDIs) have been actively developed in recent years. In particular, the growing volume of drug datasets and advances in machine learning have facilitated model development. In this regard, this review discusses the DDI-predicting machine learning models that have been developed since 2018. Our discussion focuses on the dataset sources used to develop the models, featurization approaches for molecular structures and biological information, and the types of DDI prediction outcomes produced by the models. Finally, we make suggestions for research opportunities in this field.
Affiliation(s)
- Eujin Hong
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- Junhyeok Jeon
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141 Korea
42
Ng TA, Rashid S, Kwoh CK. Virulence network of interacting domains of influenza a and mouse proteins. FRONTIERS IN BIOINFORMATICS 2023; 3:1123993. [PMID: 36875146 PMCID: PMC9982101 DOI: 10.3389/fbinf.2023.1123993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 02/03/2023] [Indexed: 02/19/2023] Open
Abstract
There exist several databases that provide virus-host protein interactions. While most provide curated records of interacting virus-host protein pairs, information on the strain-specific virulence factors or protein domains involved, is lacking. Some databases offer incomplete coverage of influenza strains because of the need to sift through vast amounts of literature (including those of major viruses including HIV and Dengue, besides others). None have offered complete, strain specific protein-protein interaction records for the influenza A group of viruses. In this paper, we present a comprehensive network of predicted domain-domain interaction(s) (DDI) between influenza A virus (IAV) and mouse host proteins, that will allow the systematic study of disease factors by taking the virulence information (lethal dose) into account. From a previously published dataset of lethal dose studies of IAV infection in mice, we constructed an interacting domain network of mouse and viral protein domains as nodes with weighted edges. The edges were scored with the Domain Interaction Statistical Potential (DISPOT) to indicate putative DDI. The virulence network can be easily navigated via a web browser, with the associated virulence information (LD50 values) prominently displayed. The network will aid influenza A disease modeling by providing strain-specific virulence levels with interacting protein domains. It can possibly contribute to computational methods for uncovering influenza infection mechanisms mediated through protein domain interactions between viral and host proteins. It is available at https://iav-ppi.onrender.com/home.
Affiliation(s)
- Teng Ann Ng
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Shamima Rashid
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
43
Manzo M, Giordano M, Maddalena L, Guarracino MR, Granata I. Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis. STUDIES IN COMPUTATIONAL INTELLIGENCE 2023:117-145. [DOI: 10.1007/978-3-031-24453-7_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
44
Li J, Lin H, Wang Y, Li Z, Wu B. Prediction of potential small molecule-miRNA associations based on heterogeneous network representation learning. Front Genet 2022; 13:1079053. [PMID: 36531225 PMCID: PMC9755196 DOI: 10.3389/fgene.2022.1079053] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2023] Open
Abstract
MicroRNAs (miRNAs) are closely associated with the occurrence and development of many complex human diseases. A growing number of studies have shown that miRNAs are emerging as new therapeutic targets of small molecule (SM) drugs. Since traditional experimental methods are expensive and time-consuming, it is particularly crucial to find efficient computational approaches to predict potential small molecule-miRNA (SM-miRNA) associations. Considering that integrating multi-source heterogeneous information related to SM-miRNA association prediction would provide comprehensive insight into the features of both SMs and miRNAs, we proposed a novel model of Small Molecule-MiRNA Association prediction based on Heterogeneous Network Representation Learning (SMMA-HNRL) for predicting potential SM-miRNA associations more precisely. In SMMA-HNRL, a novel heterogeneous information network was constructed with SM nodes, miRNA nodes, and disease nodes. To access and utilize the topological information of the heterogeneous information network, feature vectors of SM and miRNA nodes were obtained by two different heterogeneous network representation learning algorithms (HeGAN and HIN2Vec) and merged by a concatenation operation. Finally, LightGBM was chosen as the classifier of SMMA-HNRL for predicting potential SM-miRNA associations. Ten-fold cross-validation was conducted to evaluate the prediction performance of SMMA-HNRL; it achieved an area under the ROC curve of 0.9875, which was superior to three other state-of-the-art models. On two independent validation datasets, the test results demonstrated the robustness of our model. Moreover, three case studies were performed. As a result, 35, 37, and 22 of the top 50 predicted miRNAs associated with 5-FU, cisplatin, and imatinib, respectively, were validated by the experimental literature, which confirmed the effectiveness of SMMA-HNRL.
The source code and experimental data of SMMA-HNRL are available at https://github.com/SMMA-HNRL/SMMA-HNRL.
Affiliation(s)
- Jianwei Li
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Hebei Province Key Laboratory of Big Data Calculation, Hebei University of Technology, Tianjin, China
- Hongxin Lin
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Yinfei Wang
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Zhiguang Li
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Baoqin Wu
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
45
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, the use of artificial intelligence (AI) in drug discovery has received much attention since it significantly shortens the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles published between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
- Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
- Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
46
Jagtap S, Pirayre A, Bidard F, Duval L, Malliaros FD. BRANEnet: embedding multilayer networks for omics data integration. BMC Bioinformatics 2022; 23:429. [PMID: 36245002 PMCID: PMC9575224 DOI: 10.1186/s12859-022-04955-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 08/24/2022] [Indexed: 11/10/2022] Open
Abstract
Background Gene expression is regulated at different molecular levels, including chromatin accessibility, transcription, RNA maturation, and transport. These regulatory mechanisms have strong connections with cellular metabolism. In order to study the cellular system and its functioning, omics data at each molecular level can be generated and efficiently integrated. Here, we propose BRANEnet, a novel multi-omics integration framework for multilayer heterogeneous networks. BRANEnet is an expressive, scalable, and versatile method to learn node embeddings, leveraging random walk information within a matrix factorization framework. Our goal is to efficiently integrate multi-omics data to study different regulatory aspects of multilayered processes that occur in organisms. We evaluate our framework using multi-omics data of Saccharomyces cerevisiae, a well-studied yeast model organism. Results We test BRANEnet on transcriptomics (RNA-seq) and targeted metabolomics (NMR) data for wild-type yeast strain during a heat-shock time course of 0, 20, and 120 min. Our framework learns features for differentially expressed bio-molecules showing heat stress response. We demonstrate the applicability of the learned features for targeted omics inference tasks: transcription factor (TF)-target prediction, integrated omics network (ION) inference, and module identification. The performance of BRANEnet is compared to existing network integration methods. Our model outperforms baseline methods by achieving high prediction scores for a variety of downstream tasks. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04955-w.
Affiliation(s)
- Surabhi Jagtap
- Université Paris-Saclay, CentraleSupélec, Inria, 3 Rue Joliot Curie, 91190, Gif-Sur-Yvette, France
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Aurélie Pirayre
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Frédérique Bidard
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Laurent Duval
- IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852, Rueil-Malmaison, France
- Fragkiskos D Malliaros
- Université Paris-Saclay, CentraleSupélec, Inria, 3 Rue Joliot Curie, 91190, Gif-Sur-Yvette, France.
47
Lin K, Kang L, Yang F, Lu P, Lu J. MFDA: Multiview fusion based on dual-level attention for drug interaction prediction. Front Pharmacol 2022; 13:1021329. [PMID: 36278200 PMCID: PMC9584567 DOI: 10.3389/fphar.2022.1021329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 09/13/2022] [Indexed: 11/30/2022] Open
Abstract
Drug-drug interaction prediction plays an important role in pharmacology and clinical applications. Most traditional methods predict drug interactions based on drug attributes or network structure. They usually have three limitations: 1) failing to integrate drug features and network structures well, resulting in less informative drug embeddings; 2) being restricted to a single view of drug interaction relationships; 3) ignoring the importance of different neighbors. To tackle these challenges, this paper proposed a multiview fusion based on dual-level attention to predict drug interactions (called MFDA). The MFDA first constructed multiple views for the drug interaction relationship, and then adopted a cross-fusion strategy to deeply fuse drug features with the drug interaction network under each view. To distinguish the importance of different neighbors and views, MFDA adopted a dual-level attention mechanism (node level and view level) to obtain the unified drug embedding for drug interaction prediction. Extensive experiments were conducted on real datasets, and the MFDA demonstrated superior performance compared to state-of-the-art baselines. In the multitask analysis of new drug reactions, MFDA obtained higher scores on multiple metrics. In addition, its prediction results corresponded to specific drug reaction events, which achieved more accurate predictions.
Affiliation(s)
- Kaibiao Lin
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
- Liping Kang
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
- *Correspondence: Liping Kang; Fan Yang
- Fan Yang
- Shenzhen Research Institute of Xiamen University, Shenzhen, China
- Department of Automation, Xiamen University, Xiamen, China
- *Correspondence: Liping Kang; Fan Yang
- Ping Lu
- School of Economics and Management, Xiamen University of Technology, Xiamen, China
- Jiangtao Lu
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
48
Yang C, Xiao Y, Zhang Y, Sun Y, Han J. Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2022; 34:4854-4873. [PMID: 37915376 PMCID: PMC10619966 DOI: 10.1109/tkde.2020.3045924] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 11/03/2023]
Abstract
Since real-world objects and their interactions are often multi-modal and multi-typed, heterogeneous networks have been widely used as a more powerful, realistic, and generic superclass of traditional homogeneous networks (graphs). Meanwhile, representation learning (a.k.a. embedding) has recently been intensively studied and shown effective for various network mining and analytical tasks. In this work, we aim to provide a unified framework to deeply summarize and evaluate existing research on heterogeneous network embedding (HNE), which includes but goes beyond a normal survey. Since there is already a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis of the merits of various existing HNE algorithms. Moreover, existing HNE algorithms, though mostly claimed to be generic, are often evaluated on different datasets. Although understandable given the application-driven nature of HNE, such indirect comparisons largely hinder the proper attribution of improved task performance to effective data preprocessing and novel technical design, especially considering the various possible ways to construct a heterogeneous network from real-world application data. Therefore, as the second contribution, we create four benchmark datasets from different sources, with various properties regarding scale, structure, attribute/label availability, etc., towards handy and fair evaluations of HNE algorithms. As the third contribution, we carefully refactor and amend the implementations of 13 popular HNE algorithms, create friendly interfaces for them, and provide all-around comparisons among them over multiple tasks and experimental settings. By putting all existing HNE algorithms under a unified framework, we aim to provide a universal reference and guideline for the understanding and development of HNE algorithms.
Meanwhile, by open-sourcing all data and code, we aim to serve the community with a ready-to-use benchmark platform to test and compare the performance of existing and future HNE algorithms (https://github.com/yangji9181/HNE).
Affiliation(s)
- Carl Yang
- Emory University
| | - Yuxin Xiao
- Carnegie Mellon University
| | - Yu Zhang
- University of Illinois, Urbana Champaign
| | - Yizhou Sun
- University of California, Los Angeles
| | - Jiawei Han
- University of Illinois, Urbana Champaign
49
Hua M, Yu S, Liu T, Yang X, Wang H. MVGCNMDA: Multi-view Graph Augmentation Convolutional Network for Uncovering Disease-Related Microbes. Interdiscip Sci 2022; 14:669-682. [PMID: 35428964 DOI: 10.1007/s12539-022-00514-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/23/2021] [Revised: 03/06/2022] [Accepted: 03/13/2022] [Indexed: 06/14/2023]
Abstract
MOTIVATION Exploring the interrelationships between microbes and disease can help microbiologists make decisions and plan treatments. Predicting new microbe-disease associations currently relies on biological experiments and domain knowledge, which is time-consuming and inefficient. Automated algorithms are used to uncover the intrinsic link between microbes and disease. However, due to data noise and an inadequate understanding of the relevant biology, the efficient prediction of microbe-disease associations remains challenging. This study develops a multi-view graph augmentation convolutional network (MVGCNMDA) to predict potential disease-associated microbes. METHODS First, we use two data augmentation methods, edge perturbation and node dropping, to remove data noise in the preprocessing stage. Second, we calculate Gaussian interaction profile kernel similarity and cosine similarity, so that the graph convolutional network (GCN) can fully use multi-view features. Then, the multi-view features are fed into a multi-attention block to learn the weights of the different features adaptively. Finally, the embedding results are obtained using a convolutional neural network (CNN) combiner, and matrix completion is used to predict potential associations between microbes and diseases. RESULTS We test our model on the Human Microbe-Disease Association Database (HMDAD), Disbiome, and the Combined Dataset (Peryton and MicroPhenoDB). The area under the precision-recall curve (AUPR), the area under the ROC curve (AUC), the F1 score, and recall are calculated to evaluate the performance of MVGCNMDA. The AUPR is 0.9440, the AUC is 0.9428, the F1 score is 0.9383, and the recall is 0.8858. The experiments show that our model predicts potential microbe-disease associations accurately in comparison with state-of-the-art methods under both global leave-one-out cross-validation (LOOCV) and fivefold cross-validation (fivefold CV). To further verify the effectiveness of the proposed graph data augmentation, we designed five different settings in the ablation study. Furthermore, we present two case studies that validate the potential microbe-disease associations predicted by MVGCNMDA.
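The two similarity measures named in the METHODS paragraph above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the commonly cited Gaussian interaction profile (GIP) kernel form, K(i, j) = exp(-γ‖IP(i) − IP(j)‖²) with γ = γ′ divided by the mean squared profile norm, and it assumes γ′ = 1. The toy association matrix and the function names `gip_kernel_matrix` and `cosine_similarity` are hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def gip_kernel_matrix(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between all pairs of rows.

    Each row of `profiles` is the binary association profile of one entity.
    gamma_prime = 1.0 is an assumed default, not a value taken from the paper.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0.0:
        # No known associations at all: fall back to self-similarity only.
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    gamma = gamma_prime / mean_sq_norm
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in profiles]
        for p in profiles
    ]

# Hypothetical toy data: 3 microbes (rows) x 3 diseases (columns).
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
gip = gip_kernel_matrix(associations)
cos_01 = cosine_similarity(associations[0], associations[1])
```

Both functions return values in [0, 1] for binary profiles, so the resulting matrices can be used as two separate "views" of microbe similarity; how MVGCNMDA combines them is described only at the level of the abstract above.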
Affiliation(s)
- Meifang Hua
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Shengpeng Yu
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Tianyu Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Xue Yang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
| | - Hong Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China.
50
Huang D, An J, Zhang L, Liu B. Computational method using heterogeneous graph convolutional network model combined with reinforcement layer for MiRNA-disease association prediction. BMC Bioinformatics 2022; 23:299. [PMID: 35879658 PMCID: PMC9316361 DOI: 10.1186/s12859-022-04843-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A large body of evidence from biological experiments has confirmed that miRNAs play an important role in the progression and development of various complex human diseases. However, traditional experimental methods are expensive and time-consuming. Developing more accurate and efficient methods for predicting potential miRNA-disease associations is therefore a challenging task. RESULTS In this study, we developed a computational model that combines a heterogeneous graph convolutional network with an enhanced layer for miRNA-disease association prediction (HGCNELMDA). The major improvements of our method are that random walk with restart optimizes the original features of the nodes, and that a reinforcement layer added to the hidden layer of the graph convolutional network retains similarity information between nodes in the feature space. In addition, the proposed approach recalculates the influence of neighborhood nodes on target nodes by introducing an attention mechanism. The reliable performance of HGCNELMDA was confirmed by an AUC of 93.47% in global leave-one-out cross-validation (LOOCV) and an average AUC of 93.01% in fivefold cross-validation. We also compared HGCNELMDA with state-of-the-art methods. The comparative results indicated that HGCNELMDA is very promising and may provide a cost-effective alternative for miRNA-disease association prediction. Moreover, we applied HGCNELMDA in three case studies to predict potential miRNAs related to lung cancer, prostate cancer, and pancreatic cancer. The results showed that 48, 50, and 50 of the top 50 predicted miRNAs, respectively, were supported by experimental association evidence. Therefore, HGCNELMDA is a reliable method for predicting disease-related miRNAs. CONCLUSIONS The AUCs of HGCNELMDA in LOOCV and fivefold cross-validation were 93.47% and 93.01%, respectively. Compared with other typical methods, HGCNELMDA performs better. Three case studies on lung cancer, prostate cancer, and pancreatic cancer were conducted. Among the top 50 predicted candidate miRNAs, 48, 50, and 50 were verified in the biological database HDMMV2.0. This further confirms the feasibility and effectiveness of our method. To facilitate future research on disease-related miRNAs, we developed a freely available web server, HGCNELMDA (http://124.221.62.44:8080/HGCNELMDA.jsp).
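The abstract above mentions random walk with restart (RWR) as the step that refines node features. As a rough illustration of the general technique only, and not of the HGCNELMDA pipeline, the sketch below iterates p ← (1 − r)·W·p + r·p₀ on a small undirected graph until convergence, where W is the column-normalized adjacency matrix. The graph, the restart probability r = 0.5, and the function names are assumptions made for this example.

```python
def column_normalize(adj):
    """Divide each column of a square adjacency matrix by its column sum."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the RWR proximity of every node to the seed node.

    restart is the probability of jumping back to the seed at each step
    (0.5 is an assumed value, not one reported in the paper).
    """
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once successive iterates differ by less than tol in L1 norm.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seed=0)
```

Nodes closer to the seed receive higher scores, which is why RWR vectors are often used as topology-aware node features before a graph convolutional layer.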
Affiliation(s)
- Dan Huang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
| | - JiYong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China.
| | - Lei Zhang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China.
| | - BaiLong Liu
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China