1
Hu X, Sun Z, Nian Y, Wang Y, Dang Y, Li F, Feng J, Yu E, Tao C. Self-Explainable Graph Neural Network for Alzheimer Disease and Related Dementias Risk Prediction: Algorithm Development and Validation Study. JMIR Aging 2024; 7:e54748. [PMID: 38976869 PMCID: PMC11263893 DOI: 10.2196/54748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 03/31/2024] [Accepted: 06/02/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND: Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes.
OBJECTIVE: The study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. To address the lack of human-interpretable reasons behind these predictions, we introduce an innovative, self-explainable method to evaluate relationship importance and its influence on ADRD risk prediction.
METHODS: We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method for estimating ADRD likelihood. This self-explainable method can provide a feature-importance explanation in the context of ADRD risk prediction, leveraging relational information within a graph. Three scenarios with 1-year, 2-year, and 3-year prediction windows were created to assess the model's efficiency. Random forest (RF) and light gradient boosting machine (LGBM) were used as baselines. By using this method, we further clarify the key relationships for ADRD risk prediction.
RESULTS: In scenario 1, the VGNN model showed area under the receiver operating characteristic curve (AUROC) scores of 0.7272 and 0.7480 for the small subset and the matched cohort data set. On average, it outperformed RF and LGBM by 10.6% and 9.1%, respectively. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, AUROC scores of 0.7001 and 0.7187 were obtained, exceeding the baseline models by 10.1% and 8.5%, respectively. These results demonstrate the superiority of the graph-based approach over the tree-based models (RF and LGBM) in predicting ADRD. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression.
CONCLUSIONS: Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.
Affiliation(s)
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Zenan Sun
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yi Nian
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yichen Wang
  - Division of Hospital Medicine at Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, United States
- Yifang Dang
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Fang Li
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Jingna Feng
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Evan Yu
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
2
Gualdi F, Oliva B, Piñero J. Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information. NAR Genom Bioinform 2024; 6:lqae049. [PMID: 38745993 PMCID: PMC11091931 DOI: 10.1093/nargab/lqae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/08/2024] [Accepted: 04/24/2024] [Indexed: 05/16/2024] Open
Abstract
Knowledge graph embeddings (KGE) are a powerful technique used in the biomedical domain to represent biological knowledge in a low-dimensional space. However, a deep understanding of these methods is still missing, particularly regarding their application to prioritizing genes associated with complex diseases for which genetic information is limited. In this contribution, we built a knowledge graph (KG) by integrating heterogeneous biomedical data and generated KGE using state-of-the-art methods and two novel algorithms, Dlemb and BioKG2vec. Extensive testing of the embeddings with unsupervised clustering and supervised methods showed that KGE can be successfully used to predict genes associated with diseases and that our novel approaches outperform most existing algorithms in both scenarios. Our findings underscore the significance of data quality, preprocessing, and integration in achieving accurate predictions. Additionally, we applied KGE to predict genes linked to Intervertebral Disc Degeneration (IDD) and illustrated that functions pertinent to the disease are enriched within the prioritized gene set.
Affiliation(s)
- Francesco Gualdi
  - Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
  - Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Baldomero Oliva
  - Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Janet Piñero
  - Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
  - Medbioinformatics Solutions SL, Barcelona, Spain
3
Huan JM, Wang XJ, Li Y, Zhang SJ, Hu YL, Li YL. The biomedical knowledge graph of symptom phenotype in coronary artery plaque: machine learning-based analysis of real-world clinical data. BioData Min 2024; 17:13. [PMID: 38773619 PMCID: PMC11110203 DOI: 10.1186/s13040-024-00365-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 05/17/2024] [Indexed: 05/24/2024] Open
Abstract
A knowledge graph can effectively showcase the essential characteristics of data and is emerging as a significant means of integrating information in the field of artificial intelligence. Coronary artery plaque represents a significant etiology of cardiovascular events, posing a diagnostic challenge for clinicians who are confronted with a multitude of nonspecific symptoms. To visualize the hierarchical relationship network graph of the molecular mechanisms underlying plaque properties and symptom phenotypes, patient symptomatology was extracted from electronic health record data from real-world clinical settings. Phenotypic networks were constructed utilizing clinical data and protein‒protein interaction networks. Machine learning techniques, including convolutional neural networks, Dijkstra's algorithm, and gene ontology semantic similarity, were employed to quantify clinical and biological features within the network. The resulting features were then utilized to train a K-nearest neighbor model, yielding 23 symptoms, 41 association rules, and 61 hub genes across the three types of plaques studied, achieving an area under the curve of 92.5%. Weighted correlation network analysis and pathway enrichment were subsequently utilized to identify lipid status-related genes and inflammation-associated pathways that could help explain the differences in plaque properties. To confirm the validity of the network graph model, we conducted coexpression analysis of the hub genes to evaluate their potential diagnostic value. Additionally, we investigated immune cell infiltration, examined the correlations between hub genes and immune cells, and validated the reliability of the identified biological pathways. By integrating clinical data and molecular network information, this biomedical knowledge graph model effectively elucidated the potential molecular mechanisms that connect symptoms, diseases, and molecules.
Affiliation(s)
- Jia-Ming Huan
  - First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Xiao-Jie Wang
  - First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Yuan Li
  - First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Shi-Jun Zhang
  - First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Yuan-Long Hu
  - First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250355, China
- Yun-Lun Li
  - First School of Clinical Medicine, Shandong University of Traditional Chinese Medicine, Jinan, 250355, China.
  - Department of Cardiovascular, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, 250014, China.
  - Precision Diagnosis and Treatment of Cardiovascular Diseases with Traditional Chinese Medicine Shandong Engineering Research Center, Jinan, 250355, China.
4
Xia Y, Pan X, Shen HB. Heterogeneous sampled subgraph neural networks with knowledge distillation to enhance double-blind compound-protein interaction prediction. Structure 2024; 32:611-620.e4. [PMID: 38447575 DOI: 10.1016/j.str.2024.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/18/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI, which incorporates local networks of known interactions to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate that (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy benefits double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
Affiliation(s)
- Ying Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
5
Liu JX, Zhang X, Huang YQ, Hao GF, Yang GF. Multi-level bioinformatics resources support drug target discovery of protein-protein interactions. Drug Discov Today 2024; 29:103979. [PMID: 38608830 DOI: 10.1016/j.drudis.2024.103979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/14/2024] [Accepted: 04/05/2024] [Indexed: 04/14/2024]
Abstract
Drug discovery often begins with a new target. Protein-protein interactions (PPIs) are crucial to numerous cellular processes and offer a promising avenue for drug-target discovery. PPIs are characterized by multi-level complexity: at the protein level, interaction networks can be used to identify potential targets, whereas at the residue level, the details of individual PPIs can be used to examine a target's druggability. Great progress has been made in target discovery through multi-level PPI-related computational approaches, but these resources have not been fully discussed. Here, we systematically survey bioinformatics tools for identifying and assessing potential drug targets, examining their characteristics, limitations, and applications. This work will aid the integration of the broader protein-to-network context with the analysis of detailed binding mechanisms to support the discovery of drug targets.
Affiliation(s)
- Jia-Xin Liu
  - National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Xiao Zhang
  - State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Yuan-Qin Huang
  - State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China; State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China.
- Guang-Fu Yang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China.
6
Harrigan WL, Ferrell BD, Wommack KE, Polson SW, Schreiber ZD, Belcaid M. Improvements in viral gene annotation using large language models and soft alignments. BMC Bioinformatics 2024; 25:165. [PMID: 38664627 PMCID: PMC11046836 DOI: 10.1186/s12859-024-05779-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 04/12/2024] [Indexed: 04/28/2024] Open
Abstract
BACKGROUND: The annotation of protein sequences in public databases has long posed a challenge in molecular biology. This issue is particularly acute for viral proteins, which demonstrate limited homology to known proteins when using alignment, k-mer, or profile-based homology search approaches. A novel methodology employing Large Language Models (LLMs) addresses this challenge by annotating protein sequences based on embeddings.
RESULTS: Central to our contribution is the soft alignment algorithm, which draws on traditional protein alignment but uses embedding similarity at the amino acid level to bypass the need for conventional scoring matrices. This method surpasses pooled embedding-based models not only in efficiency but also in interpretability, enabling users to easily trace homologous amino acids and delve deeper into the alignments. Far from being a black box, our approach provides transparent, BLAST-like alignment visualizations, combining traditional biological research with AI advancements to elevate protein annotation through embedding-based analysis while ensuring interpretability. Tests using the Virus Orthologous Groups and ViralZone protein databases indicated that the novel soft alignment approach recognized and annotated sequences that both blastp and pooling-based methods, which are commonly used for sequence annotation, failed to detect.
CONCLUSION: The embeddings approach shows the great potential of LLMs for enhancing protein sequence annotation, especially in viral genomics. These findings present a promising avenue for more efficient and accurate protein function inference in molecular biology.
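The idea of scoring an alignment with embedding similarity instead of a substitution matrix can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Smith-Waterman-style local alignment, cosine similarity between per-residue embedding vectors, and hypothetical `gap` and `offset` parameters chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def soft_local_alignment_score(emb_a, emb_b, gap=0.5, offset=0.5):
    """Best local alignment score between two sequences of residue embeddings.

    The substitution score is cosine similarity minus `offset` (an assumed
    shift so that dissimilar residues are penalized); `gap` is an assumed
    linear gap penalty. Both values are illustrative, not from the paper.
    """
    rows, cols = len(emb_a), len(emb_b)
    table = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = cosine(emb_a[i - 1], emb_b[j - 1]) - offset
            table[i][j] = max(
                0.0,                          # start a new local alignment
                table[i - 1][j - 1] + match,  # align residue i with residue j
                table[i - 1][j] - gap,        # gap in sequence b
                table[i][j - 1] - gap,        # gap in sequence a
            )
            best = max(best, table[i][j])
    return best

# Toy 2-dimensional "embeddings"; real per-residue embeddings would come from a protein language model.
SEQ_A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
SEQ_B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
SCORE_SAME = soft_local_alignment_score(SEQ_A, SEQ_B)
SCORE_ORTHOGONAL = soft_local_alignment_score([[1.0, 0.0]], [[0.0, 1.0]])
```

Because the score table is kept, a traceback over it could recover which residues were paired, which is the kind of residue-level traceability the abstract attributes to soft alignment.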
Affiliation(s)
- William L Harrigan
  - Hawai'i Institute of Marine Biology, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA
- Barbra D Ferrell
  - Department of Plant & Soil Sciences, University of Delaware, Newark, DE, 19713, USA
- K Eric Wommack
  - Department of Plant & Soil Sciences, University of Delaware, Newark, DE, 19713, USA
- Shawn W Polson
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19713, USA
- Zachary D Schreiber
  - Department of Plant & Soil Sciences, University of Delaware, Newark, DE, 19713, USA
- Mahdi Belcaid
  - Department of Computer Science, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA.
7
Luo H, Yin W, Wang J, Zhang G, Liang W, Luo J, Yan C. Drug-drug interactions prediction based on deep learning and knowledge graph: A review. iScience 2024; 27:109148. [PMID: 38405609 PMCID: PMC10884936 DOI: 10.1016/j.isci.2024.109148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024] Open
Abstract
Drug-drug interactions (DDIs) can produce unpredictable pharmacological effects and lead to adverse events that have the potential to cause irreversible damage to the organism. Traditional methods to detect DDIs through biological or pharmacological analysis are time-consuming and expensive; therefore, there is an urgent need to develop computational methods to effectively predict DDIs. Currently, deep learning and knowledge graph techniques, which can effectively extract features of entities, have been widely utilized to develop DDI prediction methods. In this review, we aim to systematically survey DDI prediction research applying deep learning and knowledge graphs. The available biomedical data and public databases related to drugs are first summarized. Then, we discuss the existing DDI prediction methods that have utilized deep learning and knowledge graph techniques and group them into three main classes: deep learning-based methods, knowledge graph-based methods, and methods that combine deep learning with knowledge graphs. We comprehensively analyze the commonly used drug-related data and various DDI prediction methods, and compare these prediction methods on benchmark datasets. Finally, we briefly discuss the challenges related to DDI prediction, including asymmetric DDI prediction and high-order DDI prediction.
Affiliation(s)
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Weijie Yin
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Academy for Advanced Interdisciplinary Studies, Zhengzhou, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Wenjuan Liang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Academy for Advanced Interdisciplinary Studies, Zhengzhou, China
8
Alvarez-Mamani E, Dechant R, Beltran-Castañón CA, Ibáñez AJ. Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 2024; 25:1. [PMID: 38166530 PMCID: PMC10763173 DOI: 10.1186/s12859-023-05612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024] Open
Abstract
Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although typically used in the context of guessing friendships in social media, several applications for graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments over recent years facilitate their application to biomedical data and thus may help advance biological discoveries. Therefore, in this review, we discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
Affiliation(s)
- Edwin Alvarez-Mamani
  - Engineering Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
  - Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Reinhard Dechant
  - Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
  - Calico Life Sciences, 1170 Veterans Blvd, San Francisco, CA, 94080, USA
- Alfredo J Ibáñez
  - Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru.
  - Science Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru.
9
James T, Hennig H. Knowledge Graphs and Their Applications in Drug Discovery. Methods Mol Biol 2024; 2716:203-221. [PMID: 37702941 DOI: 10.1007/978-1-0716-3449-3_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Knowledge graphs represent information in the form of entities and relationships between those entities. Such a representation has multiple potential applications in drug discovery, including democratizing access to biomedical data, contextualizing or visualizing that data, and generating novel insights through the application of machine learning approaches. Knowledge graphs put data into context and therefore offer the opportunity to generate explainable predictions, which is a key topic in contemporary artificial intelligence. In this chapter, we outline some of the factors that need to be considered when constructing biomedical knowledge graphs, examine recent advances in mining such systems to gain insights for drug discovery, and identify potential future areas for further development.
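The entity-and-relationship representation described above can be sketched as a list of (head, relation, tail) triples. The entity and relation names below are hypothetical placeholders, not data from the chapter, and the two-hop lookup is only one simple way such a graph might surface an explainable path.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from the chapter.
TRIPLES = [
    ("drug_A", "inhibits", "protein_X"),
    ("protein_X", "associated_with", "disease_Y"),
    ("drug_B", "inhibits", "protein_Z"),
]

def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index

def two_hop_paths(index, start):
    """Return every path start -> mid -> end together with the relations used."""
    paths = []
    for rel1, mid in index.get(start, []):
        for rel2, end in index.get(mid, []):
            paths.append((start, rel1, mid, rel2, end))
    return paths

INDEX = build_index(TRIPLES)
PATHS = two_hop_paths(INDEX, "drug_A")
```

Each returned path names the intermediate entity and both relations, so a suggested drug-disease link carries its own supporting chain, which is the sense in which the abstract calls graph-based predictions explainable.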
Affiliation(s)
- Tim James
  - Evotec (UK) Ltd., Abingdon, Oxfordshire, UK.
10
Su C, Hou Y, Levin M, Zhang R, Wang F. Protocol to implement a computational pipeline for biomedical discovery based on a biomedical knowledge graph. STAR Protoc 2023; 4:102666. [PMID: 37883224 PMCID: PMC10630678 DOI: 10.1016/j.xpro.2023.102666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/06/2023] [Accepted: 10/03/2023] [Indexed: 10/28/2023] Open
Abstract
Biomedical knowledge graphs (BKGs) provide a new paradigm for managing abundant biomedical knowledge efficiently. Today's artificial intelligence techniques enable mining BKGs to discover new knowledge. Here, we present a protocol for implementing a computational pipeline for biomedical knowledge discovery (BKD) based on a BKG. We describe steps of the pipeline including data processing, implementing BKD based on knowledge graph embeddings, and prediction result interpretation. We detail how our pipeline can be used for drug repurposing hypothesis generation for Parkinson's disease. For complete details on the use and execution of this protocol, please refer to Su et al. (1).
Affiliation(s)
- Chang Su
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
- Yu Hou
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Michael Levin
  - Bioengineering Department, College of Engineering, Temple University, Philadelphia, PA 19122, USA
- Rui Zhang
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA.
11
Zhang Y, Sui X, Pan F, Yu K, Li K, Tian S, Erdengasileng A, Han Q, Wang W, Wang J, Wang J, Sun D, Chung H, Zhou J, Zhou E, Lee B, Zhang P, Qiu X, Zhao T, Zhang J. BioKG: a comprehensive, large-scale biomedical knowledge graph for AI-powered, data-driven biomedical research. bioRxiv 2023:2023.10.13.562216. [PMID: 38168218 PMCID: PMC10760044 DOI: 10.1101/2023.10.13.562216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
To cope with the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have emerged as a powerful data structure for integrating large volumes of heterogeneous data to facilitate accurate and efficient information retrieval and automated knowledge discovery (AKD). However, transforming unstructured content from scientific literature into KGs has remained a significant challenge, with previous methods unable to achieve human-level accuracy. In this study, we utilized an information extraction pipeline that won first place in the LitCoin NLP Challenge to construct a large-scale KG using all PubMed abstracts. The quality of the large-scale information extraction rivals that of human expert annotations, signaling a new era of automatic, high-quality database construction from literature. Our extracted information markedly surpasses the amount of content in manually curated public databases. To enhance the KG's comprehensiveness, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. The comprehensive KG enabled rigorous performance evaluation of AKD, which was infeasible in previous studies. We designed an interpretable, probabilistic inference method to identify indirect causal relations and achieved unprecedented results for drug target identification and drug repurposing. Taking lung cancer as an example, a retrospective study found that 40% of drug targets reported in the literature could have been predicted by our algorithm about 15 years ago, demonstrating that substantial acceleration in scientific discovery could be achieved through automated hypothesis generation and timely dissemination. A cloud-based platform (https://www.biokde.com) was developed for academic users to freely access this rich structured data and associated tools.
Affiliation(s)
- Yuan Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL 32306
- Xin Sui
  - Insilicom LLC, Tallahassee, FL 32303
- Feng Pan
  - Insilicom LLC, Tallahassee, FL 32303
- Keqiao Li
  - Department of Statistics, Florida State University, Tallahassee, FL 32306
- Shubo Tian
  - Department of Statistics, Florida State University, Tallahassee, FL 32306
- Qing Han
  - Department of Statistics, Florida State University, Tallahassee, FL 32306
- Wanjing Wang
  - Department of Statistics, Florida State University, Tallahassee, FL 32306
- Jian Wang
  - 977 Wisteria Ter., Sunnyvale, CA 94086
- Jun Zhou
  - Insilicom LLC, Tallahassee, FL 32303
- Eric Zhou
  - Insilicom LLC, Tallahassee, FL 32303
- Ben Lee
  - Insilicom LLC, Tallahassee, FL 32303
- Peili Zhang
  - Forward Informatics, Winchester, Massachusetts, 01890
- Xing Qiu
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642
- Tingting Zhao
  - Department of Geography, Florida State University, Tallahassee, FL 32306
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL 32306
  - Insilicom LLC, Tallahassee, FL 32303
12
Fu C, Huang Z, van Harmelen F, He T, Jiang X. Food4healthKG: Knowledge graphs for food recommendations based on gut microbiota and mental health. Artif Intell Med 2023; 145:102677. [PMID: 37925207 DOI: 10.1016/j.artmed.2023.102677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 08/05/2023] [Accepted: 10/03/2023] [Indexed: 11/06/2023]
Abstract
Food is increasingly acknowledged as a powerful means to promote and maintain mental health. The introduction of the gut-brain axis has been instrumental in understanding the impact of food on mental health. It is widely reported that food can significantly influence gut microbiota metabolism, thereby playing a pivotal role in maintaining mental health. However, the vast amount of heterogeneous data published in recent research lacks systematic integration and application development. To remedy this, we construct a comprehensive knowledge graph, named Food4healthKG, focusing on food, gut microbiota, and mental diseases. The construction workflow includes the integration of numerous heterogeneous data sources, entity linking to a normalized format, and a well-designed representation of the acquired knowledge. To illustrate the usability of Food4healthKG, we design two case studies: knowledge query and food recommendation based on Food4healthKG. Furthermore, we propose two evaluation methods to validate the quality of the results obtained from Food4healthKG. The results demonstrate the system's effectiveness in practical applications, particularly in providing convincing food recommendations based on gut microbiota and mental health. Food4healthKG is accessible at https://github.com/ccszbd/Food4healthKG.
Affiliation(s)
- Chengcheng Fu
  - National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China; School of Computer Science, Central China Normal University, Wuhan, China; Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; National Language Resources Monitor Research Center for Network Media, Central China Normal University, Wuhan, China
- Zhisheng Huang
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Clinical Research Center for Mental Disorders, Shanghai Pudong New Area Mental Health Center, Tongji University School of Medicine, Shanghai, China; Deep Blue Technology Group, Shanghai, China
- Frank van Harmelen
  - Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Tingting He
  - School of Computer Science, Central China Normal University, Wuhan, China; National Language Resources Monitor Research Center for Network Media, Central China Normal University, Wuhan, China
- Xingpeng Jiang
  - School of Computer Science, Central China Normal University, Wuhan, China; National Language Resources Monitor Research Center for Network Media, Central China Normal University, Wuhan, China
|
13
|
Shan W, Shen C, Luo L, Ding P. Multi-task learning for predicting synergistic drug combinations based on auto-encoding multi-relational graphs. iScience 2023; 26:108020. [PMID: 37854693 PMCID: PMC10579440 DOI: 10.1016/j.isci.2023.108020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 08/26/2023] [Accepted: 09/19/2023] [Indexed: 10/20/2023] Open
Abstract
Combinatorial drug therapy is a promising approach for treating complex diseases by combining drugs with synergistic effects. However, predicting effective drug combinations is challenging due to the complexity of biological systems and the limited understanding of pathophysiological mechanisms and drug targets. In this paper, we propose a computational framework called VGAETF (Variational Graph Autoencoder Tensor Decomposition), which leverages a multi-relational graph to model complex relationships between entities in biological systems and predicts disease-related synergistic drug combinations in an end-to-end manner. In the computational experiments, VGAETF achieved high performance (AUROC [the area under receiver operating characteristic] = 0.9767, AUPR [the area under precision-recall] = 0.9660), outperforming the other compared methods. Moreover, case studies further demonstrated the effectiveness of VGAETF in identifying potential disease-related synergistic drug combinations.
Affiliation(s)
- Wenyu Shan
  - School of Computer Science, University of South China, Hengyang, Hunan 421001, China
- Cong Shen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Lingyun Luo
  - School of Computer Science, University of South China, Hengyang, Hunan 421001, China
  - Hunan Medical Big Data International Science and Technology Innovation Cooperation Base, Hengyang, Hunan 421001, China
- Pingjian Ding
  - School of Computer Science, University of South China, Hengyang, Hunan 421001, China
|
14
|
Dlamini SB, Saunders CJ, Laguette MJN, Gibbon A, Gamieldien J, Collins M, September AV. Application of an in silico approach identifies a genetic locus within ITGB2, and its interactions with HSPG2 and FGF9, to be associated with anterior cruciate ligament rupture risk. Eur J Sport Sci 2023; 23:2098-2108. [PMID: 36680346 DOI: 10.1080/17461391.2023.2171906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
Abstract
We developed a Biomedical Knowledge Graph model that is phenotype and biological function-aware by integrating knowledge from multiple domains in a Neo4j graph database. All known human genes were assessed through the model to identify potential new risk genes for anterior cruciate ligament (ACL) ruptures and Achilles tendinopathy (AT). Genes were prioritised and explored in a case-control study comparing participants with ACL ruptures (ACL-R), including a sub-group with non-contact mechanism injuries (ACL-NON), to uninjured control individuals (CON). After gene filtering, 3376 genes, including 411 genes identified through previous whole exome sequencing, were found to be potentially linked to AT and ACL ruptures. Four variants were prioritised: HSPG2:rs2291826A/G, HSPG2:rs2291827G/A, ITGB2:rs2230528C/T and FGF9:rs2274296C/T. The rs2230528 CC genotype was over-represented in the CON group compared to the ACL-R (p < 0.001) and ACL-NON (p < 0.001) groups, and the TT genotype and T allele were over-represented in the ACL-R and ACL-NON groups compared to the CON group (p < 0.001). Several significant differences in distributions were noted for the gene-gene interactions: (HSPG2:rs2291826, rs2291827 and ITGB2:rs2230528) and (ITGB2:rs2230528 and FGF9:rs2297429). This study substantiates the efficiency of using a prior knowledge-driven in silico approach to identify candidate genes linked to tendon and ACL injuries. Our biomedical knowledge graph identified and, with further testing, highlighted novel associations with ACL rupture risk of the ITGB2 gene, which has not been explored in a genetic case-control association study.
We thus recommend a multistep approach including bioinformatics in conjunction with next generation sequencing technology to improve the discovery potential of genomics technologies in musculoskeletal soft tissue injuries.
Highlights
- A biomedical knowledge graph was modelled for musculoskeletal soft tissue injuries to efficiently identify candidate genes for genetic susceptibility analyses.
- The biomedical knowledge graph and sequencing data identified potential biologically relevant variants to explore susceptibility to common tendon and ligament injuries. Specifically, genetic variants within the ITGB2 and FGF9 genes were associated with ACL risk.
- Novel allele combinations (HSPG2-ITGB2 and ITGB2-FGF9) showcase the potential effect of ITGB2 in influencing risk of ACL rupture.
Affiliation(s)
- Senanile B Dlamini
  - Division of Physiological Sciences, Department of Human Biology, University of Cape Town, Cape Town, South Africa
  - Department of Human Biology, Health through Physical Activity Lifestyle and Sport Research Centre (HPALS), Newlands, South Africa
- Colleen J Saunders
  - Division of Emergency Medicine, Department of Surgery, University of Cape Town, Cape Town, South Africa
  - South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
- Mary-Jessica N Laguette
  - Division of Physiological Sciences, Department of Human Biology, University of Cape Town, Cape Town, South Africa
  - Department of Human Biology, Health through Physical Activity Lifestyle and Sport Research Centre (HPALS), Newlands, South Africa
- Andrea Gibbon
  - Division of Physiological Sciences, Department of Human Biology, University of Cape Town, Cape Town, South Africa
- Junaid Gamieldien
  - South African National Bioinformatics Institute, University of the Western Cape, Cape Town, South Africa
- Malcolm Collins
  - Division of Physiological Sciences, Department of Human Biology, University of Cape Town, Cape Town, South Africa
  - Department of Human Biology, Health through Physical Activity Lifestyle and Sport Research Centre (HPALS), Newlands, South Africa
  - Department of Human Biology, International Federation of Sports Medicine (FIMS) Collaborative Centre of Sports Medicine, University of Cape Town, Newlands, South Africa
- Alison V September
  - Division of Physiological Sciences, Department of Human Biology, University of Cape Town, Cape Town, South Africa
  - Department of Human Biology, Health through Physical Activity Lifestyle and Sport Research Centre (HPALS), Newlands, South Africa
  - Department of Human Biology, International Federation of Sports Medicine (FIMS) Collaborative Centre of Sports Medicine, University of Cape Town, Newlands, South Africa
|
15
|
Cheng M, Jiang Y, Xu J, Mentis AFA, Wang S, Zheng H, Sahu SK, Liu L, Xu X. Spatially resolved transcriptomics: a comprehensive review of their technological advances, applications, and challenges. J Genet Genomics 2023; 50:625-640. [PMID: 36990426 DOI: 10.1016/j.jgg.2023.03.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/29/2023]
Abstract
The ability to explore the kingdoms of life is largely driven by innovations and breakthroughs in technology, from the invention of the microscope 350 years ago to the recent emergence of single-cell sequencing, by which the scientific community has been able to visualize life at an unprecedented resolution. Most recently, Spatially Resolved Transcriptomics (SRT) technologies have filled the gap in probing the spatial or even three-dimensional organization of the molecular foundation behind the molecular mysteries of life, including the origin of different cellular populations developed from totipotent cells and human diseases. In this review, we introduce recent progress and challenges in SRT from the perspectives of technologies and bioinformatic tools, as well as representative SRT applications. With the fast-moving progress of SRT technologies and promising results from early research projects that adopted them, we can foresee a bright future for such new tools in understanding life at the most profound analytical level.
Affiliation(s)
- Yujia Jiang
  - BGI-Hangzhou, Hangzhou, Zhejiang 310012, China
- Shuai Wang
  - BGI-Hangzhou, Hangzhou, Zhejiang 310012, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Sunil Kumar Sahu
  - BGI-Shenzhen, Shenzhen, Guangdong 518103, China; State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- Longqi Liu
  - BGI-Hangzhou, Hangzhou, Zhejiang 310012, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Xun Xu
  - BGI-Hangzhou, Hangzhou, Zhejiang 310012, China; BGI-Shenzhen, Shenzhen, Guangdong 518103, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen, Guangdong 518120, China
|
16
|
Evangelista JE, Xie Z, Marino GB, Nguyen N, Clarke DB, Ma’ayan A. Enrichr-KG: bridging enrichment analysis across multiple libraries. Nucleic Acids Res 2023; 51:W168-W179. [PMID: 37166973 PMCID: PMC10320098 DOI: 10.1093/nar/gkad393] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/06/2023] [Revised: 04/23/2023] [Accepted: 05/02/2023] [Indexed: 05/12/2023] Open
Abstract
Gene and protein set enrichment analysis is a critical step in the analysis of data collected from omics experiments. Enrichr is a popular gene set enrichment analysis web-server search engine that contains hundreds of thousands of annotated gene sets. While Enrichr has been useful in providing enrichment analysis with many gene set libraries from different categories, integrating enrichment results across libraries and domains of knowledge can further hypothesis generation. To this end, Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources. Enrichr-KG currently serves 26 gene set libraries from different categories that include transcription, pathways, ontologies, diseases/drugs, and cell types. To demonstrate the utility of Enrichr-KG we provide several case studies. Enrichr-KG is freely available at: https://maayanlab.cloud/enrichr-kg.
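To make the idea of enrichment results as gene-term links concrete, the following is a minimal sketch and not the Enrichr-KG implementation. It assumes a one-sided hypergeometric test for over-representation (a common choice for gene set enrichment; the abstract does not state which test is used), applies no multiple-testing correction, and uses hypothetical names throughout (`hypergeom_pvalue`, `enriched_links`, the toy library and gene symbols).

```python
from math import comb

def hypergeom_pvalue(overlap, query_size, set_size, background_size):
    """P(X >= overlap) when query_size genes are drawn from a background
    of background_size genes, set_size of which belong to the gene set."""
    total = comb(background_size, query_size)
    upper = min(query_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def enriched_links(query_genes, library, background_size, alpha=0.05):
    """Return (gene, term) edges for every term whose overlap with the
    query is significant at level alpha (no multiple-testing correction)."""
    links = []
    for term, members in sorted(library.items()):
        shared = sorted(query_genes & members)
        p = hypergeom_pvalue(len(shared), len(query_genes), len(members), background_size)
        if shared and p < alpha:
            links.extend((gene, term) for gene in shared)
    return links

# Hypothetical toy library: two terms over a background of 20 genes.
toy_library = {
    "term1": {"A", "B", "C"},
    "term2": {"C", "X", "Y", "Z"},
}
toy_links = enriched_links({"A", "B", "C"}, toy_library, background_size=20)
```

Each returned pair is one edge of a subgraph connecting a query gene to a term it is enriched for; a real analysis would add multiple-testing correction and combine several libraries.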
Affiliation(s)
- John Erol Evangelista
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Giacomo B Marino
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Nhi Nguyen
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
- Avi Ma’ayan
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, NY, NY, USA
|
17
|
Zhang B, Shi H, Wang H. Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: A Critical Approach. J Multidiscip Healthc 2023; 16:1779-1791. [PMID: 37398894 PMCID: PMC10312208 DOI: 10.2147/jmdh.s410301] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 03/15/2023] [Accepted: 06/12/2023] [Indexed: 07/04/2023] Open
Abstract
Cancer is a leading cause of morbidity and mortality worldwide. While progress has been made in the diagnosis, prognosis, and treatment of cancer patients, individualized and data-driven care remains a challenge. Artificial intelligence (AI), which is used to predict and automate many cancers, has emerged as a promising option for improving healthcare accuracy and patient outcomes. AI applications in oncology include risk assessment, early diagnosis, patient prognosis estimation, and treatment selection based on deep knowledge. Machine learning (ML), a subset of AI that enables computers to learn from training data, has been highly effective at predicting various types of cancer, including breast, brain, lung, liver, and prostate cancer. In fact, AI and ML have demonstrated greater accuracy in predicting cancer than clinicians. These technologies also have the potential to improve the diagnosis, prognosis, and quality of life of patients with various illnesses, not just cancer. Therefore, it is important to improve current AI and ML technologies and to develop new programs to benefit patients. This article examines the use of AI and ML algorithms in cancer prediction, including their current applications, limitations, and future prospects.
Affiliation(s)
- Bo Zhang
  - Jinling Institute of Science and Technology, Nanjing City, Jiangsu Province, People’s Republic of China
- Huiping Shi
  - Jinling Institute of Science and Technology, Nanjing City, Jiangsu Province, People’s Republic of China
- Hongtao Wang
  - School of Life Science, Tonghua Normal University, Tonghua City, Jilin Province, People’s Republic of China
|
18
|
Aldughayfiq B, Ashfaq F, Jhanjhi NZ, Humayun M. Capturing Semantic Relationships in Electronic Health Records Using Knowledge Graphs: An Implementation Using MIMIC III Dataset and GraphDB. Healthcare (Basel) 2023; 11:1762. [PMID: 37372880 DOI: 10.3390/healthcare11121762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 06/03/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023] Open
Abstract
Electronic health records (EHRs) are an increasingly important source of information for healthcare professionals and researchers. However, EHRs are often fragmented, unstructured, and difficult to analyze due to the heterogeneity of the data sources and the sheer volume of information. Knowledge graphs have emerged as a powerful tool for capturing and representing complex relationships within large datasets. In this study, we explore the use of knowledge graphs to capture and represent complex relationships within EHRs. Specifically, we address the following research question: Can a knowledge graph created using the MIMIC III dataset and GraphDB effectively capture semantic relationships within EHRs and enable more efficient and accurate data analysis? We map the MIMIC III dataset to an ontology using text refinement and Protégé; then, we create a knowledge graph using GraphDB and use SPARQL queries to retrieve and analyze information from the graph. Our results demonstrate that knowledge graphs can effectively capture semantic relationships within EHRs, enabling more efficient and accurate data analysis. We provide examples of how our implementation can be used to analyze patient outcomes and identify potential risk factors, contributing to the growing body of literature on the use of knowledge graphs in healthcare. In particular, our study highlights the potential of knowledge graphs to support decision-making and improve patient outcomes by enabling a more comprehensive and holistic analysis of EHR data. Overall, our research contributes to a better understanding of the value of knowledge graphs in healthcare and lays the foundation for further research in this area.
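The kind of retrieval a SPARQL basic graph pattern performs can be illustrated with a small in-memory stand-in. This is a sketch only: it does not use GraphDB, SPARQL, or the MIMIC III schema, and every identifier and triple below (`TRIPLES`, `match`, `patient_1`, `hasDiagnosis`, and so on) is hypothetical.

```python
# Hypothetical EHR-style triples of the form (subject, predicate, object).
TRIPLES = {
    ("patient_1", "hasDiagnosis", "sepsis"),
    ("patient_1", "admittedTo", "icu"),
    ("patient_2", "hasDiagnosis", "sepsis"),
    ("patient_2", "hasDiagnosis", "pneumonia"),
}

def match(subject=None, predicate=None, obj=None):
    """Return the sorted triples that fit the pattern; None is a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    return sorted(
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )

# Roughly analogous to: SELECT ?patient WHERE { ?patient :hasDiagnosis :sepsis }
sepsis_patients = [s for (s, _, _) in match(predicate="hasDiagnosis", obj="sepsis")]
```

A triple store such as GraphDB answers the same kind of pattern against indexed RDF data and additionally supports joins across patterns, filters, and ontology-aware inference, none of which this toy covers.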
Affiliation(s)
- Bader Aldughayfiq
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
- Farzeen Ashfaq
  - School of Computer Science-SCS, Taylor's University, Subang Jaya 47500, Malaysia
- N Z Jhanjhi
  - School of Computer Science-SCS, Taylor's University, Subang Jaya 47500, Malaysia
- Mamoona Humayun
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
|
19
|
Zhang DY, Cui WQ, Hou L, Yang J, Lyu LY, Wang ZY, Linghu KG, He WB, Yu H, Hu YJ. Expanding potential targets of herbal chemicals by node2vec based on herb-drug interactions. Chin Med 2023; 18:64. [PMID: 37264453 PMCID: PMC10233865 DOI: 10.1186/s13020-023-00763-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 05/01/2023] [Indexed: 06/03/2023] Open
Abstract
BACKGROUND The identification of chemical-target interactions is key to pharmaceutical research and development, but the unclear material basis and complex mechanisms of traditional medicine (TM) make it difficult, especially for low-content chemicals, which are hard to test in experiments. In this research, we aim to apply the node2vec algorithm in the context of drug-herb interactions to expand potential targets, and to use molecular docking and experiments for verification. METHODS Given the widely reported risks of combining cardiovascular drugs with herbs, Salvia miltiorrhiza (Danshen, DS) and Ligusticum chuanxiong (Chuanxiong, CX), which are widely used in the treatment of cardiovascular disease (CVD), together with approved drugs for CVD, form the new dataset used as an example. Three data groups, DS-drug, CX-drug, and DS-CX-drug, served as the context of drug-herb interactions for link prediction. Three types of datasets were built for each of the three groups, containing information from chemical-target connection (CTC), chemical-chemical connection (CCC) and protein-protein interaction (PPI) in increasing steps. Five algorithms, including node2vec, were compared. Molecular docking and pharmacological experiments were used for verification. RESULTS Node2vec achieved the best performance, with average AUROC and AP values of 0.91 on the "CTC, CCC, PPI" datasets. Targets of 32 herbal chemicals were identified within 43 predicted edges between herbal chemicals and drug targets. Among them, 11 potential chemical-drug target interactions showed better binding affinity by molecular docking. Further pharmacological experiments indicated that caffeic acid increased the thermal stability of the protein GGT1, and that ligustilide and the low-content chemical neocryptotanshinone induced mRNA changes of FGF2 and MTNR1A, respectively.
CONCLUSIONS The analytical framework and methods established in the study provide an important reference for researchers in discovering herb-drug interactions, alerting clinical risks, and understanding complex mechanisms of TM.
Affiliation(s)
- Dai-Yan Zhang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Wen-Qing Cui
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Ling Hou
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Jing Yang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Li-Yang Lyu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Ze-Yu Wang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Ke-Gang Linghu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Wen-Bin He
  - Shanxi Key Laboratory of Chinese Medicine Encephalopathy, Shanxi University of Chinese Medicine, Taiyuan, China
- Hua Yu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
- Yuan-Jia Hu
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078, Macao, China
  - DPM, Faculty of Health Sciences, University of Macau, Macao, China
|
20
|
Su C, Hou Y, Zhou M, Rajendran S, Maasch JRA, Abedi Z, Zhang H, Bai Z, Cuturrufo A, Guo W, Chaudhry FF, Ghahramani G, Tang J, Cheng F, Li Y, Zhang R, DeKosky ST, Bian J, Wang F. Biomedical discovery through the integrative biomedical knowledge hub (iBKH). iScience 2023; 26:106460. [PMID: 37020958 PMCID: PMC10068563 DOI: 10.1016/j.isci.2023.106460] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/16/2022] [Revised: 09/20/2022] [Accepted: 03/16/2023] [Indexed: 04/01/2023] Open
Abstract
The abundance of biomedical knowledge gained from biological experiments and clinical practices is an invaluable resource for biomedicine. The emerging biomedical knowledge graphs (BKGs) provide an efficient and effective way to manage the abundant knowledge in biomedical and life science. In this study, we created a comprehensive BKG called the integrative Biomedical Knowledge Hub (iBKH) by harmonizing and integrating information from diverse biomedical resources. To make iBKH easily accessible for biomedical research, we developed a web-based, user-friendly graphical portal that allows fast and interactive knowledge retrieval. We also implemented an efficient and scalable graph learning pipeline for discovering novel biomedical knowledge in iBKH. As a proof of concept, we applied our iBKH-based method to in silico drug repurposing for Alzheimer's disease. The iBKH is publicly available.
Affiliation(s)
- Chang Su
  - Department of Health Service Administration and Policy, College of Public Health, Temple University, Philadelphia, PA 19122, USA
- Yu Hou
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Manqi Zhou
  - Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
- Suraj Rajendran
  - Tri-Institutional Computational Biology & Medicine Program, Cornell University, New York, NY 10065, USA
- Zehra Abedi
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Haotan Zhang
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- Zilong Bai
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
- Winston Guo
  - Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Fayzan F. Chaudhry
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- Gregory Ghahramani
  - Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
- Jian Tang
  - Mila-Quebec AI Institute and HEC Montreal, Montreal, QC H2S 3H1, Canada
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Yue Li
  - School of Computer Science, McGill University, Montreal, QC H3A 0C6, Canada
- Rui Zhang
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Steven T. DeKosky
  - Department of Neurology, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Jiang Bian
  - Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
|
21
|
Peng C, Xia F, Naseriparsa M, Osborne F. Knowledge Graphs: Opportunities and Challenges. Artif Intell Rev 2023; 56:1-32. [PMID: 37362886 PMCID: PMC10068207 DOI: 10.1007/s10462-023-10465-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/09/2023] [Indexed: 04/05/2023]
Abstract
With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It is well recognized that knowledge graphs effectively represent complex information; hence, they have rapidly gained the attention of academia and industry in recent years. To develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of the field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.
Affiliation(s)
- Ciyuan Peng
  - Institute of Innovation, Science and Sustainability, Federation University Australia, Ballarat, 3353 VIC Australia
- Feng Xia
  - School of Computing Technologies, RMIT University, Melbourne, 3000 VIC Australia
- Mehdi Naseriparsa
  - Global Professional School, Federation University Australia, Ballarat, 3353 VIC Australia
- Francesco Osborne
  - Knowledge Media Institute, The Open University, Milton Keynes, MK7 6AA UK
|
22
|
Sanders LM, Scott RT, Yang JH, Qutub AA, Garcia Martin H, Berrios DC, Hastings JJA, Rask J, Mackintosh G, Hoarfrost AL, Chalk S, Kalantari J, Khezeli K, Antonsen EL, Babdor J, Barker R, Baranzini SE, Beheshti A, Delgado-Aparicio GM, Glicksberg BS, Greene CS, Haendel M, Hamid AA, Heller P, Jamieson D, Jarvis KJ, Komarova SV, Komorowski M, Kothiyal P, Mahabal A, Manor U, Mason CE, Matar M, Mias GI, Miller J, Myers JG, Nelson C, Oribello J, Park SM, Parsons-Wingerter P, Prabhu RK, Reynolds RJ, Saravia-Butler A, Saria S, Sawyer A, Singh NK, Snyder M, Soboczenski F, Soman K, Theriot CA, Van Valen D, Venkateswaran K, Warren L, Worthey L, Zitnik M, Costes SV. Biological research and self-driving labs in deep space supported by artificial intelligence. NAT MACH INTELL 2023. [DOI: 10.1038/s42256-023-00618-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
|
23
|
A Quick Prototype for Assessing OpenIE Knowledge Graph-Based Question-Answering Systems. INFORMATION 2023. [DOI: 10.3390/info14030186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023] Open
Abstract
Due to the rapid growth of knowledge graphs (KG) as representational learning methods in recent years, question-answering approaches have received increasing attention from academia and industry. Question-answering systems use knowledge graphs to organize, navigate, search and connect knowledge entities. Managing such systems requires a thorough understanding of the underlying graph-oriented structures and, at the same time, an appropriate query language, such as SPARQL, to access relevant data. Natural language interfaces are needed to enable non-technical users to query ever more complex data. The paper proposes a question-answering approach to support end users in querying graph-oriented knowledge bases. The system pipeline is composed of two main modules: the first translates a natural language query submitted by the user into a triple of the form <subject, predicate, object>, while the second implements knowledge graph embedding (KGE) models, which use the triple produced by the first module to retrieve the answer to the question. Our framework delivers a fast OpenIE-based knowledge extraction system and a graph-based answer prediction model for question-answering tasks. The system was designed by leveraging existing tools to accomplish a simple prototype for fast experimentation, especially across different knowledge domains, with the added benefit of reducing development time and costs. The experimental results confirm the effectiveness of the proposed system, which provides promising performance, as assessed at the module level. In particular, in some cases, the system outperforms results reported in the literature. Finally, a use case example shows the KG generated by user questions in a graphical interface provided by an ad hoc web application.
|
24
|
Carvalho RMS, Oliveira D, Pesquita C. Knowledge Graph Embeddings for ICU readmission prediction. BMC Med Inform Decis Mak 2023; 23:12. [PMID: 36658526 PMCID: PMC9850812 DOI: 10.1186/s12911-022-02070-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/31/2022] [Accepted: 11/28/2022] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Intensive Care Unit (ICU) readmissions represent both a health risk for patients, with increased mortality rates and overall health deterioration, and a financial burden for healthcare facilities. As healthcare became more data-driven with the introduction of Electronic Health Records (EHR), machine learning methods have been applied to predict ICU readmission risk. However, these methods disregard the meaning and relationships of data objects and work blindly over clinical data without taking into account scientific knowledge and context. Ontologies and Knowledge Graphs can help bridge this gap between data and scientific context, as they are computational artefacts that represent the entities of a domain and their relationships to each other in a formalized way. METHODS AND RESULTS We have developed an approach that enriches EHR data with semantic annotations to ontologies to build a Knowledge Graph. A patient's ICU stay is represented by Knowledge Graph embeddings in a contextualized manner, which are used by machine learning models to predict 30-day ICU readmissions. This approach is based on several contributions: (1) an enrichment of the MIMIC-III dataset with patient-oriented annotations to various biomedical ontologies; (2) a Knowledge Graph that defines patient data with biomedical ontologies; (3) a predictive model of ICU readmission risk that uses Knowledge Graph embeddings; (4) a variant of the predictive model that targets different time points during an ICU stay. Our predictive approaches outperformed both a baseline and state-of-the-art works, achieving a mean Area Under the Receiver Operating Characteristic Curve of 0.827 and an Area Under the Precision-Recall Curve of 0.691.
The application of this novel approach to help clinicians decide whether a patient can be discharged has the potential to prevent the readmission of [Formula: see text] of Intensive Care Unit patients, without unnecessarily prolonging the stay of those who would not require it. CONCLUSION The coupling of semantic annotation and Knowledge Graph embeddings affords two clear advantages: they consider scientific context and they are able to build representations of EHR information of different types in a common format. This work demonstrates the potential for impact that integrating ontologies and Knowledge Graphs into clinical machine learning applications can have.
Affiliation(s)
- Ricardo M. S. Carvalho
- LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal (GRID: grid.9983.b; ISNI: 0000 0001 2181 4263)
| | - Daniela Oliveira
- LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal (GRID: grid.9983.b; ISNI: 0000 0001 2181 4263)
| | - Catia Pesquita
- LASIGE, Faculty of Sciences, University of Lisbon, Lisbon, Portugal (GRID: grid.9983.b; ISNI: 0000 0001 2181 4263)
| |
|
25
|
Li G, Siddharth L, Luo J. Embedding knowledge graph of patent metadata to measure knowledge proximity. J Assoc Inf Sci Technol 2023. [DOI: 10.1002/asi.24736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Guangtong Li
- Data‐Driven Innovation Lab, Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
| | - L. Siddharth
- Data‐Driven Innovation Lab, Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
| | - Jianxi Luo
- Data‐Driven Innovation Lab, Engineering Product Development Pillar, Singapore University of Technology and Design, Singapore, Singapore
| |
|
26
|
Nourani E, Asgari E, McHardy AC, Mofrad MRK. TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3744-3753. [PMID: 34460382 DOI: 10.1109/tcbb.2021.3108718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Pretrained representations have recently gained attention in various machine learning applications. Nonetheless, the high computational costs associated with training these models have motivated alternative approaches for representation learning. Herein we introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. Representation learning of biological entities that captures essential features can alleviate many of the challenges associated with supervised learning in bioinformatics. The most important distinction of our proposed method is that it relies on the protein-protein interaction (PPI) network. The computational cost of the generated representations for any potential application is significantly lower than that of comparable methods, since the length of the representations is significantly smaller than that in other approaches. TripletProt offers great potential for protein informatics tasks and can be widely applied to similar tasks. We evaluate TripletProt comprehensively in protein functional annotation tasks including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), which are both challenging multi-class, multi-label classification machine learning problems. We compare the performance of TripletProt with the state-of-the-art approaches including a recurrent language model-based approach (i.e., UniRep), as well as a protein-protein interaction (PPI) network and sequence-based method (i.e., DeepGO). Our TripletProt showed an overall improvement in F1 score in the above-mentioned comprehensive functional annotation tasks, relying solely on the PPI network. Availability: The source code and datasets are available at https://github.com/EsmaeilNourani/TripletProt.
|
27
|
Abstract
As genetic circuits become more sophisticated, the size and complexity of data about their designs increase. The data captured goes beyond genetic sequences alone; information about circuit modularity and functional details improves comprehension, performance analysis, and design automation techniques. However, new data types expose new challenges around the accessibility, visualization, and usability of design data (and metadata). Here, we present a method to transform circuit designs into networks and showcase its potential to enhance the utility of design data. Since networks are dynamic structures, initial graphs can be interactively shaped into subnetworks of relevant information based on requirements such as the hierarchy of biological parts or interactions between entities. A significant advantage of a network approach is the ability to scale abstraction, providing an automatic sliding level of detail that further tailors the visualization to a given situation. Additionally, several visual changes can be applied, such as coloring or clustering nodes based on types (e.g., genes or promoters), resulting in easier comprehension from a user perspective. This approach allows circuit designs to be coupled to other networks, such as metabolic pathways or implementation protocols captured in graph-like formats. We advocate using networks to structure, access, and improve synthetic biology information.
Affiliation(s)
- Matthew Crowther
- School of Computing, Newcastle University, Newcastle Upon Tyne NE4 5TG, United Kingdom
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Pozuelo de Alarcón, 28223 Madrid, Spain
| | - Anil Wipat
- School of Computing, Newcastle University, Newcastle Upon Tyne NE4 5TG, United Kingdom
| | - Ángel Goñi-Moreno
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Pozuelo de Alarcón, 28223 Madrid, Spain
| |
|
28
|
Gao Z, Ding P, Xu R. KG-Predict: A knowledge graph computational framework for drug repurposing. J Biomed Inform 2022; 132:104133. [PMID: 35840060 PMCID: PMC9595135 DOI: 10.1016/j.jbi.2022.104133] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/22/2021] [Revised: 06/18/2022] [Accepted: 07/03/2022] [Indexed: 11/26/2022]
Abstract
The emergence of large-scale phenotypic, genetic, and other multi-model biochemical data has offered unprecedented opportunities for drug discovery including drug repurposing. Various knowledge graph-based methods have been developed to integrate and analyze complex and heterogeneous data sources to find new therapeutic applications for existing drugs. However, existing methods have limitations in modeling and capturing context-sensitive inter-relationships among tens of thousands of biomedical entities. In this paper, we developed KG-Predict: a knowledge graph computational framework for drug repurposing. We first integrated multiple types of entities and relations from various genotypic and phenotypic databases to construct a knowledge graph termed GP-KG. GP-KG was composed of 1,246,726 associations between 61,146 entities. KG-Predict then aggregated the heterogeneous topological and semantic information from GP-KG to learn low-dimensional representations of entities and relations, and further utilized these representations to infer new drug-disease interactions. In cross-validation experiments, KG-Predict achieved high performance [AUROC (the area under receiver operating characteristic) = 0.981, AUPR (the area under precision-recall) = 0.409 and MRR (the mean reciprocal rank) = 0.261], outperforming other state-of-the-art graph embedding methods. We applied KG-Predict in identifying novel repositioned candidate drugs for Alzheimer's disease (AD) and showed that KG-Predict prioritized both FDA-approved and active clinical trial anti-AD drugs among the top (AUROC = 0.868 and AUPR = 0.364).
Affiliation(s)
- Zhenxiang Gao
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA.
| | - Pingjian Ding
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA.
| | - Rong Xu
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, 44106 OH, USA.
| |
|
29
|
Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nat Commun 2022; 13:4429. [PMID: 35908020 PMCID: PMC9338929 DOI: 10.1038/s41467-022-32111-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 04/20/2022] [Accepted: 07/18/2022] [Indexed: 12/19/2022] Open
Abstract
Spatially resolved transcriptomics provides genetic information in space toward elucidation of the spatial architecture in intact organs and the spatially resolved cell-cell communications mediating tissue homeostasis, development, and disease. To facilitate inference of spatially resolved cell-cell communications, we here present SpaTalk, which relies on a graph network and knowledge graph to model and score the ligand-receptor-target signaling network between spatially proximal cells by dissecting cell-type composition through a non-negative linear model and spatial mapping between single-cell transcriptomic and spatially resolved transcriptomic data. The benchmarked performance of SpaTalk on public single-cell spatial transcriptomic datasets is superior to that of existing inference methods. Then we apply SpaTalk to STARmap, Slide-seq, and 10X Visium data, revealing the in-depth communicative mechanisms underlying normal and disease tissues with spatial structure. SpaTalk can uncover spatially resolved cell-cell communications for single-cell and spot-based spatially resolved transcriptomic data universally, providing valuable insights into spatial inter-cellular tissue dynamics. Cell-cell communication is a vital feature involving numerous biological processes. Here, the authors develop SpaTalk, a cell-cell communication inference method using knowledge graph for spatially resolved transcriptomic data, providing valuable insights into spatial intercellular tissue dynamics.
|
30
|
Pan X, Lin X, Cao D, Zeng X, Yu PS, He L, Nussinov R, Cheng F. Deep learning for drug repurposing: Methods, databases, and applications. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1597] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Affiliation(s)
- Xiaoqin Pan
- School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
| | - Xuan Lin
- School of Computer Science, Xiangtan University, Xiangtan, China
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
| | - Xiangxiang Zeng
- School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
| | - Philip S. Yu
- Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA
| | - Lifang He
- Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
| | - Ruth Nussinov
- Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
| |
|
31
|
Zhu C, Yang Z, Xia X, Li N, Zhong F, Liu L. Multimodal reasoning based on knowledge graph embedding for specific diseases. Bioinformatics 2022; 38:2235-2245. [PMID: 35150235 PMCID: PMC9004655 DOI: 10.1093/bioinformatics/btac085] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/14/2021] [Revised: 01/06/2022] [Accepted: 02/07/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Knowledge Graph (KG) is becoming increasingly important in the biomedical field. Deriving new and reliable knowledge from existing knowledge by KG embedding technology is a cutting-edge method. Some approaches add a variety of additional information to aid reasoning, which is known as multimodal reasoning. However, few works based on the existing biomedical KGs are focused on specific diseases. RESULTS This work develops a construction and multimodal reasoning process of Specific Disease Knowledge Graphs (SDKGs). We construct SDKG-11, an SDKG set including five cancers, six non-cancer diseases, a combined Cancer5 and a combined Diseases11, aiming to discover new reliable knowledge and provide universal pre-trained knowledge for that specific disease field. SDKG-11 is obtained through original triplet extraction, standard entity set construction, entity linking and relation linking. We implement multimodal reasoning by reverse-hyperplane projection for SDKGs based on structure, category and description embeddings. Multimodal reasoning improves pre-existing models on all SDKGs using the entity prediction task as the evaluation protocol. We verify the model's reliability in discovering new knowledge by manually proofreading predicted drug-gene, gene-disease and disease-drug pairs. Using embedding results as initialization parameters for the biomolecular interaction classification, we demonstrate the universality of embedding models. AVAILABILITY AND IMPLEMENTATION The constructed SDKG-11 and the implementation by TensorFlow are available from https://github.com/ZhuChaoY/SDKG-11. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chaoyu Zhu
- Institute of Biomedical Sciences and School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Xiaoqiong Xia
- Institute of Biomedical Sciences and School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Nan Li
- College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
| | - Fan Zhong
- To whom correspondence should be addressed.
| | - Lei Liu
- To whom correspondence should be addressed.
| |
|
32
|
Zhu F, Li F, Deng L, Meng F, Liang Z. Protein Interaction Network Reconstruction with a Structural Gated Attention Deep Model by Incorporating Network Structure Information. J Chem Inf Model 2022; 62:258-273. [PMID: 35005980 DOI: 10.1021/acs.jcim.1c00982] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
Protein-protein interactions (PPIs) provide a physical basis of molecular communications for a wide range of biological processes in living cells. Establishing the PPI network has become a fundamental and essential task for a better understanding of biological events and disease pathogenesis. Although many machine learning algorithms have been employed to predict PPIs, models trained with only protein sequence information as features suffer from low robustness and prediction accuracy. In this study, a new deep-learning-based framework named the Structural Gated Attention Deep (SGAD) model was proposed to improve the performance of PPI network reconstruction (PINR). The improved predictive performance was achieved by augmenting multiple protein sequence descriptors with the topological features and information flow of the PPI network, which were further implemented with a gating mechanism to improve robustness to noise. On 11 independent test data sets and one combined data set, SGAD yielded area under the curve values of approximately 0.83-0.93, outperforming other models. Furthermore, the SGAD ensemble can learn more characteristic information on protein pairs through a two-layer neural network, serving as a powerful tool in the exploration of PPI biological space.
Affiliation(s)
- Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou 215 006, China
| | - Feifei Li
- School of Computer Science and Technology, Soochow University, Suzhou 215 006, China
| | - Lei Deng
- School of Computer Science and Technology, Soochow University, Suzhou 215 006, China
| | - Fanwang Meng
- Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario L8S 4L8, Canada
| | - Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215 006, China
| |
|
33
|
Ye Q, Hsieh CY, Yang Z, Kang Y, Chen J, Cao D, He S, Hou T. A unified drug-target interaction prediction framework based on knowledge graph and recommendation system. Nat Commun 2021; 12:6775. [PMID: 34811351 PMCID: PMC8635420 DOI: 10.1038/s41467-021-27137-3] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 06/29/2021] [Accepted: 11/05/2021] [Indexed: 02/06/2023] Open
Abstract
Prediction of drug-target interactions (DTI) plays a vital role in drug development in various areas, such as virtual screening, drug repurposing and identification of potential drug side effects. Although extensive efforts have been invested in perfecting DTI prediction, existing methods still suffer from the high sparsity of DTI datasets and the cold start problem. Here, we develop KGE_NFM, a unified framework for DTI prediction that combines knowledge graph (KG) and recommendation system techniques. This framework first learns a low-dimensional representation for various entities in the KG, and then integrates the multimodal information via a neural factorization machine (NFM). KGE_NFM is evaluated under three realistic scenarios, and achieves accurate and robust predictions on four benchmark datasets, especially in the scenario of the cold start for proteins. Our results indicate that KGE_NFM provides valuable insight into integrating KG and recommendation system-based techniques into a unified framework for novel DTI discovery.
Affiliation(s)
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China (GRID: grid.13402.34; ISNI: 0000 0004 1759 700X); College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027 Zhejiang, China (GRID: grid.13402.34; ISNI: 0000 0004 1759 700X); State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, China (GRID: grid.13402.34; ISNI: 0000 0004 1759 700X)
| | - Chang-Yu Hsieh
- Tencent Quantum Laboratory, Shenzhen, 518057 Guangdong, China
| | - Ziyi Yang
- Tencent Quantum Laboratory, Shenzhen, 518057 Guangdong, China
| | - Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, China (GRID: grid.13402.34; ISNI: 0000 0004 1759 700X)
| | - Jiming Chen
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027 Zhejiang, China (GRID: grid.13402.34; ISNI: 0000 0004 1759 700X)
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, Hunan, China.
| | - Shibo He
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China.
| | - Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China; State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, China.
| |
|
34
|
Zeng X, Tu X, Liu Y, Fu X, Su Y. Toward better drug discovery with knowledge graph. Curr Opin Struct Biol 2021; 72:114-126. [PMID: 34649044 DOI: 10.1016/j.sbi.2021.09.003] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 04/22/2021] [Revised: 08/18/2021] [Accepted: 09/06/2021] [Indexed: 01/08/2023]
Abstract
Drug discovery is the process of new drug identification. This process is driven by the increasing data from existing chemical libraries and data banks. The knowledge graph is introduced to the domain of drug discovery for imposing an explicit structure to integrate heterogeneous biomedical data. The graph can provide structured relations among multiple entities and unstructured semantic relations associated with entities. In this review, we summarize knowledge graph-based works that implement drug repurposing and adverse drug reaction prediction for drug discovery. As knowledge representation learning is a common way to explore knowledge graphs for prediction problems, we introduce several representative embedding models to provide a comprehensive understanding of knowledge representation learning.
Affiliation(s)
- Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha, 410086, China
| | - Xinqi Tu
- College of Information Science and Engineering, Hunan University, Changsha, 410086, China
| | - Yuansheng Liu
- College of Information Science and Engineering, Hunan University, Changsha, 410086, China.
| | - Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, 410086, China
| | - Yansen Su
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230601, China
| |
|
35
|
Improving Risk Assessment of Miscarriage During Pregnancy with Knowledge Graph Embeddings. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2021; 5:359-381. [DOI: 10.1007/s41666-021-00096-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/13/2020] [Revised: 02/28/2021] [Accepted: 03/03/2021] [Indexed: 01/08/2023]
|
36
|
Biomedical Knowledge Graph Embeddings for Personalized Medicine. PROGRESS IN ARTIFICIAL INTELLIGENCE 2021. [DOI: 10.1007/978-3-030-86230-5_46] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|