1. Romano JD, Truong V, Kumar R, Venkatesan M, Graham BE, Hao Y, Matsumoto N, Li X, Wang Z, Ritchie MD, Shen L, Moore JH. The Alzheimer's Knowledge Base: A Knowledge Graph for Alzheimer Disease Research. J Med Internet Res 2024; 26:e46777. [PMID: 38635981 PMCID: PMC11066745 DOI: 10.2196/46777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 06/23/2023] [Accepted: 11/07/2023] [Indexed: 04/20/2024] Open
Abstract
BACKGROUND As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease's etiology and response to drugs. OBJECTIVE We designed the Alzheimer's Knowledge Base (AlzKB) to alleviate this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. METHODS We designed the AlzKB as a large, heterogeneous graph knowledge base assembled using 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (eg, chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. RESULTS AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. CONCLUSIONS AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge.
Affiliation(s)
- Joseph D Romano
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Center of Excellence in Environmental Toxicology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Van Truong
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Rachit Kumar
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Medical Scientist Training Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Mythreye Venkatesan
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Britney E Graham
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Yun Hao
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Nick Matsumoto
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Xi Li
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Zhiping Wang
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- Marylyn D Ritchie
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Li Shen
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
2. Ayuso-Muñoz A, Prieto-Santamaría L, Ugarte-Carro E, Serrano E, Rodríguez-González A. Uncovering hidden therapeutic indications through drug repurposing with graph neural networks and heterogeneous data. Artif Intell Med 2023; 145:102687. [PMID: 37925215 DOI: 10.1016/j.artmed.2023.102687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 10/04/2023] [Accepted: 10/13/2023] [Indexed: 11/06/2023]
Abstract
Drug repurposing has gained considerable attention in recent years. Repurposing existing drugs for new therapeutic uses simplifies the drug discovery process, which in turn reduces the costs and risks associated with de novo development. Representing biomedical data as a graph is a simple and effective way to depict the underlying structure of the information, and using deep neural networks in combination with these data is a promising approach to drug repurposing. This paper presents BEHOR, a more comprehensive version of the previously presented REDIRECTION model. Both versions use the DISNET biomedical graph as the primary source of information, providing the model with extensive and intricate data to tackle the drug repurposing challenge. On the RepoDB test, the new version achieves 0.9604 for AUROC and 0.9518 for AUPRC. Additionally, some of the novel predictions are discussed to demonstrate the reliability of the model. The authors believe that BEHOR holds promise for generating drug repurposing hypotheses and could greatly benefit the field.
Affiliation(s)
- Adrián Ayuso-Muñoz
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
- Lucía Prieto-Santamaría
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
- Esther Ugarte-Carro
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
- Emilio Serrano
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
- Alejandro Rodríguez-González
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
3. Guzman NA, Guzman DE, Blanc T. Advancements in portable instruments based on affinity-capture-migration and affinity-capture-separation for use in clinical testing and life science applications. J Chromatogr A 2023; 1704:464109. [PMID: 37315445 DOI: 10.1016/j.chroma.2023.464109] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2023] [Revised: 05/23/2023] [Accepted: 05/25/2023] [Indexed: 06/16/2023]
Abstract
The shift from testing at centralized diagnostic laboratories to remote locations is being driven by the development of point-of-care (POC) instruments and represents a transformative moment in medicine. POC instruments address the need for rapid results that can inform faster therapeutic decisions and interventions. These instruments are especially valuable in the field, such as in an ambulance, or in remote and rural locations. The development of telehealth, enabled by advancements in digital technologies like smartphones and cloud computing, is also aiding in this evolution, allowing medical professionals to provide care remotely, potentially reducing healthcare costs and improving patient longevity. One notable POC device is the lateral flow immunoassay (LFIA), which played a major role in addressing the COVID-19 pandemic due to its ease of use, rapid analysis time, and low cost. However, LFIA tests exhibit relatively low analytical sensitivity and provide semi-quantitative information, indicating either a positive, negative, or inconclusive result, which can be attributed to its one-dimensional format. Immunoaffinity capillary electrophoresis (IACE), on the other hand, offers a two-dimensional format that includes an affinity-capture step of one or more matrix constituents followed by release and electrophoretic separation. The method provides greater analytical sensitivity, and quantitative information, thereby reducing the rate of false positives, false negatives, and inconclusive results. Combining LFIA and IACE technologies can thus provide an effective and economical solution for screening, confirming results, and monitoring patient progress, representing a key strategy in advancing diagnostics in healthcare.
Affiliation(s)
- Norberto A Guzman
- Princeton Biochemicals, Inc., Princeton, NJ 08543, United States of America.
- Daniel E Guzman
- Princeton Biochemicals, Inc., Princeton, NJ 08543, United States of America; Columbia University Irving Medical Center, New York, NY 10032, United States of America
- Timothy Blanc
- Eli Lilly and Company, Branchburg, NJ 08876, United States of America
4. Huang D, Lei F. Temporal group-aware graph diffusion networks for dynamic link prediction. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2023.103292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
5. Ahmed F, Samantasinghar A, Manzoor Soomro A, Kim S, Hyun Choi K. A systematic review of computational approaches to understand cancer biology for informed drug repurposing. J Biomed Inform 2023; 142:104373. [PMID: 37120047 DOI: 10.1016/j.jbi.2023.104373] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 02/14/2023] [Revised: 03/25/2023] [Accepted: 04/23/2023] [Indexed: 05/01/2023]
Abstract
Cancer is the second leading cause of death globally, trailing only heart disease. In the United States alone, 1.9 million new cancer cases and 609,360 deaths were recorded for 2022. Unfortunately, the success rate for new cancer drug development remains less than 10%, making the disease particularly challenging. This low success rate is largely attributed to the complex and poorly understood nature of cancer etiology. Therefore, it is critical to find alternative approaches to understanding cancer biology and developing effective treatments. One such approach is drug repurposing, which offers a shorter drug development timeline and lower costs while increasing the likelihood of success. In this review, we provide a comprehensive analysis of computational approaches for understanding cancer biology, including systems biology, multi-omics, and pathway analysis. Additionally, we examine the use of these methods for drug repurposing in cancer, including the databases and tools that are used for cancer research. Finally, we present case studies of drug repurposing, discussing their limitations and offering recommendations for future research in this area.
Affiliation(s)
- Faheem Ahmed
- Department of Mechatronics Engineering, Jeju National University, Republic of Korea
- Sejong Kim
- Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea; Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Korea.
- Kyung Hyun Choi
- Department of Mechatronics Engineering, Jeju National University, Republic of Korea.
6. Mangione W, Falls Z, Samudrala R. Effective holistic characterization of small molecule effects using heterogeneous biological networks. Front Pharmacol 2023; 14:1113007. [PMID: 37180722 PMCID: PMC10169664 DOI: 10.3389/fphar.2023.1113007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/11/2023] [Indexed: 05/16/2023] Open
Abstract
The two most common reasons for attrition in therapeutic clinical trials are efficacy and safety. We integrated heterogeneous data to create a human interactome network to comprehensively describe drug behavior in biological systems, with the goal of accurate therapeutic candidate generation. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multiscale therapeutic discovery, repurposing, and design was enhanced by integrating drug side effects, protein pathways, protein-protein interactions, protein-disease associations, and the Gene Ontology, and complemented with its existing drug/compound, protein, and indication libraries. These integrated networks were reduced to a "multiscale interactomic signature" for each compound that describe its functional behavior as vectors of real values. These signatures are then used for relating compounds to each other with the hypothesis that similar signatures yield similar behavior. Our results indicated that there is significant biological information captured within our networks (particularly via side effects) which enhance the performance of our platform, as evaluated by performing all-against-all leave-one-out drug-indication association benchmarking as well as generating novel drug candidates for colon cancer and migraine disorders corroborated via literature search. Further, drug impacts on pathways derived from computed compound-protein interaction scores served as the features for a random forest machine learning model trained to predict drug-indication associations, with applications to mental disorders and cancer metastasis highlighted. This interactomic pipeline highlights the ability of Computational Analysis of Novel Drug Opportunities to accurately relate drugs in a multitarget and multiscale context, particularly for generating putative drug candidates using the information gleaned from indirect data such as side effect profiles and protein pathway information.
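The idea of relating compounds through real-valued signature vectors can be illustrated with a minimal sketch. Cosine similarity is used here only as an assumed, common choice of metric; the abstract does not say which measure the platform uses, and the compound names and vectors below are made-up toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero signature as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, signatures, top_n=3):
    """Rank every other compound by signature similarity to `query`."""
    scored = [
        (name, cosine_similarity(signatures[query], vec))
        for name, vec in signatures.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy signatures (not taken from the paper).
signatures = {
    "compound_A": [1.0, 0.0, 2.0],
    "compound_B": [2.0, 0.0, 4.0],
    "compound_C": [0.0, 3.0, 0.0],
}
ranked = most_similar("compound_A", signatures, top_n=2)
```

Under the stated hypothesis, compounds ranked highest for a query compound would be the first candidates to inherit its known indications.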
Affiliation(s)
- Ram Samudrala
- Jacobs School of Medicine and Biomedical Sciences, Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, United States
7. Muniyappan S, Rayan AXA, Varrieth GT. DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network. Mathematical Biosciences and Engineering 2023; 20:9530-9571. [PMID: 37161255 DOI: 10.3934/mbe.2023419] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
Abstract
MOTIVATION In vitro experiment-based drug-target interaction (DTI) exploration demands more human, financial and data resources. In silico approaches have been recommended for predicting DTIs to reduce time and cost. During the drug development process, one can analyze the therapeutic effect of the drug for a particular disease by identifying how the drug binds to the target for treating that disease. Hence, DTI plays a major role in drug discovery. Many computational methods have been developed for DTI prediction. However, the existing methods have limitations in terms of capturing the interactions via multiple semantics between drug and target nodes in a heterogeneous biological network (HBN). METHODS In this paper, we propose a DTiGNN framework for identifying unknown drug-target pairs. The DTiGNN first calculates the similarity between the drug and target from multiple perspectives. Then, the features of drugs and targets from each perspective are learned separately by using a novel method termed an information entropy-based random walk. Next, all of the learned features from different perspectives are integrated into a single drug and target similarity network by using a multi-view convolutional neural network. Using the integrated similarity networks, drug interactions, drug-disease associations, protein interactions and protein-disease association, the HBN is constructed. Next, a novel embedding algorithm called a meta-graph guided graph neural network is used to learn the embedding of drugs and targets. Then, a convolutional neural network is employed to infer new DTIs after balancing the sample using oversampling techniques. RESULTS The DTiGNN is applied to various datasets, and the result shows better performance in terms of the area under receiver operating characteristic curve (AUC) and area under precision-recall curve (AUPR), with scores of 0.98 and 0.99, respectively. There are 23,739 newly predicted DTI pairs in total.
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Tamil Nadu, India
8. Jang YH, Han J, Kim J, Kim W, Woo KS, Kim J, Hwang CS. Graph Analysis with Multifunctional Self-Rectifying Memristive Crossbar Array. Advanced Materials (Deerfield Beach, Fla.) 2023; 35:e2209503. [PMID: 36495559 DOI: 10.1002/adma.202209503] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/15/2022] [Revised: 12/06/2022] [Indexed: 06/17/2023]
Abstract
Many big data have interconnected and dynamic graph structures growing over time. Analyzing these graphical data requires the hidden relationship between the nodes in the graphs to be identified, which has conventionally been achieved by finding the effective similarity. However, graphs are generally non-Euclidean, which does not allow finding it. In this study, the non-Euclidean graphs are mapped to a specific crossbar array (CBA) composed of self-rectifying memristors and metal cells at the diagonal positions. The sneak current, an intrinsic physical property in the CBA, allows for the identification of the similarity function. The sneak-current-based similarity function indicates the distance between the nodes, which can be used to predict the probability that unconnected nodes will be connected in the future, connectivity between communities, and neural connections in a brain. When all bit lines of the CBA are connected to the ground, the sneak current is suppressed, and the CBA can be used to search for adjacent nodes. This work demonstrates the physical calculation methods applied to various graphical problems using the CBA composed of the self-rectifying memristor based on the HfO2 switching layer. Moreover, such applications suffer less from the memristors' inherent issues related to their stochastic nature.
Affiliation(s)
- Yoon Ho Jang
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Janguk Han
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Jihun Kim
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Woohyun Kim
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Kyung Seok Woo
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Jaehyun Kim
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Cheol Seong Hwang
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
9. Zhang C, Li Q, Lei Y, Qian M, Shen X, Cheng D, Yu W. The Absence of a Weak-Tie Effect When Predicting Large-Weight Links in Complex Networks. Entropy (Basel, Switzerland) 2023; 25:422. [PMID: 36981311 PMCID: PMC10047936 DOI: 10.3390/e25030422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 06/18/2023]
Abstract
Link prediction is a widely studied problem in information filtering. Link prediction algorithms based on local similarity indices are widely used in many fields because of their high efficiency and prediction accuracy. However, most existing link prediction algorithms are designed for unweighted networks, and there are relatively few studies of weighted networks. In previous studies on weighted networks, some scholars pointed out that links with small weights play a more important role in link prediction and emphasized that the weak-ties theory has a significant impact on prediction accuracy. On this basis, we studied edges with different weights and found that the weak-ties theory does not hold for edges with large weights; instead, it holds in the prediction of edges with small weights. Our discovery has instructive implications for link prediction in weighted networks.
Affiliation(s)
- Chengjun Zhang
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Qi Li
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Yi Lei
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Ming Qian
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Xinyu Shen
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Di Cheng
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Wenbin Yu
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CI-CAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
- Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing 210044, China
10. Rivas-Barragan D, Domingo-Fernández D, Gadiya Y, Healey D. Ensembles of knowledge graph embedding models improve predictions for drug discovery. Brief Bioinform 2022; 23:6831005. [PMID: 36384050 PMCID: PMC9677479 DOI: 10.1093/bib/bbac481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 10/04/2022] [Accepted: 10/08/2022] [Indexed: 11/18/2022] Open
Abstract
Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug-disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can be later tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug-disease triples on two independent biomedical KGs designed for drug discovery. Following, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https://github.com/enveda/kgem-ensembles-in-drug-discovery.
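As a rough illustration of aggregating several models by the position of their predicted triple scores, the sketch below averages each candidate triple's rank across models and then measures precision on the top-k list. This is a generic mean-rank ensemble written under assumed toy scores; it is not the authors' implementation (their code is at the repository linked in the abstract), and all triple names and values are hypothetical.

```python
from statistics import mean


def ranks(scores):
    """Map each triple to its rank under one model (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {triple: position + 1 for position, triple in enumerate(ordered)}


def ensemble_by_mean_rank(model_scores):
    """Order triples by their average rank across models (lower is better)."""
    per_model = [ranks(scores) for scores in model_scores]
    triples = per_model[0].keys()
    average = {t: mean(r[t] for r in per_model) for t in triples}
    return sorted(average, key=average.get)


def precision_at_k(ranked_triples, positives, k):
    """Fraction of the top-k prioritized triples that are known positives."""
    return sum(1 for t in ranked_triples[:k] if t in positives) / k


# Hypothetical scores from three models over the same candidate triples.
model_a = {"d1-treats-x": 0.9, "d2-treats-x": 0.8, "d3-treats-x": 0.1, "d4-treats-x": 0.2}
model_b = {"d1-treats-x": 0.7, "d2-treats-x": 0.6, "d3-treats-x": 0.9, "d4-treats-x": 0.1}
model_c = {"d1-treats-x": 0.8, "d2-treats-x": 0.3, "d3-treats-x": 0.7, "d4-treats-x": 0.1}
positives = {"d1-treats-x", "d3-treats-x"}

ensemble_ranking = ensemble_by_mean_rank([model_a, model_b, model_c])
single_ranking = sorted(model_a, key=model_a.get, reverse=True)
```

In this toy setup the ensemble places both known positives in its top two, whereas model_a alone places only one, which is the kind of top-k precision comparison the abstract describes.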
Affiliation(s)
- Daniel Domingo-Fernández
- Corresponding author: Daniel Domingo-Fernández, Department of Data Science. Enveda Biosciences, Boulder, CO, USA. E-mail:
11. Al Musawi AF, Roy S, Ghosh P. Identifying accurate link predictors based on assortativity of complex networks. Sci Rep 2022; 12:18107. [PMID: 36302826 PMCID: PMC9613685 DOI: 10.1038/s41598-022-22843-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 10/20/2022] [Indexed: 12/30/2022] Open
Abstract
Link prediction algorithms in complex networks, such as social networks, biological networks, drug-drug interactions, communication networks, and so on, assign scores to predict potential links between two nodes. Link prediction (LP) enables researchers to learn unknown, new as well as future interactions among the entities being modeled in the complex networks. In addition to measures like degree distribution, clustering coefficient, centrality, etc., another metric to characterize structural properties is network assortativity which measures the tendency of nodes to connect with similar nodes. In this paper, we explore metrics that effectively predict the links based on the assortativity profiles of the complex networks. To this end, we first propose an approach that generates networks of varying assortativity levels and utilize three sets of link prediction models combining the similarity of neighborhoods and preferential attachment. We carry out experiments to study the LP accuracy (measured in terms of area under the precision-recall curve) of the link predictors individually and in combination with other baseline measures. Our analysis shows that link prediction models that explore a large neighborhood around nodes of interest, such as CH2-L2 and CH2-L3, perform consistently for assortative as well as disassortative networks. While common neighbor-based local measures are effective for assortative networks, our proposed combination of common neighbors with node degree is a good choice for the LP metric in disassortative networks. We discuss how this analysis helps achieve the best-parameterized combination of link prediction models and its significance in the context of link prediction from incomplete social and biological network data.
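Two of the building blocks named in the abstract, neighborhood similarity and preferential attachment, can be sketched as scoring functions over unconnected node pairs. The code below shows the standard common-neighbors and preferential-attachment indices on a toy undirected graph; it does not reproduce the paper's proposed combinations or the CH2-L2 and CH2-L3 predictors, and the edge list is hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def common_neighbors(adjacency, u, v):
    """Number of neighbors shared by u and v."""
    return len(adjacency[u] & adjacency[v])


def preferential_attachment(adjacency, u, v):
    """Product of the degrees of u and v."""
    return len(adjacency[u]) * len(adjacency[v])


def score_non_edges(adjacency, scorer):
    """Score every currently unconnected pair of nodes with `scorer`."""
    return {
        (u, v): scorer(adjacency, u, v)
        for u, v in combinations(sorted(adjacency), 2)
        if v not in adjacency[u]
    }


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adjacency = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
cn_scores = score_non_edges(adjacency, common_neighbors)
pa_scores = score_non_edges(adjacency, preferential_attachment)
```

Higher-scoring pairs are the predicted links; a combined predictor would merge such scores, with the weighting left as a design choice that this sketch does not make.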
Affiliation(s)
- Ahmad F. Al Musawi
- Department of Information Technology, University of Thi Qar, Thi Qar, Iraq; Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Satyaki Roy
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
12
|
GFCNet: Utilizing graph feature collection networks for coronavirus knowledge graph embeddings. Inf Sci (N Y) 2022; 608:1557-1571. [PMID: 35855405 PMCID: PMC9279179 DOI: 10.1016/j.ins.2022.07.031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/05/2021] [Revised: 04/04/2022] [Accepted: 07/03/2022] [Indexed: 01/25/2023]
Abstract
In response to the COVID-19 pandemic, researchers in machine learning and artificial intelligence have constructed medical knowledge graphs (KGs) based on existing COVID-19 datasets; however, these KGs contain a considerable number of semantic relations that are incomplete or missing. In this paper, we focus on the task of knowledge graph embedding (KGE), which serves as an important solution for inferring the missing relations. A number of KGE models with different scoring functions for learning entity and relation embeddings have been published. However, these models share the problem of rarely taking important KG features other than relation triples, such as attribute features, into account when dealing with heterogeneous, complex, and incomplete COVID-19 medical data. To address this issue, we propose a graph feature collection network (GFCNet) for the COVID-19 KGE task, which considers both neighbor and attribute features in KGs. Extensive experiments conducted on the COVID-19 drug KG dataset show promising results and demonstrate the effectiveness and efficiency of our proposed model. We also outline future directions for deepening the study of the COVID-19 KGE task.
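To make the idea of a KGE scoring function concrete, the sketch below uses the translation-based TransE score, in which a triple (head, relation, tail) is plausible when head + relation lies close to tail. TransE is swapped in here as a widely known example and is not the GFCNet model; the entity names and two-dimensional vectors are invented for illustration, and no neighbor or attribute features are modeled.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail.

    Scores are at most 0; values closer to 0 mean a more plausible triple.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidate_tails):
    """Rank candidate tail entities (a {name: vector} map) by TransE score, best first."""
    return sorted(
        candidate_tails,
        key=lambda name: transe_score(head, relation, candidate_tails[name]),
        reverse=True,
    )


# Hypothetical toy embeddings; in practice these are learned from the KG triples.
drug_a = [0.0, 1.0]
treats = [1.0, 0.0]
diseases = {"disease_x": [1.0, 1.0], "disease_y": [-1.0, 0.0]}
```

Calling `rank_tails(drug_a, treats, diseases)` ranks `disease_x` ahead of `disease_y`, since `drug_a + treats` coincides with the `disease_x` vector. Inferring a missing relation then amounts to scoring unseen triples and keeping the highest-ranked ones.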
|
13
|
Allegri SA, McCoy K, Mitchell CS. CompositeView: A Network-Based Visualization Tool. Big Data and Cognitive Computing 2022; 6. [PMID: 35847767 PMCID: PMC9281616 DOI: 10.3390/bdcc6020066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, neo4j, NodeXL, and Gephi.
Affiliation(s)
- Stephen A. Allegri
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Kevin McCoy
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
14
|
Ott S, Barbosa-Silva A, Samwald M. LinkExplorer: predicting, explaining and exploring links in large biomedical knowledge graphs. Bioinformatics 2022; 38:2371-2373. [PMID: 35139158 DOI: 10.1093/bioinformatics/btac068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 01/26/2022] [Accepted: 02/02/2022] [Indexed: 02/03/2023] Open
Abstract
SUMMARY Machine learning algorithms for link prediction can be valuable tools for hypothesis generation. However, many current algorithms are black boxes or lack good user interfaces that could facilitate insight into why predictions are made. We present LinkExplorer, a software suite for predicting, explaining and exploring links in large biomedical knowledge graphs. LinkExplorer integrates our novel, rule-based link prediction engine SAFRAN, which was recently shown to outcompete other explainable algorithms and established black-box algorithms. Here, we demonstrate highly competitive evaluation results of our algorithm on multiple large biomedical knowledge graphs, and release a web interface that allows for interactive and intuitive exploration of predicted links and their explanations. AVAILABILITY AND IMPLEMENTATION A publicly hosted instance, source code and further documentation can be found at https://github.com/OpenBioLink/Explorer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Simon Ott
- Institute of Artificial Intelligence, Medical University of Vienna, 1090 Vienna, Austria
- Adriano Barbosa-Silva
- Institute of Artificial Intelligence, Medical University of Vienna, 1090 Vienna, Austria
- Matthias Samwald
- Institute of Artificial Intelligence, Medical University of Vienna, 1090 Vienna, Austria
|
15
|
Detection of Target Genes for Drug Repurposing to Treat Skeletal Muscle Atrophy in Mice Flown in Spaceflight. Genes (Basel) 2022; 13:genes13030473. [PMID: 35328027 PMCID: PMC8953707 DOI: 10.3390/genes13030473] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/15/2022] [Revised: 02/25/2022] [Accepted: 03/03/2022] [Indexed: 12/13/2022] Open
Abstract
Skeletal muscle atrophy is a common condition in aging, in diabetes, and in long-duration spaceflight due to microgravity. This article investigates multi-modal gene-disease and disease-drug networks via link prediction algorithms to select drugs for repurposing to treat skeletal muscle atrophy. Key target genes that cause muscle atrophy in the left and right extensor digitorum longus, gastrocnemius, quadriceps, and left and right soleus muscles are detected using graph-theoretic network analysis, by mining the transcriptomic datasets collected from mice flown in spaceflight and made available by GeneLab. We identified the top muscle atrophy gene regulators by the Pearson correlation and Bayesian Markov blanket methods. The gene-disease knowledge graph was constructed using the scalable precision medicine knowledge engine. We computed node embeddings and random walk measures from the networks. Graph convolutional networks, graph neural networks, random forest, and gradient boosting methods were trained using the embeddings and network features to predict links and rank the top gene-disease associations for skeletal muscle atrophy. Drugs were selected and a disease-drug knowledge graph was constructed. Link prediction methods were applied to the disease-drug networks to identify top-ranked drugs for therapeutic treatment of skeletal muscle atrophy. The graph convolutional network performs best in link prediction based on receiver operating characteristic curves and prediction accuracies. The key genes involved in skeletal muscle atrophy are associated with metabolic and neurodegenerative diseases. The drugs selected for repurposing using the graph convolutional network method were nutrients, corticosteroids, anti-inflammatory medications, and others related to insulin.
|
16
|
Domingo-Fernández D, Gadiya Y, Patel A, Mubeen S, Rivas-Barragan D, Diana CW, Misra BB, Healey D, Rokicki J, Colluru V. Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery. PLoS Comput Biol 2022; 18:e1009909. [PMID: 35213534 PMCID: PMC8906585 DOI: 10.1371/journal.pcbi.1009909] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/21/2021] [Revised: 03/09/2022] [Accepted: 02/09/2022] [Indexed: 12/29/2022] Open
Abstract
Network-based approaches are becoming increasingly popular for drug discovery as they provide a systems-level overview of the mechanisms underlying disease pathophysiology. They have demonstrated significant early promise over other methods of biological data representation, such as in target discovery, side effect prediction and drug repurposing. In parallel, an explosion of -omics data for the deep characterization of biological systems routinely uncovers molecular signatures of disease for similar applications. Here, we present RPath, a novel algorithm that prioritizes drugs for a given disease by reasoning over causal paths in a knowledge graph (KG), guided by both drug-perturbed as well as disease-specific transcriptomic signatures. First, our approach identifies the causal paths that connect a drug to a particular disease. Next, it reasons over these paths to identify those that correlate with the transcriptional signatures observed in a drug-perturbation experiment, and anti-correlate to signatures observed in the disease of interest. The paths which match this signature profile are then proposed to represent the mechanism of action of the drug. We demonstrate how RPath consistently prioritizes clinically investigated drug-disease pairs on multiple datasets and KGs, achieving better performance over other similar methodologies. Furthermore, we present two case studies showing how one can deconvolute the predictions made by RPath as well as predict novel targets.
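The path-reasoning idea described in this abstract can be sketched under strong simplifying assumptions. The code below is not the RPath implementation: it assumes a directed KG stored as `{source: {target: sign}}` with +1 for activating and -1 for inhibiting edges, transcriptomic signatures reduced to +1 (up) or -1 (down) per gene, and a strict rule that every intermediate node on a path must match the drug signature and oppose the disease signature. All names and the toy data are hypothetical.

```python
def simple_paths(graph, source, target, max_len):
    """Yield simple paths (node lists) from source to target with at most max_len edges."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append(path + [nxt])


def predicted_effects(graph, path):
    """Propagate edge signs from the drug; return {intermediate node: +1 or -1}."""
    effects, sign = {}, 1
    for u, v in zip(path, path[1:-1]):  # stops before the final (disease) node
        sign *= graph[u][v]
        effects[v] = sign
    return effects


def path_is_concordant(graph, path, drug_sig, disease_sig):
    """True if every intermediate node matches the drug signature and opposes the disease one."""
    effects = predicted_effects(graph, path)
    if not effects:
        return False  # a direct drug-disease edge carries no signature evidence
    return all(
        drug_sig.get(node) == effect and disease_sig.get(node) == -effect
        for node, effect in effects.items()
    )


def mechanistic_paths(graph, drug, disease, drug_sig, disease_sig, max_len=4):
    """Collect drug-to-disease paths supported by both signatures."""
    return [
        path
        for path in simple_paths(graph, drug, disease, max_len)
        if path_is_concordant(graph, path, drug_sig, disease_sig)
    ]


# Hypothetical toy KG and signatures.
toy_graph = {
    "drug": {"G1": -1, "G3": +1},
    "G1": {"G2": +1},
    "G2": {"disease": +1},
    "G3": {"disease": +1},
}
toy_drug_sig = {"G1": -1, "G2": -1, "G3": +1}
toy_disease_sig = {"G1": +1, "G2": +1, "G3": +1}
```

With the toy data, `mechanistic_paths(toy_graph, "drug", "disease", toy_drug_sig, toy_disease_sig)` keeps only the path through `G1` and `G2`: the drug is predicted to lower both genes, which agrees with the drug signature and reverses the disease signature. The path through `G3` is rejected because the drug pushes `G3` in the same direction as the disease.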
Affiliation(s)
- Yojana Gadiya
- Enveda Biosciences, Boulder, Colorado, United States of America
- Abhishek Patel
- Enveda Biosciences, Boulder, Colorado, United States of America
- Sarah Mubeen
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Chris W. Diana
- Enveda Biosciences, Boulder, Colorado, United States of America
- David Healey
- Enveda Biosciences, Boulder, Colorado, United States of America
- Joe Rokicki
- Enveda Biosciences, Boulder, Colorado, United States of America
- Viswa Colluru
- Enveda Biosciences, Boulder, Colorado, United States of America
|
17
|
Li Z, Zhong Q, Yang J, Duan Y, Wang W, Wu C, He K. DeepKG: an end-to-end deep learning-based workflow for biomedical knowledge graph extraction, optimization and applications. Bioinformatics 2021; 38:1477-1479. [PMID: 34788369 PMCID: PMC8689937 DOI: 10.1093/bioinformatics/btab767] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/09/2021] [Revised: 10/11/2021] [Accepted: 11/01/2021] [Indexed: 01/05/2023] Open
Abstract
SUMMARY DeepKG is an end-to-end deep learning-based workflow that helps researchers automatically mine valuable knowledge from biomedical literature. Users can use it to build customized knowledge graphs in specified domains, facilitating in-depth understanding of disease mechanisms and applications in drug repurposing and clinical research. To improve the performance of DeepKG, a cascaded hybrid information extraction framework is developed for training the 3-tuple extraction model, and a novel AutoML-based knowledge representation algorithm (AutoTransX) is proposed for knowledge representation and inference. The system has been deployed in dozens of hospitals, and extensive experiments strongly support its effectiveness. Applied to 144 900 COVID-19 scholarly full-text articles, DeepKG generates a high-quality knowledge graph with 7980 entities and 43 760 3-tuples, along with a candidate drug list; relevant animal experimental studies are being carried out. To accelerate further studies, we make DeepKG publicly available and provide an online tool that includes the 3-tuple data, the potential drug list, a question answering system, and a visualization platform. AVAILABILITY AND IMPLEMENTATION All the results are publicly available at the website (http://covidkg.ai/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongren Li
- Medical Big Data Research Center, Chinese PLA General Hospital, Beijing 100039, China
- Medical Artificial Intelligence Research Center, Chinese PLA General Hospital, Beijing 100853, China
- Qin Zhong
- The Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing 100039, China
- Jing Yang
- The Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing 100039, China
- Yongjie Duan
- The Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing 100039, China
- Wenjun Wang
- Bio-engineering Research Center, Chinese PLA General Hospital, Beijing 100039, China
- Chengkun Wu
- State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, Hunan, Changsha, 410073, China
- To whom correspondence should be addressed.
- Kunlun He
- Medical Big Data Research Center, Chinese PLA General Hospital, Beijing 100039, China
- To whom correspondence should be addressed.
|
18
|
Manian V, Orozco-Sandoval J, Diaz-Martinez V. An Integrative Network Science and Artificial Intelligence Drug Repurposing Approach for Muscle Atrophy in Spaceflight Microgravity. Front Cell Dev Biol 2021; 9:732370. [PMID: 34604234 PMCID: PMC8481783 DOI: 10.3389/fcell.2021.732370] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/29/2021] [Accepted: 08/12/2021] [Indexed: 12/19/2022] Open
Abstract
Muscle atrophy is a side effect of several terrestrial diseases and also severely affects astronauts on space missions because of the reduced gravity in spaceflight. An integrative graph-theoretic, network-based drug repurposing methodology that quantifies the interplay of key gene regulations and protein-protein interactions in muscle atrophy conditions is presented. Transcriptomic datasets from mice in spaceflight from GeneLab have been extensively mined to extract the key genes that cause muscle atrophy in organ muscle tissues such as the thymus, liver, and spleen. Top muscle atrophy gene regulators are selected by the Bayesian Markov blanket method, and a gene-disease knowledge graph is constructed using the scalable precision medicine knowledge engine. A deep graph neural network is trained to predict links in the network. The top-ranked diseases are identified, and drugs are selected for repurposing using the DrugBank resource. A disease-drug knowledge graph is constructed, and the graph neural network is trained to predict new drugs. The results are compared with machine learning methods such as random forest and gradient boosting classifiers. Network measure-based methods show that preferential attachment performs well for link prediction in both the gene-disease and disease-drug graphs. The receiver operating characteristic curves and prediction accuracies for each method show that the random walk similarity measure and the deep graph neural network outperform the other methods. Several key target genes identified by the graph neural network are associated with diseases such as cancer, diabetes, and neural disorders. The novel link prediction approach applied to the disease-drug knowledge graph identifies monoclonal antibody drug therapy as a suitable candidate for drug repurposing for spaceflight-induced microgravity. A total of 21 drugs are identified as possible candidates for treating muscle atrophy. The graph neural network is a promising deep learning architecture for link prediction in gene-disease and disease-drug networks.
Affiliation(s)
- Vidya Manian
- Laboratory for Applied Remote Sensing, Imaging, and Photonics, Department of Electrical and Computer Engineering, University of Puerto Rico, Mayaguez, PR, United States
|