1
|
Tyagin I, Safro I. Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique. BMC Bioinformatics 2024; 25:213. [PMID: 38872097 PMCID: PMC11177514 DOI: 10.1186/s12859-024-05812-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Accepted: 05/16/2024] [Indexed: 06/15/2024] Open
Abstract
BACKGROUND Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. RESULTS This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypotheses accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. CONCLUSIONS Dyport is an open-source benchmarking framework designed for biomedical hypothesis generation systems evaluation, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport .
Collapse
Affiliation(s)
- Ilya Tyagin
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19713, USA.
| | - Ilya Safro
- Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19716, USA.
| |
Collapse
|
2
|
Patidar K, Deng JH, Mitchell CS, Ford Versypt AN. Cross-Domain Text Mining of Pathophysiological Processes Associated with Diabetic Kidney Disease. Int J Mol Sci 2024; 25:4503. [PMID: 38674089 PMCID: PMC11050166 DOI: 10.3390/ijms25084503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024] Open
Abstract
Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. This study's goal was to identify the signaling drivers and pathways that modulate glomerular endothelial dysfunction in DKD via artificial intelligence-enabled literature-based discovery. Cross-domain text mining of 33+ million PubMed articles was performed with SemNet 2.0 to identify and rank multi-scalar and multi-factorial pathophysiological concepts related to DKD. A set of identified relevant genes and proteins that regulate different pathological events associated with DKD were analyzed and ranked using normalized mean HeteSim scores. High-ranking genes and proteins intersected three domains-DKD, the immune response, and glomerular endothelial cells. The top 10% of ranked concepts were mapped to the following biological functions: angiogenesis, apoptotic processes, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, the nitric oxide response, oxidative stress, the cytokine response, macrophage signaling, NFκB factor activity, the TLR pathway, glucose metabolism, the inflammatory response, the ERK/MAPK signaling response, the JAK/STAT pathway, the T-cell-mediated response, the WNT/β-catenin pathway, the renin-angiotensin system, and NADPH oxidase activity. High-ranking genes and proteins were used to generate a protein-protein interaction network. The study results prioritized interactions or molecules involved in dysregulated signaling in DKD, which can be further assessed through biochemical network models or experiments.
Collapse
Affiliation(s)
- Krutika Patidar
- Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
| | - Jennifer H. Deng
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Cassie S. Mitchell
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Center for Machine Learning at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Ashlee N. Ford Versypt
- Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Department of Biomedical Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Institute for Artificial Intelligence and Data Science, University at Buffalo, Buffalo, NY 14260, USA
| |
Collapse
|
3
|
Al-Hussaini I, White B, Varmeziar A, Mehra N, Sanchez M, Lee J, DeGroote NP, Miller TP, Mitchell CS. An Interpretable Machine Learning Framework for Rare Disease: A Case Study to Stratify Infection Risk in Pediatric Leukemia. J Clin Med 2024; 13:1788. [PMID: 38542012 PMCID: PMC10970787 DOI: 10.3390/jcm13061788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2024] [Revised: 03/08/2024] [Accepted: 03/12/2024] [Indexed: 04/18/2024] Open
Abstract
Background: Datasets on rare diseases, like pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to elucidate actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either "high risk" or "low risk" in pediatric ALL (n = 580) and AML (n = 132) with accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and an MAE of 1.29 for viral infections. Features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD software that links relationships from 33+ million PubMed articles, identified additional features for the prediction of infection, like glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
Collapse
Affiliation(s)
- Irfan Al-Hussaini
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Brandon White
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Armon Varmeziar
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Nidhi Mehra
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Milagro Sanchez
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Judy Lee
- Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA (T.P.M.)
| | - Nicholas P. DeGroote
- Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA (T.P.M.)
| | - Tamara P. Miller
- Aflac Cancer and Blood Disorders Center, Children’s Healthcare of Atlanta, Atlanta, GA 30322, USA (T.P.M.)
- Department of Pediatrics, Division of Pediatric Hematology/Oncology, Emory University, Atlanta, GA 30332, USA
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
Collapse
|
4
|
Kartchner D, McCoy K, Dubey J, Zhang D, Zheng K, Umrani R, Kim JJ, Mitchell CS. Literature-Based Discovery to Elucidate the Biological Links between Resistant Hypertension and COVID-19. BIOLOGY 2023; 12:1269. [PMID: 37759668 PMCID: PMC10526006 DOI: 10.3390/biology12091269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/11/2023] [Revised: 09/11/2023] [Accepted: 09/18/2023] [Indexed: 09/29/2023]
Abstract
Multiple studies have reported new or exacerbated persistent or resistant hypertension in patients previously infected with COVID-19. We used literature-based discovery to identify and prioritize multi-scalar explanatory biology that relates resistant hypertension to COVID-19. Cross-domain text mining of 33+ million PubMed articles within a comprehensive knowledge graph was performed using SemNet 2.0. Unsupervised rank aggregation determined which concepts were most relevant utilizing the normalized HeteSim score. A series of simulations identified concepts directly related to COVID-19 and resistant hypertension or connected via one of three renin-angiotensin-aldosterone system hub nodes (mineralocorticoid receptor, epithelial sodium channel, angiotensin I receptor). The top-ranking concepts relating COVID-19 to resistant hypertension included: cGMP-dependent protein kinase II, MAP3K1, haspin, ral guanine nucleotide exchange factor, N-(3-Oxododecanoyl)-L-homoserine lactone, aspartic endopeptidases, metabotropic glutamate receptors, choline-phosphate cytidylyltransferase, protein tyrosine phosphatase, tat genes, MAP3K10, uridine kinase, dicer enzyme, CMD1B, USP17L2, FLNA, exportin 5, somatotropin releasing hormone, beta-melanocyte stimulating hormone, pegylated leptin, beta-lipoprotein, corticotropin, growth hormone-releasing peptide 2, pro-opiomelanocortin, alpha-melanocyte stimulating hormone, prolactin, thyroid hormone, poly-beta-hydroxybutyrate depolymerase, CR 1392, BCR-ABL fusion gene, high density lipoprotein sphingomyelin, pregnancy-associated murine protein 1, recQ4 helicase, immunoglobulin heavy chain variable domain, aglycotransferrin, host cell factor C1, ATP6V0D1, imipramine demethylase, TRIM40, H3C2 gene, COL1A1+COL1A2 gene, QARS gene, VPS54, TPM2, MPST, EXOSC2, ribosomal protein S10, TAP-144, gonadotropins, human gonadotropin releasing hormone 1, beta-lipotropin, octreotide, salmon calcitonin, des-n-octanoyl ghrelin, liraglutide, gastrins. Concepts were mapped to six physiological themes: altered endocrine function, 23.1%; inflammation or cytokine storm, 21.3%; lipid metabolism and atherosclerosis, 17.6%; sympathetic input to blood pressure regulation, 16.7%; altered entry of COVID-19 virus, 14.8%; and unknown, 6.5%.
Collapse
Affiliation(s)
- David Kartchner
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Kevin McCoy
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Janhvi Dubey
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Dongyu Zhang
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Kevin Zheng
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Rushda Umrani
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - James J. Kim
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Center for Machine Learning at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
Collapse
|
5
|
Tandra G, Yoone A, Mathew R, Wang M, Hales CM, Mitchell CS. Literature-Based Discovery Predicts Antihistamines Are a Promising Repurposed Adjuvant Therapy for Parkinson's Disease. Int J Mol Sci 2023; 24:12339. [PMID: 37569714 PMCID: PMC10418861 DOI: 10.3390/ijms241512339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Revised: 07/24/2023] [Accepted: 07/31/2023] [Indexed: 08/13/2023] Open
Abstract
Parkinson's disease (PD) is a movement disorder caused by a dopamine deficit in the brain. Current therapies primarily focus on dopamine modulators or replacements, such as levodopa. Although dopamine replacement can help alleviate PD symptoms, therapies targeting the underlying neurodegenerative process are limited. The study objective was to use artificial intelligence to rank the most promising repurposed drug candidates for PD. Natural language processing (NLP) techniques were used to extract text relationships from 33+ million biomedical journal articles from PubMed and map relationships between genes, proteins, drugs, diseases, etc., into a knowledge graph. Cross-domain text mining, hub network analysis, and unsupervised learning rank aggregation were performed in SemNet 2.0 to predict the most relevant drug candidates to levodopa and PD using relevance-based HeteSim scores. The top predicted adjuvant PD therapies included ebastine, an antihistamine for perennial allergic rhinitis; levocetirizine, another antihistamine; vancomycin, a powerful antibiotic; captopril, an angiotensin-converting enzyme (ACE) inhibitor; and neramexane, an N-methyl-D-aspartate (NMDA) receptor agonist. Cross-domain text mining predicted that antihistamines exhibit the capacity to synergistically alleviate Parkinsonian symptoms when used with dopamine modulators like levodopa or levodopa-carbidopa. The relationship patterns among the identified adjuvant candidates suggest that the likely therapeutic mechanism(s) of action of antihistamines for combatting the multi-factorial PD pathology include counteracting oxidative stress, amending the balance of neurotransmitters, and decreasing the proliferation of inflammatory mediators. Finally, cross-domain text mining interestingly predicted a strong relationship between PD and liver disease.
Collapse
Affiliation(s)
- Gabriella Tandra
- Laboratory for Pathology Dynamics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Neural Engineering Center, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Amy Yoone
- Laboratory for Pathology Dynamics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology, Emory University, Atlanta, GA 30332, USA
| | - Rhea Mathew
- Laboratory for Pathology Dynamics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Minzhi Wang
- Laboratory for Pathology Dynamics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Neural Engineering Center, Georgia Institute of Technology, Atlanta, GA 30332, USA
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Chadwick M. Hales
- Department of Neurology and Center for Neurodegenerative Disease, Emory University School of Medicine, Atlanta, GA 30322, USA;
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Neural Engineering Center, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology, Emory University, Atlanta, GA 30332, USA
- Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
Collapse
|
6
|
Ren ZH, You ZH, Zou Q, Yu CQ, Ma YF, Guan YJ, You HR, Wang XF, Pan J. DeepMPF: deep learning framework for predicting drug-target interactions based on multi-modal representation with meta-path semantic analysis. J Transl Med 2023; 21:48. [PMID: 36698208 PMCID: PMC9876420 DOI: 10.1186/s12967-023-03876-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Accepted: 01/05/2023] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, the traditional biological experiment is time-consuming and expensive, as there are abundant complex interactions present in the large size of genomic and chemical spaces. For alleviating this phenomenon, plenty of computational methods are conducted to effectively complement biological experiments and narrow the search spaces into a preferred candidate domain. Whereas, most of the previous approaches cannot fully consider association behavior semantic information based on several schemas to represent complex the structure of heterogeneous biological networks. Additionally, the prediction of DTI based on single modalities cannot satisfy the demand for prediction accuracy. METHODS We propose a multi-modal representation framework of 'DeepMPF' based on meta-path semantic analysis, which effectively utilizes heterogeneous information to predict DTI. Specifically, we first construct protein-drug-disease heterogeneous networks composed of three entities. Then the feature information is obtained under three views, containing sequence modality, heterogeneous structure modality and similarity modality. We proposed six representative schemas of meta-path to preserve the high-order nonlinear structure and catch hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. RESULTS To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold datasets. Our method can obtain competitive performance in all datasets. We also explore the influence of the different feature embedding dimensions, learning strategies and classification methods. Meaningfully, the drug repositioning experiments on COVID-19 and HIV demonstrate DeepMPF can be applied to solve problems in reality and help drug discovery. The further analysis of molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. CONCLUSIONS All the results demonstrate the effectively predictive capability of DeepMPF for drug-target interactions. It can be utilized as a useful tool to prescreen the most potential drug candidates for the protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/ , which can help relevant researchers to further study.
Collapse
Affiliation(s)
- Zhong-Hao Ren
- grid.460132.20000 0004 1758 0275School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Zhu-Hong You
- grid.440588.50000 0001 0307 1240School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Quan Zou
- grid.54549.390000 0004 0369 4060Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054 China
| | - Chang-Qing Yu
- grid.460132.20000 0004 1758 0275School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Yan-Fang Ma
- grid.417234.70000 0004 1808 3203Department of Galactophore, The Third People’s Hospital of Gansu Province, Lanzhou, 730020 China
| | - Yong-Jian Guan
- grid.460132.20000 0004 1758 0275School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Hai-Ru You
- grid.440588.50000 0001 0307 1240School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Xin-Fei Wang
- grid.460132.20000 0004 1758 0275School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Jie Pan
- grid.460132.20000 0004 1758 0275School of Information Engineering, Xijing University, Xi’an, 710100 China
| |
Collapse
|
7
|
Mehra N, Varmeziar A, Chen X, Kronick O, Fisher R, Kota V, Mitchell CS. Cross-Domain Text Mining to Predict Adverse Events from Tyrosine Kinase Inhibitors for Chronic Myeloid Leukemia. Cancers (Basel) 2022; 14:4686. [PMID: 36230609 PMCID: PMC9563938 DOI: 10.3390/cancers14194686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2022] [Revised: 09/04/2022] [Accepted: 09/23/2022] [Indexed: 12/02/2022] Open
Abstract
Tyrosine kinase inhibitors (TKIs) are prescribed for chronic myeloid leukemia (CML) and some other cancers. The objective was to predict and rank TKI-related adverse events (AEs), including under-reported or preclinical AEs, using novel text mining. First, k-means clustering of 2575 clinical CML TKI abstracts separated TKIs by significant (p < 0.05) AE type: gastrointestinal (bosutinib); edema (imatinib); pulmonary (dasatinib); diabetes (nilotinib); cardiovascular (ponatinib). Next, we propose a novel cross-domain text mining method utilizing a knowledge graph, link prediction, and hub node network analysis to predict new relationships. Cross-domain text mining of 30+ million articles via SemNet predicted and ranked known and novel TKI AEs. Three physiology-based tiers were formed using unsupervised rank aggregation feature importance. Tier 1 ranked in the top 1%: hematology (anemia, neutropenia, thrombocytopenia, hypocellular marrow); glucose (diabetes, insulin resistance, metabolic syndrome); iron (deficiency, overload, metabolism), cardiovascular (hypertension, heart failure, vascular dilation); thyroid (hypothyroidism, hyperthyroidism, parathyroid). Tier 2 ranked in the top 5%: inflammation (chronic inflammatory disorder, autoimmune, periodontitis); kidney (glomerulonephritis, glomerulopathy, toxic nephropathy). Tier 3 ranked in the top 10%: gastrointestinal (bowel regulation, hepatitis, pancreatitis); neuromuscular (autonomia, neuropathy, muscle pain); others (secondary cancers, vitamin deficiency, edema). Results suggest proactive TKI patient AE surveillance levels: regular surveillance for tier 1, infrequent surveillance for tier 2, and symptom-based surveillance for tier 3.
Collapse
Affiliation(s)
- Nidhi Mehra
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
| | - Armon Varmeziar
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
| | - Xinyu Chen
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
| | - Olivia Kronick
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
| | - Rachel Fisher
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
| | - Vamsi Kota
- Division of Hematology and Oncology, Georgia Cancer Center, Augusta University, Augusta, GA 30912, USA
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
- Center for Machine Learning, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
Collapse
|
8
|
Allegri SA, McCoy K, Mitchell CS. CompositeView: A Network-Based Visualization Tool. BIG DATA AND COGNITIVE COMPUTING 2022; 6. [PMID: 35847767 PMCID: PMC9281616 DOI: 10.3390/bdcc6020066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
Large networks are quintessential to bioinformatics, knowledge graphs, social network analysis, and graph-based learning. CompositeView is a Python-based open-source application that improves interactive complex network visualization and extraction of actionable insight. CompositeView utilizes specifically formatted input data to calculate composite scores and display them using the Cytoscape component of Dash. Composite scores are defined representations of smaller sets of conceptually similar data that, when combined, generate a single score to reduce information overload. Visualized interactive results are user-refined via filtering elements such as node value and edge weight sliders and graph manipulation options (e.g., node color and layout spread). The primary difference between CompositeView and other network visualization tools is its ability to auto-calculate and auto-update composite scores as the user interactively filters or aggregates data. CompositeView was developed to visualize network relevance rankings, but it performs well with non-network data. Three disparate CompositeView use cases are shown: relevance rankings from SemNet 2.0, an open-source knowledge graph relationship ranking software for biomedical literature-based discovery; Human Development Index (HDI) data; and the Framingham cardiovascular study. CompositeView was stress tested to construct reference benchmarks that define breadth and size of data effectively visualized. Finally, CompositeView is compared to Excel, Tableau, Cytoscape, neo4j, NodeXL, and Gephi.
Collapse
Affiliation(s)
- Stephen A. Allegri
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Kevin McCoy
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Machine Learning Center at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Correspondence:
| |
Collapse
|
9
|
Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0. BIG DATA AND COGNITIVE COMPUTING 2022; 6. [PMID: 35936510 PMCID: PMC9351549 DOI: 10.3390/bdcc6010027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]
Abstract
Literature-based discovery (LBD) summarizes information and generates insight from large text corpuses. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is a comprehensive open-source software for significantly faster, more effective, and user-friendly means of automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities.
Collapse
|
10
|
Biomedical Text Link Prediction for Drug Discovery: A Case Study with COVID-19. Pharmaceutics 2021; 13:pharmaceutics13060794. [PMID: 34073456 PMCID: PMC8230210 DOI: 10.3390/pharmaceutics13060794] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2021] [Revised: 05/05/2021] [Accepted: 05/19/2021] [Indexed: 01/03/2023] Open
Abstract
Link prediction in artificial intelligence is used to identify missing links or derive future relationships that can occur in complex networks. A link prediction model was developed using the complex heterogeneous biomedical knowledge graph, SemNet, to predict missing links in biomedical literature for drug discovery. A web application visualized knowledge graph embeddings and link prediction results using TransE, CompleX, and RotatE based methods. The link prediction model achieved up to 0.44 hits@10 on the entity prediction tasks. The recent outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also known as COVID-19, served as a case study to demonstrate the efficacy of link prediction modeling for drug discovery. The link prediction algorithm guided identification and ranking of repurposed drug candidates for SARS-CoV-2 primarily by text mining biomedical literature from previous coronaviruses, including SARS and middle east respiratory syndrome (MERS). Repurposed drugs included potential primary SARS-CoV-2 treatment, adjunctive therapies, or therapeutics to treat side effects. The link prediction accuracy for nodes ranked highly for SARS coronavirus was 0.875 as calculated by human in the loop validation on existing COVID-19 specific data sets. Drug classes predicted as highly ranked include anti-inflammatory, nucleoside analogs, protease inhibitors, antimalarials, envelope proteins, and glycoproteins. Examples of highly ranked predicted links to SARS-CoV-2: human leukocyte interferon, recombinant interferon-gamma, cyclosporine, antiviral therapy, zidovudine, chloroquine, vaccination, methotrexate, artemisinin, alkaloids, glycyrrhizic acid, quinine, flavonoids, amprenavir, suramin, complement system proteins, fluoroquinolones, bone marrow transplantation, albuterol, ciprofloxacin, quinolone antibacterial agents, and hydroxymethylglutaryl-CoA reductase inhibitors. Approximately 40% of identified drugs were not previously connected to SARS, such as edetic acid or biotin. In summary, link prediction can effectively suggest repurposed drugs for emergent diseases.
Collapse
|