1
Taub-Tabib H, Shamay Y, Shlain M, Pinhasov M, Polak M, Tiktinsky A, Rahamimov S, Bareket D, Eyal B, Kassis M, Goldberg Y, Kaminski Rosenberg T, Vulfsons S, Ben Sasson M. Identifying symptom etiologies using syntactic patterns and large language models. Sci Rep 2024; 14:16190. [PMID: 39003296 PMCID: PMC11246441 DOI: 10.1038/s41598-024-65645-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 06/21/2024] [Indexed: 07/15/2024] Open
Abstract
Differential diagnosis is a crucial aspect of medical practice, as it guides clinicians to accurate diagnoses and effective treatment plans. Traditional resources, such as medical books and services like UpToDate, are constrained by manual curation and may miss novel or less common findings. This paper introduces and analyzes two novel methods to mine etiologies from scientific literature. The first method employs a traditional Natural Language Processing (NLP) approach based on syntactic patterns. By using a novel application of human-guided pattern bootstrapping, patterns are derived quickly, and symptom etiologies are extracted with significant coverage. The second method utilizes generative models, specifically GPT-4, coupled with a fact verification pipeline, marking a pioneering application of generative techniques in etiology extraction. Analyzing this second method shows that while it is highly precise, it offers lower coverage than the syntactic approach. Importantly, combining both methodologies yields synergistic outcomes, enhancing the depth and reliability of etiology mining.
Affiliation(s)
- Yosi Shamay
- Faculty of Biomedical Engineering, Technion, Haifa, Israel
- Ben Eyal
- Allen Institute for AI, Seattle, USA
- Yoav Goldberg
- Allen Institute for AI, Seattle, USA
- Computer Science Department, Bar Ilan University, Ramat Gan, Israel
- Simon Vulfsons
- Institute for Pain Medicine, Rambam Health Campus, Haifa, Israel
- Maayan Ben Sasson
- Institute for Pain Medicine, Rambam Health Campus, Haifa, Israel.
- Alan Edwards Pain Management Unit, McGill University Health Centre, Montreal, QC, Canada.
2
García Sánchez N, Ugarte Carro E, Prieto-Santamaría L, Rodríguez-González A. Protein sequence analysis in the context of drug repurposing. BMC Med Inform Decis Mak 2024; 24:122. [PMID: 38741115 DOI: 10.1186/s12911-024-02531-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/08/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION Drug repurposing speeds up the development of new treatments and is less costly, risky, and time-consuming than de novo drug discovery. There are numerous biological elements that contribute to the development of diseases and, as a result, to the repurposing of drugs. METHODS In this article, we analysed the potential role of protein sequences in drug repurposing scenarios. For this purpose, we embedded the protein sequences using four state-of-the-art methods and validated their capacity to encapsulate essential biological information through visualization. Then, we compared the differences in sequence distance between protein-drug target pairs in drug repurposing and non-drug-repurposing data. Thus, we were able to uncover patterns that define protein sequences in repurposing cases. RESULTS We found statistically significant sequence distance differences between protein pairs in the repurposing data and the remaining protein pairs in the non-repurposing data. In this manner, we verified the potential of using numerical representations of sequences to generate repurposing hypotheses in the future.
Affiliation(s)
- Natalia García Sánchez
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
- Esther Ugarte Carro
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
- Lucía Prieto-Santamaría
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain
- Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain.
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain.
3
Liu F, Patt A, Chen C, Huang R, Xu Y, Mathé EA, Zhu Q. Exploring NCATS in-house biomedical data for evidence-based drug repurposing. PLoS One 2024; 19:e0289518. [PMID: 38271343 PMCID: PMC10810548 DOI: 10.1371/journal.pone.0289518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 11/08/2023] [Indexed: 01/27/2024] Open
Abstract
Drug repurposing is a strategy for identifying new uses of approved or investigational drugs that are outside the scope of the original medical indication. Even though many repurposed drugs have been found serendipitously in the past, the increasing availability of large volumes of biomedical data has enabled more systematic, data-driven approaches for drug candidate identification. At the National Center for Advancing Translational Sciences (NCATS), we develop new methods for generating data and information and make them publicly available to spur innovation and scientific discovery. In this study, we aimed to explore and demonstrate biomedical data generated and collected via two NCATS research programs, the Toxicology in the 21st Century program (Tox21) and the Biomedical Data Translator (Translator), for the application of drug repurposing. These two programs provide complementary types of biomedical data: bioassay screening data from Tox21 uncover underlying biological mechanisms and support chemical clustering, while scientific evidence mined from Translator enriches the clustered chemicals for drug repurposing. In total, 129 chemical clusters were generated, and three of them were investigated further to identify drug repurposing candidates, as detailed in case studies.
Affiliation(s)
- Fang Liu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Andrew Patt
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Chloe Chen
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Ruili Huang
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Yanji Xu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
- Ewy A. Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
- Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, Maryland, United States of America
4
Otero-Carrasco B, Ugarte Carro E, Prieto-Santamaría L, Diaz Uzquiano M, Caraça-Valente Hernández JP, Rodríguez-González A. Identifying patterns to uncover the importance of biological pathways on known drug repurposing scenarios. BMC Genomics 2024; 25:43. [PMID: 38191292 PMCID: PMC10775474 DOI: 10.1186/s12864-023-09913-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 12/15/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND Drug repurposing plays a significant role in providing effective treatments for certain diseases faster and more cost-effectively. Successful repurposing cases are mostly supported by a classical paradigm that stems from de novo drug development. This paradigm is based on the "one-drug-one-target-one-disease" idea. It consists of designing a drug specifically for a single disease and the drug's gene target. In this article, we investigated the use of biological pathways as potential elements to achieve effective drug repurposing. METHODS Considering a total of 4214 successful cases of drug repurposing, we identified cases in which biological pathways serve as the underlying basis for successful repurposing, referred to as DREBIOP. Once the repurposing cases based on pathways were identified, we studied their inherent patterns by considering the different biological elements associated with this dataset, as well as the pathways involved in these cases. Furthermore, we obtained gene-disease association (GDA) values to demonstrate the diminished significance of the drug's gene target in these repurposing cases. To achieve this, we used the Mann-Whitney U test to compare the values obtained for the DREBIOP set with the overall association values found in DISNET, as well as with those of the repurposing cases based on the drug's target gene (DREGE). RESULTS A collection of drug repurposing cases, known as DREBIOP, was identified as a result. DREBIOP cases exhibit distinct characteristics compared with DREGE cases. Notably, DREBIOP cases are associated with a higher number of biological pathways, with Vitamin D Metabolism and ACE inhibitors being the most prominent pathways. Additionally, it was observed that the association values of GDAs in DREBIOP cases were significantly lower than those in DREGE cases (p-value < 0.05). CONCLUSIONS Biological pathways assume a pivotal role in drug repurposing cases. This investigation successfully revealed patterns that distinguish drug repurposing instances associated with biological pathways. These identified patterns can be applied to any known repurposing case, enabling the detection of pathway-based repurposing scenarios or the classical paradigm.
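The abstract above compares association values between two groups of repurposing cases with the Mann-Whitney U test. As a rough illustration only, the sketch below computes the U statistic for two small samples in plain Python. The function name `mann_whitney_u` and the score values are hypothetical; they are not taken from the paper or from DISNET.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples.

    U_a counts the cross-sample pairs in which the value from sample_a
    is larger than the value from sample_b; ties count as 0.5.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two statistics always sum to the number of cross-sample pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical gene-disease association scores (illustrative values only).
drebiop_scores = [0.10, 0.15, 0.20, 0.30]
drege_scores = [0.25, 0.40, 0.55, 0.60, 0.70]

u_drebiop, u_drege = mann_whitney_u(drebiop_scores, drege_scores)
print(u_drebiop, u_drege)
```

A small U for one group indicates that its values tend to rank below those of the other group. In practice, a library routine such as `scipy.stats.mannwhitneyu` would also report a p-value.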
Affiliation(s)
- Belén Otero-Carrasco
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, 28223, Spain
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, 28660, Spain
- Esther Ugarte Carro
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, 28223, Spain
- Lucía Prieto-Santamaría
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, 28223, Spain
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, 28660, Spain
- Marina Diaz Uzquiano
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, 28223, Spain
- Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, 28223, Spain.
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, 28660, Spain.
5
Ayuso-Muñoz A, Prieto-Santamaría L, Ugarte-Carro E, Serrano E, Rodríguez-González A. Uncovering hidden therapeutic indications through drug repurposing with graph neural networks and heterogeneous data. Artif Intell Med 2023; 145:102687. [PMID: 37925215 DOI: 10.1016/j.artmed.2023.102687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 10/04/2023] [Accepted: 10/13/2023] [Indexed: 11/06/2023]
Abstract
Drug repurposing has gained considerable attention in recent years. The practice of repurposing existing drugs for new therapeutic uses helps to simplify the drug discovery process, which in turn reduces the costs and risks that are associated with de novo development. Representing biomedical data in the form of a graph is a simple and effective method to depict the underlying structure of the information. Using deep neural networks in combination with such data represents a promising approach to address drug repurposing. This paper presents BEHOR, a more comprehensive version of the previously presented REDIRECTION model. Both versions utilize the DISNET biomedical graph as the primary source of information, providing the model with extensive and intricate data to tackle the drug repurposing challenge. This new version's results for the reported metrics in the RepoDB test are 0.9604 for AUROC and 0.9518 for AUPRC. Additionally, a discussion is provided regarding some of the novel predictions to demonstrate the reliability of the model. The authors believe that BEHOR holds promise for generating drug repurposing hypotheses and could greatly benefit the field.
Affiliation(s)
- Adrián Ayuso-Muñoz
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
- Lucía Prieto-Santamaría
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
- Esther Ugarte-Carro
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
- Emilio Serrano
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
- Alejandro Rodríguez-González
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28223 Pozuelo de Alarcón, Madrid, Spain.
6
Liu F, Patt A, Chen C, Huang R, Xu Y, Mathé EA, Zhu Q. Exploring NCATS In-House Biomedical Data for Evidence-based Drug Repurposing. bioRxiv 2023:2023.07.21.550045. [PMID: 37546930 PMCID: PMC10401966 DOI: 10.1101/2023.07.21.550045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Drug repurposing is a strategy for identifying new uses of approved or investigational drugs that are outside the scope of the original medical indication. Even though many repurposed drugs have been found serendipitously in the past, the increasing availability of large volumes of biomedical data has enabled more systematic, data-driven approaches for drug candidate identification. At the National Center for Advancing Translational Sciences (NCATS), we develop new methods for generating data and information and make them publicly available to spur innovation and scientific discovery. In this study, we aimed to explore and demonstrate biomedical data generated and collected via two NCATS research programs, the Toxicology in the 21st Century program (Tox21) and the Biomedical Data Translator (Translator), for the application of drug repurposing. These two programs provide complementary types of biomedical data: bioassay screening data from Tox21 uncover underlying biological mechanisms and support chemical clustering, while scientific evidence mined from Translator enriches the clustered chemicals for drug repurposing. In total, 129 chemical clusters were generated, and three of them were investigated further to identify drug repurposing candidates, as detailed in case studies.
Affiliation(s)
- Fang Liu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD
- Andrew Patt
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
- Chloe Chen
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD
- Ruili Huang
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
- Yanji Xu
- Division of Rare Diseases Research Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD
- Ewy A Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
- Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Rockville, MD
7
Repositioning Drugs for Rare Diseases Based on Biological Features and Computational Approaches. Healthcare (Basel) 2022; 10:healthcare10091784. [PMID: 36141396 PMCID: PMC9498751 DOI: 10.3390/healthcare10091784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2022] [Revised: 09/12/2022] [Accepted: 09/14/2022] [Indexed: 11/16/2022] Open
Abstract
Rare diseases are a group of uncommon diseases in the world population. To date, about 7000 rare diseases have been documented. However, most of them do not have a known treatment. As a result of the relatively low demand for their treatments caused by their scarce prevalence, the pharmaceutical industry has not sufficiently encouraged the research to develop drugs to treat them. This work aims to analyse potential drug-repositioning strategies for this kind of disease. Drug repositioning seeks to find new uses for existing drugs. In this context, it seeks to discover if rare diseases could be treated with medicines previously indicated to heal other diseases. Our approaches tackle the problem by employing computational methods that calculate similarities between rare and non-rare diseases, considering biological features such as genes, proteins, and symptoms. Drug candidates for repositioning will be checked against clinical trials found in the scientific literature. In this study, 13 different rare diseases have been selected for which potential drugs could be repositioned. By verifying these drugs in the scientific literature, successful cases were found for 75% of the rare diseases studied. The genetic associations and phenotypical features of the rare diseases were examined. In addition, the verified drugs were classified according to the anatomical therapeutic chemical (ATC) code to highlight the types with a higher predisposition to be repositioned. These promising results open the door for further research in this field of study.
8
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding the molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to the biomedical data they use: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we discuss and summarize research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and application of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
9
Prieto Santamaría L, García Del Valle EP, Zanin M, Hernández Chan GS, Pérez Gallardo Y, Rodríguez-González A. Classifying diseases by using biological features to identify potential nosological models. Sci Rep 2021; 11:21096. [PMID: 34702888 PMCID: PMC8548311 DOI: 10.1038/s41598-021-00554-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/02/2021] [Accepted: 10/14/2021] [Indexed: 11/25/2022] Open
Abstract
Established nosological models have so far provided physicians with an adequate classification of diseases. Such systems are important to correctly identify diseases and treat them successfully. However, these taxonomies tend to be based on phenotypical observations, lacking a molecular or biological foundation. Therefore, there is an urgent need to modernize them in order to include the heterogeneous information that is produced at present, such as genomic, proteomic, transcriptomic and metabolic data, thereby leading to more comprehensive and robust structures. For that purpose, we have developed an extensive methodology to analyse the possibilities for generating new nosological models from biological features. Different datasets of diseases have been considered, and distinct features related to diseases, namely genes, proteins, metabolic pathways and genetic variants, have been represented as binary and numerical vectors. From those vectors, disease distances have been computed on the basis of several metrics. Clustering algorithms have been implemented to group diseases, generating different models, each corresponding to a distinct combination of the previous parameters. They have been evaluated by means of intrinsic metrics, showing that some of them are highly suitable as a basis for new nosologies. One of the clustering configurations has been analysed in depth, demonstrating its quality and validity in the research context, and further biological interpretations have been made. This model was generated by the OPTICS clustering algorithm, using distances between diseases based on gene sharedness and the cosine index metric. It formed 729 clusters and obtained a Silhouette coefficient of 0.43.
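The abstract above describes representing diseases as binary feature vectors and computing distances between them, with a cosine-based metric over shared genes used in the model that was analysed in depth. The sketch below illustrates only that distance step in plain Python. The gene and disease names are hypothetical placeholders, the zero-vector handling is an assumption, and the clustering itself (OPTICS in the paper) is not reproduced.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a disease with no associated genes is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical gene vocabulary and disease-gene associations.
genes = ["G1", "G2", "G3", "G4"]
disease_genes = {
    "disease_a": {"G1", "G2"},
    "disease_b": {"G2", "G3"},
    "disease_c": {"G4"},
}


def to_binary_vector(gene_set):
    """Encode a gene set as a 0/1 vector over the shared gene vocabulary."""
    return [1 if gene in gene_set else 0 for gene in genes]


vectors = {name: to_binary_vector(gs) for name, gs in disease_genes.items()}

d_ab = cosine_distance(vectors["disease_a"], vectors["disease_b"])  # one shared gene
d_ac = cosine_distance(vectors["disease_a"], vectors["disease_c"])  # no shared genes
print(round(d_ab, 3), round(d_ac, 3))
```

A pairwise distance matrix built this way could then be passed to a clustering routine that accepts precomputed distances.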
Affiliation(s)
- Lucía Prieto Santamaría
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Ezeris Networks Global Services S.L., 28028, Madrid, Spain.
- Massimiliano Zanin
- Instituto de Física Interdisciplinar y Sistemas Complejos, CSIC-UIB, 07122, Palma de Mallorca, Spain
10
Prieto Santamaría L, Díaz Uzquiano M, Ugarte Carro E, Ortiz-Roldán N, Pérez Gallardo Y, Rodríguez-González A. Integrating heterogeneous data to facilitate COVID-19 drug repurposing. Drug Discov Today 2021; 27:558-566. [PMID: 34666181 PMCID: PMC8520166 DOI: 10.1016/j.drudis.2021.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/28/2021] [Revised: 06/28/2021] [Accepted: 10/08/2021] [Indexed: 01/03/2023]
Abstract
In the COVID-19 pandemic, drug repositioning has presented itself as an alternative to the time-consuming process of generating new drugs. This review describes a drug repurposing process that is based on a new data-driven approach: we put forward five information paths that associate COVID-19-related genes and COVID-19 symptoms with drugs that directly target these gene products, that target the symptoms or that treat diseases that are symptomatically or genetically similar to COVID-19. The intersection of the five information paths results in a list of 13 drugs that we suggest as potential candidates against COVID-19. In addition, we have found information in published studies and in clinical trials that support the therapeutic potential of the drugs in our final list.
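The abstract above obtains its final candidate list by intersecting five information paths. A minimal sketch of that intersection step, assuming each path has already produced a set of drug identifiers, could look like the following. The path labels and drug names are hypothetical placeholders and do not correspond to the 13 drugs reported in the paper.

```python
# Hypothetical output of each information path: a set of candidate drugs.
information_paths = {
    "path_1": {"drug_a", "drug_b", "drug_c", "drug_d"},
    "path_2": {"drug_a", "drug_b", "drug_d"},
    "path_3": {"drug_a", "drug_b", "drug_c"},
    "path_4": {"drug_a", "drug_b", "drug_e"},
    "path_5": {"drug_a", "drug_b", "drug_d", "drug_e"},
}

# A drug is kept only if every path proposes it.
candidates = set.intersection(*information_paths.values())
print(sorted(candidates))
```

Requiring agreement across all paths trades recall for precision: a drug supported by four of the five paths is discarded.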
Affiliation(s)
- Lucía Prieto Santamaría
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Ezeris Networks Global Services S.L., 28028 Madrid, Spain
- Marina Díaz Uzquiano
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
- Esther Ugarte Carro
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
- Nieves Ortiz-Roldán
- Facultativo Especialista Área (FEA), Anestesiología y Reanimación, Hospital Sierrallana, Servicio Cántabro de Salud, 39300 Torrelavega, Cantabria, Spain
- Alejandro Rodríguez-González
- ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain.
11
Prieto Santamaría L, Ugarte Carro E, Díaz Uzquiano M, Menasalvas Ruiz E, Pérez Gallardo Y, Rodríguez-González A. A data-driven methodology towards evaluating the potential of drug repurposing hypotheses. Comput Struct Biotechnol J 2021; 19:4559-4573. [PMID: 34471499 PMCID: PMC8387760 DOI: 10.1016/j.csbj.2021.08.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/20/2021] [Revised: 07/08/2021] [Accepted: 08/03/2021] [Indexed: 12/14/2022] Open
Abstract
Drug repurposing has become a widely used strategy to accelerate the process of finding treatments. While classical de novo drug development involves high costs, risks, and time-consuming paths, drug repurposing allows existing, approved drugs to be reused for new indications. Numerous studies have been carried out in this field, both in vitro and in silico. Computational drug repurposing methods make use of modern heterogeneous biomedical data to identify and prioritize new indications for old drugs. In the current paper, we present a new complete methodology to evaluate new potentially repurposable drugs based on disease-gene and disease-phenotype associations, identifying significant differences between repurposing and non-repurposing data. We have collected a set of known successful drug repurposing case studies from the literature and we have analysed their dissimilarities with other biomedical data not necessarily participating in repurposing processes. The information used has been obtained from the DISNET platform. We have performed three analyses (at the genetic, phenotypical, and categorization levels) and conclude that there is a statistically significant difference between actual repurposing-related information and non-repurposing data. The insights obtained could be relevant when suggesting new potential drug repurposing hypotheses.
Key Words
- ACE, Angiotensin I Converting Enzyme
- AHR, Aryl Hydrocarbon Receptor
- ALK, Anaplastic Lymphoma Kinase
- API, Application Programming Interface
- CMap, Connectivity Map
- COX-2, Cyclooxygenase 2
- CUI, Concept Unique Identifier
- DISNET knowledge base
- DR, Drug Repurposing or Drug Repositioning
- DRD3, Dopamine Receptor D3
- Data integration
- Disease understanding
- Drug repositioning
- Drug repurposing
- Drug-disease validation
- ESR1, Estrogen Receptor 1
- ESR2, Estrogen Receptor 2
- FCGR2A, Fc Fragment Of IgG Receptor IIa
- FCGR3A, Fc Fragment Of IgG Receptor IIIa
- FCGR3B, Fc Fragment Of IgG Receptor IIIb
- GDA, Gene Disease Association
- ICD-10-CM, International Classification of Diseases, 10th revision, Clinical Modification
- ID, Identifier
- KDR, Kinase insert Domain Receptor
- LTα, Lymphotoxin alpha
- MeSH-PA, Medical Subject Headings – Pharmacological Action
- ND, New Disease
- NLM, National Library of Medicine
- OD, Original Disease
- PTGS2, Prostaglandin-endoperoxidase synthase 2
- SM, Supplementary Material
- SRD5A1, Steroid 5 Alpha-Reductase 1
- SRD5A2, Steroid 5 Alpha-Reductase 2
- TNFα, Tumour Necrosis Factor alpha
- UMLS, Unified Medical Language System
Affiliation(s)
- Lucía Prieto Santamaría
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; Ezeris Networks Global Services S.L., 28028 Madrid, Spain
- Esther Ugarte Carro
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
- Marina Díaz Uzquiano
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
- Ernestina Menasalvas Ruiz
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
- Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain; ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
12
García Del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. DisMaNET: A network-based tool to cross map disease vocabularies. Computer Methods and Programs in Biomedicine 2021; 207:106233. [PMID: 34157517 DOI: 10.1016/j.cmpb.2021.106233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 06/02/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVES: The growing integration of healthcare sources is improving our understanding of diseases. Cross-mapping resources such as UMLS play a very important role in this area, but their coverage is still incomplete. To facilitate the integration and interoperability of biological, clinical and literature sources in disease studies, we built DisMaNET, a system that cross-maps terms from disease vocabularies by leveraging the power and interpretability of network analysis. METHODS: First, we collected and normalized data from 8 disease vocabularies and mapping sources to generate our datasets. Next, we built DisMaNET by integrating the generated datasets into a Neo4j graph database. Then we used the query mechanisms of Neo4j to cross-map disease terms across vocabularies with a relevance score metric and compared the results with some state-of-the-art solutions. Finally, we made our system publicly available for use and evaluation through both a graphical user interface and REST APIs. RESULTS: DisMaNET contains almost half a million nodes and nearly nine hundred thousand edges, including hierarchical and mapping relationships. Its query capabilities enabled the detection of connections between disease vocabularies that are not present in major mapping sources such as UMLS and the Disease Ontology, even for rare diseases. Furthermore, DisMaNET recovered more than 80% of the UMLS mappings reported in MonDO and DisGeNET, and it was successfully used to resolve the missing mappings in the DISNET project. CONCLUSIONS: DisMaNET is a powerful, intuitive and publicly available system to cross-map terms from different disease vocabularies. Our study proves that it is a competitive alternative to existing mapping systems, with the potential of network analysis and the interpretability of results through a visual interface as its main advantages. Expansion with new sources, versioning, and improvement of the search and scoring algorithms are envisioned as future lines of work.
Affiliation(s)
- Gerardo Lagunes García
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Lucía Prieto Santamaría
- Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Massimiliano Zanin
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, Palma de Mallorca, Spain
- Ernestina Menasalvas Ruiz
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Alejandro Rodríguez-González
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
|
13
|
García Del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. Leveraging network analysis to evaluate biomedical named entity recognition tools. Sci Rep 2021; 11:13537. [PMID: 34188248 PMCID: PMC8242017 DOI: 10.1038/s41598-021-93018-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/18/2021] [Accepted: 06/18/2021] [Indexed: 02/06/2023] Open
Abstract
The ever-growing availability of biomedical text sources has boosted the number of clinical studies that exploit them. Biomedical named-entity recognition (bio-NER) techniques have evolved remarkably in recent years and their application in research is increasingly successful. Still, the disparity of tools and the limited validation resources available are barriers to wider diffusion, especially within clinical practice. Here we propose the use of omics data and network analysis as an alternative for the assessment of bio-NER tools. Specifically, our method introduces quality criteria based on edge overlap and community detection. Applying these criteria to four bio-NER solutions yielded results comparable to those of strategies based on annotated corpora, without suffering from their limitations. Our approach can serve as a guide both for selecting the best bio-NER tool for a specific task and for creating and validating novel approaches.
Affiliation(s)
- Gerardo Lagunes García
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
- Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Lucía Prieto Santamaría
- Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Massimiliano Zanin
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, Palma de Mallorca, Spain
- Ernestina Menasalvas Ruiz
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
- Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
- Alejandro Rodríguez-González
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain
- Centro de Tecnología Biomédica, ETS Ingenieros Informáticos, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain
|
14
|
Yu L, Yu S. Developing an automated mechanism to identify medical articles from Wikipedia for knowledge extraction. Int J Med Inform 2020; 141:104234. [PMID: 32693245 PMCID: PMC7357526 DOI: 10.1016/j.ijmedinf.2020.104234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/16/2020] [Revised: 07/01/2020] [Accepted: 07/11/2020] [Indexed: 11/25/2022]
Abstract
Wikipedia contains rich biomedical information that can support medical informatics studies and applications. Identifying the subset of medical articles in Wikipedia has many benefits, such as facilitating medical knowledge extraction, serving as a corpus for language modeling, or simply making the data size easy to work with. However, because medical articles make up an extremely small share of Wikipedia, the set of articles identified by generic text classifiers would be bloated by irrelevant pages. To control the false discovery rate while maintaining a high recall, we developed a mechanism that leverages the rich page elements and the connected nature of Wikipedia and uses a crawling classification strategy to achieve accurate classification. Structured assertional knowledge in Infoboxes and Wikidata items associated with the identified medical articles was also extracted. This automatic mechanism is intended to run periodically to update the results and share them with the informatics community.
Affiliation(s)
- Lishan Yu
- Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Sheng Yu
- Center for Statistical Science, Tsinghua University, Beijing, China; Department of Industrial Engineering, Tsinghua University, Beijing, China; Institute for Data Science, Tsinghua University, Beijing, China
|