1. Ren Z, Zeng X, Lao Y, Zheng H, You Z, Xiang H, Zou Q. A spatial hierarchical network learning framework for drug repositioning allowing interpretation from macro to micro scale. Commun Biol 2024; 7:1413. [PMID: 39478146 PMCID: PMC11525566 DOI: 10.1038/s42003-024-07107-3] [Received: 06/11/2024] [Accepted: 10/21/2024] [Indexed: 11/02/2024] [Open access]
Abstract
Biomedical network learning offers fresh prospects for expediting drug repositioning. However, traditional network architectures struggle to quantify the relationship between micro-scale drug spatial structures and corresponding macro-scale biomedical networks, limiting their ability to capture key pharmacological properties and complex biomedical information crucial for drug screening and therapeutic discovery. Moreover, challenges such as difficulty in capturing long-range dependencies hinder current network-based approaches. To address these limitations, we introduce the Spatial Hierarchical Network, modeling molecular 3D structures and biological associations into a unified network. We propose an end-to-end framework, SpHN-VDA, integrating spatial hierarchical information through triple attention mechanisms to enhance machine understanding of molecular functionality and improve the accuracy of virus-drug association identification. SpHN-VDA outperforms leading models across three datasets, particularly excelling in out-of-distribution and cold-start scenarios. It also exhibits enhanced robustness against data perturbation, ranging from 20% to 40%. It accurately identifies critical motifs for binding sites, even without protein residue annotations. Leveraging the reliability of SpHN-VDA, we have identified 25 potential candidate drugs through gene expression analysis and CMap. Molecular docking experiments with the SARS-CoV-2 spike protein further corroborate the predictions. This research highlights the broad potential of SpHN-VDA to enhance drug repositioning and identify effective treatments for various diseases.
Affiliation(s)
- Zhonghao Ren
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yizhen Lao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Heping Zheng
  - College of Biology, Department of Molecular Medicine, Hunan University, Changsha, China
- Zhuhong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Hongxin Xiang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
2. Gordillo-Marañón M, Schmidt AF, Warwick A, Tomlinson C, Ytsma C, Engmann J, Torralbo A, Maclean R, Sofat R, Langenberg C, Shah AD, Denaxas S, Pirmohamed M, Hemingway H, Hingorani AD, Finan C. Disease coverage of human genome-wide association studies and pharmaceutical research and development. Commun Med 2024; 4:195. [PMID: 39379679 PMCID: PMC11461613 DOI: 10.1038/s43856-024-00625-5] [Received: 06/05/2024] [Accepted: 09/25/2024] [Indexed: 10/10/2024] [Open access]
Abstract
BACKGROUND Despite the growing interest in the use of human genomic data for drug target identification and validation, the extent to which the spectrum of human disease has been addressed by genome-wide association studies (GWAS), or by drug development, and the degree to which these efforts overlap remain unclear. METHODS In this study we harmonize and integrate different data sources to create a sample space of all the human drug targets and diseases and identify points of convergence or divergence of GWAS and drug development efforts. RESULTS We show that only 612 of 11,158 diseases listed in Human Disease Ontology have an approved drug treatment in at least one region of the world. Of the 1414 diseases that are the subject of preclinical or clinical phase drug development, only 666 have been investigated in GWAS. Conversely, of the 1914 human diseases that have been the subject of GWAS, 1121 have yet to be investigated in drug development. CONCLUSIONS We produce target-disease indication lists to help the pharmaceutical industry to prioritize future drug development efforts based on genetic evidence, academia to prioritize future GWAS for diseases without effective treatments, and both sectors to harness genetic evidence to expand the indications for licensed drugs or to identify repurposing opportunities for clinical candidates that failed in their originally intended indication.
Affiliation(s)
- María Gordillo-Marañón
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- Amand F Schmidt
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
  - Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centres, University of Amsterdam, Amsterdam, the Netherlands
  - UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Alasdair Warwick
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- Chris Tomlinson
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
- Cai Ytsma
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
- Jorgen Engmann
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
- Ana Torralbo
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
- Rory Maclean
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
- Reecha Sofat
  - Department of Pharmacology and Therapeutics, University of Liverpool, Liverpool, United Kingdom
  - Health Data Research, London, United Kingdom
- Claudia Langenberg
  - Precision Healthcare University Research Institute, Queen Mary University of London, London, United Kingdom
  - Computational Medicine, Berlin Institute of Health at Charité Universitätsmedizin, Berlin, Germany
  - MRC Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
- Anoop D Shah
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
  - NIHR Biomedical Research Centre at University College London Hospitals, London, United Kingdom
- Spiros Denaxas
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
  - NIHR Biomedical Research Centre at University College London Hospitals, London, United Kingdom
  - British Heart Foundation Data Science Centre, London, United Kingdom
- Munir Pirmohamed
  - Department of Pharmacology and Therapeutics, Centre for Drug Safety Science, University of Liverpool, Liverpool, United Kingdom
- Harry Hemingway
  - Institute of Health Informatics, Faculty of Population Health, University College London, London, United Kingdom
  - Health Data Research, London, United Kingdom
  - NIHR Biomedical Research Centre at University College London Hospitals, London, United Kingdom
- Aroon D Hingorani
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
  - UCL British Heart Foundation Research Accelerator, London, United Kingdom
- Chris Finan
  - Institute of Cardiovascular Science, Faculty of Population Health, University College London, London, United Kingdom
  - UCL British Heart Foundation Research Accelerator, London, United Kingdom
3. Zhou C, Cai CP, Huang XT, Wu S, Yu JL, Wu JW, Fang JS, Li GB. TarKG: a comprehensive biomedical knowledge graph for target discovery. Bioinformatics 2024; 40:btae598. [PMID: 39392404 PMCID: PMC11513019 DOI: 10.1093/bioinformatics/btae598] [Received: 07/17/2024] [Revised: 09/05/2024] [Accepted: 10/09/2024] [Indexed: 10/12/2024] [Open access]
Abstract
MOTIVATION Target discovery is a crucial step in drug development, as it directly affects the success rate of clinical trials. Knowledge graphs (KGs) offer unique advantages in processing complex biological data and inferring new relationships. Existing biomedical KGs primarily focus on tasks such as drug repositioning and drug-target interactions, leaving a gap in the construction of KGs tailored for target discovery. RESULTS We established a comprehensive biomedical KG focusing on target discovery, termed TarKG, by integrating seven existing biomedical KGs, nine public databases, and traditional Chinese medicine knowledge databases. TarKG consists of 1 143 313 entities and 32 806 467 relations across 15 entity categories and 171 relation types, all centered around 3 core entity types: Disease, Gene, and Compound. TarKG provides specialized knowledge for the core entities, including chemical structures, protein sequences, or text descriptions. By using different KG embedding algorithms, we assessed the knowledge completion capabilities of TarKG, particularly for disease-target link prediction. In case studies, we further examined TarKG's ability to predict potential protein targets for Alzheimer's disease (AD) and to identify diseases potentially associated with the metallo-deubiquitinase CSN5, using literature analysis for validation. Furthermore, we provided a user-friendly web server (https://tarkg.ddtmlab.org) that enables users to perform knowledge retrieval and relation inference using TarKG. AVAILABILITY AND IMPLEMENTATION TarKG is accessible at https://tarkg.ddtmlab.org.
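The abstract does not name the KG embedding algorithms that were evaluated. As an illustration only, the sketch below uses TransE, one widely used embedding model, to show how disease-target link prediction can be framed as ranking candidate tail entities. The embeddings, the relation name `associated_with`, and the gene labels are hypothetical toy values, not taken from TarKG.

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_tails(head, relation, candidates):
    """Sort candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )

# Toy 2-D embeddings chosen by hand for illustration (not learned, not from TarKG).
disease = [1.0, 0.0]
associated_with = [0.0, 1.0]
genes = {"GENE_A": [1.0, 1.0], "GENE_B": [3.0, -1.0], "GENE_C": [1.5, 0.5]}

ranking = rank_tails(disease, associated_with, genes)
print(ranking)
```

In a real setting the embeddings would be learned from the graph's triples, and the ranked list would be checked against held-out disease-target links.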
Affiliation(s)
- Cong Zhou
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Chui-Pu Cai
  - Division of Data Intelligence, Department of Computer Science, Shantou University, Shantou 515063, China
- Xiao-Tian Huang
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Song Wu
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Jun-Lin Yu
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Jing-Wei Wu
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
- Jian-Song Fang
  - Science and Technology Innovation Center, Guangzhou University of Chinese Medicine, Guangzhou 510405, China
- Guo-Bo Li
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, Department of Medicinal Chemistry, West China School of Pharmacy, Sichuan University, Chengdu 610041, China
4. Perdomo-Quinteiro P, Belmonte-Hernández A. Knowledge Graphs for drug repurposing: a review of databases and methods. Brief Bioinform 2024; 25:bbae461. [PMID: 39325460 PMCID: PMC11426166 DOI: 10.1093/bib/bbae461] [Received: 05/22/2024] [Revised: 08/07/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024] [Open access]
Abstract
Drug repurposing has emerged as an effective and efficient strategy to identify new treatments for a variety of diseases. One of the most effective approaches for discovering potential new drug candidates involves the utilization of Knowledge Graphs (KGs). This review comprehensively explores some of the most prominent KGs, detailing their structure, data sources, and how they facilitate the repurposing of drugs. In addition to KGs, this paper delves into various artificial intelligence techniques that enhance the process of drug repurposing. These methods not only accelerate the identification of viable drug candidates but also improve the precision of predictions by leveraging complex datasets and advanced algorithms. Furthermore, the importance of explainability in drug repurposing is emphasized. Explainability methods are crucial as they provide insights into the reasoning behind AI-generated predictions, thereby increasing the trustworthiness and transparency of the repurposing process. We will discuss several techniques that can be employed to validate these predictions, ensuring that they are both reliable and understandable.
Affiliation(s)
- Pablo Perdomo-Quinteiro
  - Grupo de Aplicación de Telecomunicaciones Visuales, Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense 30, 28040 Madrid, Spain
- Alberto Belmonte-Hernández
  - Grupo de Aplicación de Telecomunicaciones Visuales, Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense 30, 28040 Madrid, Spain
5. Mag P, Nemes-Terényi M, Jerzsele Á, Mátyus P. Some Aspects and Convergence of Human and Veterinary Drug Repositioning. Molecules 2024; 29:4475. [PMID: 39339469 PMCID: PMC11433938 DOI: 10.3390/molecules29184475] [Received: 07/30/2024] [Revised: 09/11/2024] [Accepted: 09/18/2024] [Indexed: 09/30/2024] [Open access]
Abstract
Drug innovation traditionally follows a de novo approach with new molecules through a complex preclinical and clinical pathway. In addition to this strategy, drug repositioning has also become an important complementary approach, which can be shorter, cheaper, and less risky. This review provides an overview of drug innovation in both human and veterinary medicine, with a focus on drug repositioning. The evolution of drug repositioning and the effectiveness of this approach are presented, including the growing role of data science and computational modeling methods in identifying drugs with potential for repositioning. Certain business aspects of drug innovation, especially the relevant factors of market exclusivity, are also discussed. Despite the promising potential of drug repositioning for innovation, it remains underutilized, especially in veterinary applications. To change this landscape for mutual benefits of human and veterinary drug innovation, further exploitation of the potency of drug repositioning is necessary through closer cooperation between all stakeholders, academia, industry, pharmaceutical authorities, and innovation policy makers, and the integration of human and veterinary repositioning into a unified innovation space. For this purpose, the establishment of the conceptually new "One Health Drug Repositioning Platform" is proposed. Oncology is one of the disease areas where this platform can significantly support the development of new drugs for human and dog (or other companion animals) anticancer therapies. As an example of the utilization of human and veterinary drugs for veterinary repositioning, the use of COX inhibitors to treat dog cancers is reviewed.
Affiliation(s)
- Patrik Mag
  - Department of Pharmacology and Toxicology, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
  - National Laboratory of Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
- Melinda Nemes-Terényi
  - Department of Pharmacology and Toxicology, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
  - National Laboratory of Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
- Ákos Jerzsele
  - Department of Pharmacology and Toxicology, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
  - National Laboratory of Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
- Péter Mátyus
  - National Laboratory of Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, University of Veterinary Medicine, István Street 2, 1078 Budapest, Hungary
6. Zhang Y, Mastouri M, Zhang Y. Accelerating drug discovery, development, and clinical trials by artificial intelligence. Med 2024; 5:1050-1070. [PMID: 39173629 DOI: 10.1016/j.medj.2024.07.026] [Received: 04/01/2024] [Revised: 05/21/2024] [Accepted: 07/25/2024] [Indexed: 08/24/2024]
Abstract
Artificial intelligence (AI) has profoundly advanced the field of biomedical research, which also demonstrates transformative capacity for innovation in drug development. This paper aims to deliver a comprehensive analysis of the progress in AI-assisted drug development, particularly focusing on small molecules, RNA, and antibodies. Moreover, this paper elucidates the current integration of AI methodologies within the industrial drug development framework. This encompasses a detailed examination of the industry-standard drug development process, supplemented by a review of medications presently undergoing clinical trials. Conclusively, the paper tackles a predominant obstacle within the AI pharmaceutical sector: the absence of AI-conceived drugs receiving approval. This paper also advocates for the adoption of large language models and diffusion models as a viable strategy to surmount this challenge. This review not only underscores the significant potential of AI in drug discovery but also deliberates on the challenges and prospects within this dynamically progressing field.
Affiliation(s)
- Yilun Zhang
  - College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong, China
- Mohamed Mastouri
  - College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
- Yang Zhang
  - College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, China
7. Steiert D, Wittig C, Banerjee P, Preissner R, Szulcek R. An exploration into CTEPH medications: Combining natural language processing, embedding learning, in vitro models, and real-world evidence for drug repurposing. PLoS Comput Biol 2024; 20:e1012417. [PMID: 39264975 PMCID: PMC11478854 DOI: 10.1371/journal.pcbi.1012417] [Received: 12/01/2023] [Revised: 10/15/2024] [Accepted: 08/14/2024] [Indexed: 09/14/2024] [Open access]
Abstract
BACKGROUND In the modern era, the growth of scientific literature presents a daunting challenge for researchers to keep informed of advancements across multiple disciplines. OBJECTIVE We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed. METHODS Using NLP, especially term associations through word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease with an overall limited number of scientific publications. RESULTS Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique measuring a term's significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we discovered potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of literature and predictive modeling on both the CTEPH and the VT literature corpora to find novel, yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our in vitro tests confirmed amiodarone's ability to reduce platelet aggregation significantly by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significant 15.9% higher mortality risk (p<0.001). 
CONCLUSIONS While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods like in vitro testing and real-world evidence. Our exploration with benzofuran derivatives and CTEPH underscores this point. Thus, blending NLP with hands-on experiments and real-world clinical data can pave the way for faster and safer drug repurposing approaches, especially for rare diseases like CTEPH.
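The TF-IDF scoring mentioned in the RESULTS can be sketched as follows. This is a minimal textbook variant (relative term frequency multiplied by the log inverse document frequency); the exact weighting and tokenization used by PubDigest are not stated in the abstract, and the three-document corpus is a made-up toy example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy corpus, not real PubMed text.
corpus = [
    "cteph thrombosis pulmonary hypertension",
    "venous thrombosis anticoagulation",
    "pulmonary hypertension therapy",
]
scores = tf_idf(corpus)
print(sorted(scores[0].items(), key=lambda item: item[1], reverse=True))
```

A term that appears in only one document ("cteph") scores higher than one shared across documents ("thrombosis"), which is why the measure highlights features specific to a small disease corpus.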
Affiliation(s)
- Daniel Steiert
  - Laboratory of in vitro modeling systems of pulmonary and thrombotic diseases, Institute of Physiology, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Corey Wittig
  - Laboratory of in vitro modeling systems of pulmonary and thrombotic diseases, Institute of Physiology, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Priyanka Banerjee
  - Structural Bioinformatics Group, Institute of Physiology, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Robert Preissner
  - Structural Bioinformatics Group, Institute of Physiology, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Robert Szulcek
  - Laboratory of in vitro modeling systems of pulmonary and thrombotic diseases, Institute of Physiology, Charité–Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
  - Deutsches Herzzentrum der Charité, Department of Cardiac Anesthesiology and Intensive Care Medicine, Berlin, Germany
8. Johnson R, Li MM, Noori A, Queen O, Zitnik M. Graph Artificial Intelligence in Medicine. Annu Rev Biomed Data Sci 2024; 7:345-368. [PMID: 38749465 PMCID: PMC11344018 DOI: 10.1146/annurev-biodatasci-110723-024625] [Indexed: 06/23/2024]
Abstract
In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data, from patient records to imaging, graph AI models process data holistically by viewing modalities and entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.
Affiliation(s)
- Ruth Johnson
  - Berkowitz Family Living Laboratory, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Michelle M Li
  - Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Ayush Noori
  - Department of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences, Allston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Owen Queen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Marinka Zitnik
  - Harvard Data Science Initiative, Cambridge, Massachusetts, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
9. Zhang H, Zhou Y, Zhang Z, Sun H, Pan Z, Mou M, Zhang W, Ye Q, Hou T, Li H, Hsieh CY, Zhu F. Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction. Anal Chem 2024. [PMID: 39011990 DOI: 10.1021/acs.analchem.4c01793] [Indexed: 07/17/2024]
Abstract
Analyzing drug-related interactions in the field of biomedicine has been a critical aspect of drug discovery and development. While various artificial intelligence (AI)-based tools have been proposed to analyze drug biomedical associations (DBAs), their feature encoding did not adequately account for crucial biomedical functions and semantic concepts, thereby still hindering their progress. Since the advent of ChatGPT by OpenAI in 2022, large language models (LLMs) have demonstrated rapid growth and significant success across various applications. Herein, LEDAP was introduced, which uniquely leveraged LLM-based biotext feature encoding for predicting drug-disease associations, drug-drug interactions, and drug-side effect associations. Benefiting from the large-scale knowledgebase pre-training, LLMs had great potential in drug development analysis owing to their holistic understanding of natural language and human topics. LEDAP illustrated its notable competitiveness in comparison with other popular DBA analysis tools. Specifically, even in simple conjunction with classical machine learning methods, LLM-based feature representations consistently enabled satisfactory performance across diverse DBA tasks like binary classification, multiclass classification, and regression. Our findings underpinned the considerable potential of LLMs in drug development research, indicating a catalyst for further progress in related fields.
Affiliation(s)
- Hanyu Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Yuan Zhou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Zhichao Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Wei Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Qing Ye
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Honglin Li
  - Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
10. Yang Y, Yu K, Gao S, Yu S, Xiong D, Qin C, Chen H, Tang J, Tang N, Zhu H. Alzheimer's Disease Knowledge Graph Enhances Knowledge Discovery and Disease Prediction. bioRxiv 2024:2024.07.03.601339. [PMID: 39005357 PMCID: PMC11245034 DOI: 10.1101/2024.07.03.601339] [Indexed: 07/16/2024]
Abstract
Background Alzheimer's disease (AD), a progressive neurodegenerative disorder, continues to increase in prevalence without any effective treatments to date. In this context, knowledge graphs (KGs) have emerged as a pivotal tool in biomedical research, offering new perspectives on drug repurposing and biomarker discovery by analyzing intricate network structures. Our study seeks to build an AD-specific knowledge graph, highlighting interactions among AD, genes, variants, chemicals, drugs, and other diseases. The goal is to shed light on existing treatments, potential targets, and diagnostic methods for AD, thereby aiding in drug repurposing and the identification of biomarkers. Results We annotated 800 PubMed abstracts and leveraged GPT-4 for text augmentation to enrich our training data for named entity recognition (NER) and relation classification. A comprehensive data mining model, integrating NER and relation classification, was trained on the annotated corpus. This model was subsequently applied to extract relation triplets from unannotated abstracts. To enhance entity linking, we utilized a suite of reference biomedical databases and refined the linking accuracy through abbreviation resolution. As a result, we successfully identified 3,199,276 entity mentions and 633,733 triplets, elucidating connections between 5,000 unique entities. These connections were pivotal in constructing a comprehensive Alzheimer's Disease Knowledge Graph (ADKG). We also integrated the ADKG constructed after entity linking with other biomedical databases. The ADKG served as a training ground for Knowledge Graph Embedding models, with the high-ranking predicted triplets supported by evidence, underscoring the utility of ADKG in generating testable scientific hypotheses. Further application of ADKG in predictive modeling using the UK Biobank data revealed models based on ADKG outperforming others, as evidenced by higher areas under the receiver operating characteristic (ROC) curves. Conclusion The ADKG is a valuable resource for generating hypotheses and enhancing predictive models, highlighting its potential to advance AD research and treatment strategies.
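The abstract compares predictive models by the area under the ROC curve but does not say how it was computed. As a minimal sketch, the area can be obtained as the fraction of (positive, negative) pairs that a model ranks correctly, counting ties as half. The function name `roc_auc` and the labels and scores below are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical case/control labels and risk scores from two models.
labels = [0, 0, 1, 1]
baseline_scores = [0.1, 0.4, 0.35, 0.8]
kg_scores = [0.1, 0.3, 0.6, 0.8]
print(roc_auc(labels, baseline_scores), roc_auc(labels, kg_scores))
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a sort-based computation would be the usual choice for cohort-sized data.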
Affiliation(s)
- Yue Yang
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Kaixian Yu
- Independent Researcher, Shanghai, P.R. China
- Shan Gao
- Department of Mathematics and Statistics, Yunnan University
- Sheng Yu
- Center for Statistics Science, Tsinghua University
- Di Xiong
- Department of Statistics, Shanghai University
- Chuanyang Qin
- Department of Mathematics and Statistics, Yunnan University
- Huiyuan Chen
- Department of Mathematics and Statistics, Yunnan University
- Jiarui Tang
- Department of Biostatistics, University of North Carolina at Chapel Hill
- Niansheng Tang
- Department of Mathematics and Statistics, Yunnan University
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill
|
11
|
Du X, Sun X, Li M. Knowledge Graph Convolutional Network with Heuristic Search for Drug Repositioning. J Chem Inf Model 2024; 64:4928-4937. [PMID: 38837744 DOI: 10.1021/acs.jcim.4c00737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
Drug repositioning is a strategy of repurposing approved drugs for treating new indications, which can accelerate the drug discovery process, reduce development costs, and lower the safety risk. The advancement of biotechnology has significantly accelerated the speed and scale of biological data generation, offering significant potential for drug repositioning through biomedical knowledge graphs that integrate diverse entities and relations from various biomedical sources. To fully learn the semantic information and topological structure information from the biological knowledge graph, we propose a knowledge graph convolutional network with a heuristic search, named KGCNH, which can effectively utilize the diversity of entities and relationships in biological knowledge graphs, as well as topological structure information, to predict the associations between drugs and diseases. Specifically, we design a relation-aware attention mechanism to compute the attention scores for each neighboring entity of a given entity under different relations. To address the challenge of randomness of the initial attention scores potentially impacting model performance and to expand the search scope of the model, we designed a heuristic search module based on Gumbel-Softmax, which uses attention scores as heuristic information and introduces randomness to assist the model in exploring more optimal embeddings of drugs and diseases. Following this module, we derive the relation weights, obtain the embeddings of drugs and diseases through neighborhood aggregation, and then predict drug-disease associations. Additionally, we employ feature-based augmented views to enhance model robustness and mitigate overfitting issues. We have implemented our method and conducted experiments on two data sets. The results demonstrate that KGCNH outperforms competing methods. In particular, case studies on lithium and quetiapine confirm that KGCNH can retrieve more actual drug-disease associations in the top prediction results.
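The abstract describes a heuristic search module based on Gumbel-Softmax that treats attention scores as heuristic information and adds randomness. A minimal sketch of the generic Gumbel-Softmax sampling step follows; it is not the KGCNH implementation. The function name `gumbel_softmax`, the temperature `tau`, the use of log attention scores as logits, and the example scores are all assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Perturb logits with Gumbel(0, 1) noise, then apply a temperature softmax."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)         # avoid log(0)
        gumbel = -math.log(-math.log(u))     # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores of three neighbouring entities under one relation.
attention = [0.7, 0.2, 0.1]
logits = [math.log(a) for a in attention]    # assumption: log-scores act as logits
weights = gumbel_softmax(logits, tau=0.5, rng=random.Random(0))
print(weights)
```

With this choice of logits, the largest weight falls on each neighbour with probability equal to its attention score, so high-scoring neighbours are favoured while lower-scoring ones are still explored. Lowering `tau` pushes each sample toward a one-hot vector; raising it flattens the weights.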
Affiliation(s)
- Xiang Du
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
- Xinliang Sun
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
|
12
|
Chapman MA, Sorg BA. A Systematic Review of Extracellular Matrix-Related Alterations in Parkinson's Disease. Brain Sci 2024; 14:522. [PMID: 38928523 PMCID: PMC11201521 DOI: 10.3390/brainsci14060522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/13/2024] [Accepted: 05/15/2024] [Indexed: 06/28/2024] Open
Abstract
The role of the extracellular matrix (ECM) in Parkinson's disease (PD) is not well understood, even though it is critical for neuronal structure and signaling. This systematic review identified the top deregulated ECM-related pathways in studies that used gene set enrichment analyses (GSEA) to document transcriptomic, proteomic, or genomic alterations in PD. PubMed and Google Scholar were searched for transcriptomics, proteomics, or genomics studies that employed GSEA on data from PD tissues or cells and reported ECM-related pathways among the top-10 most enriched versus controls. Twenty-seven studies were included, two of which used multiple omics analyses. Transcriptomics and proteomics studies were conducted on a variety of tissue and cell types. Of the 17 transcriptomics studies (16 data sets), 13 identified one or more adhesion pathways in the top-10 deregulated gene sets or pathways, primarily related to cell adhesion and focal adhesion. Among the 8 proteomics studies, 5 identified altered overarching ECM gene sets or pathways among the top 10. Among the 4 genomics studies, 3 identified focal adhesion pathways among the top 10. The findings summarized here suggest that ECM organization/structure and cell adhesion (particularly focal adhesion) are altered in PD and should be the focus of future studies.
Affiliation(s)
- Barbara A. Sorg
- R.S. Dow Neurobiology, Legacy Research Institute, Portland, OR 97232, USA
|
13
|
Labarga A, Martínez-Gonzalez J, Barajas M. Integrative Multi-Omics Analysis for Etiology Classification and Biomarker Discovery in Stroke: Advancing towards Precision Medicine. Biology 2024; 13:338. [PMID: 38785820 PMCID: PMC11149453 DOI: 10.3390/biology13050338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2024] [Revised: 05/02/2024] [Accepted: 05/06/2024] [Indexed: 05/25/2024]
Abstract
Recent advancements in high-throughput omics technologies have opened new avenues for investigating stroke at the molecular level and elucidating the intricate interactions among various molecular components. We present a novel approach for multi-omics data integration on knowledge graphs and have applied it to a stroke etiology classification task of 30 stroke patients through the integrative analysis of DNA methylation and mRNA, miRNA, and circRNA. This approach has demonstrated promising performance compared with existing single-technology approaches.
Affiliation(s)
- Alberto Labarga
- Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
- Miguel Barajas
- Health Science Department, Public University of Navarra, 31006 Pamplona, Spain
|
14
|
Di Maria A, Bellomo L, Billeci F, Cardillo A, Alaimo S, Ferragina P, Ferro A, Pulvirenti A. NetMe 2.0: a web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph. Bioinformatics 2024; 40:btae194. [PMID: 38597890 PMCID: PMC11074003 DOI: 10.1093/bioinformatics/btae194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/11/2024] Open
Abstract
MOTIVATION The rapid increase of biomedical literature makes it harder and harder for scientists to keep pace with the discoveries on which they build their studies. Therefore, computational tools have become more widespread, among which network analysis plays a crucial role in several life-science contexts. Nevertheless, building correct and complete networks about some user-defined biomedical topics on top of the available literature is still challenging. RESULTS We introduce NetMe 2.0, a web-based platform that automatically extracts relevant biomedical entities and their relations from a set of input texts (i.e. full texts or abstracts of PubMed Central papers, free texts, or PDFs uploaded by users) and models them as a BioMedical Knowledge Graph (BKG). NetMe 2.0 also implements an innovative Retrieval Augmented Generation module (Graph-RAG) that works on top of the relationships modeled by the BKG and allows the distilling of well-formed sentences that explain their content. The experimental results show that NetMe 2.0 can infer comprehensive and reliable biological networks with significant Precision-Recall metrics when compared to state-of-the-art approaches. AVAILABILITY AND IMPLEMENTATION https://netme.click/.
Affiliation(s)
- Antonio Di Maria
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Fabrizio Billeci
- Department of Computer Science, University of Catania, Catania, 95125, Italy
- Alfio Cardillo
- Department of Computer Science, University of Catania, Catania, 95125, Italy
- Salvatore Alaimo
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Paolo Ferragina
- Department of Computer Science, University of Pisa, Pisa, 56126, Italy
- Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
- Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
|
15
|
Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics 2024; 40:btae306. [PMID: 38715444 PMCID: PMC11256965 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024] Open
Abstract
MOTIVATION Exploring potential associations between diseases can help in understanding pathological mechanisms of diseases and facilitating the discovery of candidate biomarkers and drug targets, thereby promoting disease diagnosis and treatment. Some computational methods have been proposed for measuring disease similarity. However, these methods describe diseases without considering their latent multi-molecule regulation and valuable supervision signal, resulting in limited biological interpretability and efficiency to capture association patterns. RESULTS In this study, we propose a new computational method named DiSMVC. Different from existing predictors, DiSMVC designs a supervised graph collaborative framework to measure disease similarity. Multiple bio-entity associations related to genes and miRNAs are integrated via cross-view graph contrastive learning to extract informative disease representation, and then association pattern joint learning is implemented to compute disease similarity by incorporating phenotype-annotated disease associations. The experimental results show that DiSMVC can draw discriminative characteristics for disease pairs, and outperform other state-of-the-art methods. As a result, DiSMVC is a promising method for predicting disease associations with molecular interpretability. AVAILABILITY AND IMPLEMENTATION Datasets and source codes are available at https://github.com/Biohang/DiSMVC.
Affiliation(s)
- Hang Wei
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Shuai Wu
- School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710126, China
- Yina Jiang
- Department of Basic Medicine, Shaanxi University of Chinese Medicine, Xianyang, Shaanxi 712046, China
- Bin Liu
- Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen, Guangdong 518172, China
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
|
16
|
Patidar K, Deng JH, Mitchell CS, Ford Versypt AN. Cross-Domain Text Mining of Pathophysiological Processes Associated with Diabetic Kidney Disease. Int J Mol Sci 2024; 25:4503. [PMID: 38674089 PMCID: PMC11050166 DOI: 10.3390/ijms25084503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 04/28/2024] Open
Abstract
Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. This study's goal was to identify the signaling drivers and pathways that modulate glomerular endothelial dysfunction in DKD via artificial intelligence-enabled literature-based discovery. Cross-domain text mining of 33+ million PubMed articles was performed with SemNet 2.0 to identify and rank multi-scalar and multi-factorial pathophysiological concepts related to DKD. A set of identified relevant genes and proteins that regulate different pathological events associated with DKD were analyzed and ranked using normalized mean HeteSim scores. High-ranking genes and proteins intersected three domains: DKD, the immune response, and glomerular endothelial cells. The top 10% of ranked concepts were mapped to the following biological functions: angiogenesis, apoptotic processes, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, the nitric oxide response, oxidative stress, the cytokine response, macrophage signaling, NFκB factor activity, the TLR pathway, glucose metabolism, the inflammatory response, the ERK/MAPK signaling response, the JAK/STAT pathway, the T-cell-mediated response, the WNT/β-catenin pathway, the renin-angiotensin system, and NADPH oxidase activity. High-ranking genes and proteins were used to generate a protein-protein interaction network. The study results prioritized interactions or molecules involved in dysregulated signaling in DKD, which can be further assessed through biochemical network models or experiments.
Affiliation(s)
- Krutika Patidar
- Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Jennifer H. Deng
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Cassie S. Mitchell
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Center for Machine Learning at Georgia Tech, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Ashlee N. Ford Versypt
- Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Department of Biomedical Engineering, University at Buffalo, Buffalo, NY 14260, USA
- Institute for Artificial Intelligence and Data Science, University at Buffalo, Buffalo, NY 14260, USA
|
17
|
Rani N, Kaushik A, Kardam S, Kag S, Raj VS, Ambasta RK, Kumar P. Reimagining old drugs with new tricks: Mechanisms, strategies and notable success stories in drug repurposing for neurological diseases. Prog Mol Biol Transl Sci 2024; 205:23-70. [PMID: 38789181 DOI: 10.1016/bs.pmbts.2024.03.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
Recent progress in drug repurposing has raised new hopes, especially in the fight against neurodegenerative diseases (NDDs). The traditional approach to developing novel drugs for these complex disorders is laborious, time-consuming, and often unsuccessful. However, drug reprofiling, which is the identification of novel therapeutic applications for existing approved drugs, has shown potential as a promising strategy to accelerate the hunt for therapeutics. The advancement of computational approaches and artificial intelligence has expedited drug repurposing. These technologies have enabled scientists to analyse extensive datasets and predict potential drug-disease interactions. By mining existing pharmacological knowledge, scientists can recognise potential therapeutic candidates for reprofiling, saving precious time and resources. Preclinical models have also played a pivotal role in this field, confirming the effectiveness and mechanisms of action of repurposed drugs. Several studies in recent years have reported available drugs that demonstrate significant protective effects in NDDs, relieve debilitating symptoms, or slow down the progression of the disease. These findings highlight the potential of repurposed drugs to change the landscape of NDD treatment. Here, we present an overview of recent developments and major advances in drug repurposing, intending to provide an in-depth analysis of traditional drug discovery and the strategies, approaches and technologies that have contributed to drug repositioning. In addition, this chapter attempts to highlight successful case studies of drug repositioning in various therapeutic areas related to NDDs and explore the clinical trials, challenges and limitations faced by researchers in the field. Finally, the importance of drug repositioning in drug discovery and development and its potential to address unmet medical needs is also highlighted.
Affiliation(s)
- Neetu Rani
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- Aastha Kaushik
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- Shefali Kardam
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- Sonika Kag
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- V Samuel Raj
- Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Rashmi K Ambasta
- Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
|
18
|
Yang C, Chen X, Huang J, An Y, Huang Z, Sun Y. A few-shot link prediction framework to drug repurposing using multi-level attention network. Comput Biol Med 2024; 170:107936. [PMID: 38244473 DOI: 10.1016/j.compbiomed.2024.107936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/05/2023] [Accepted: 01/01/2024] [Indexed: 01/22/2024]
Abstract
Drug repurposing is a strategy aimed at uncovering novel medical indications of approved drugs. This process of discovery can be effectively represented as a link prediction task within a medical knowledge graph by predicting the missing relation between the disease entity and the drug entity. Typically, the links to be predicted pertain to rare types, thereby necessitating the task of few-shot link prediction. However, the sparsity of neighborhood information and weak triplet interactions result in less effective representations, which poses great challenges for few-shot link prediction. Therefore, in this paper, we proposed a meta-learning framework based on a multi-level attention network (MLAN) to capture valuable information in the few-shot scenario for drug repurposing. First, the proposed method utilized a gating mechanism and a graph attention network to effectively filter noise information and highlight the valuable neighborhood information, respectively. Second, the proposed commonality relation learner, employing a set transformer, effectively captured triplet-level interactions while remaining insensitive to the size of the support set. Finally, a model-agnostic meta-learning training strategy was employed to optimize the model quickly on each meta task. We conducted validation of the proposed method on two datasets specifically designed for few-shot link prediction in the medical field: COVID19-One and BIOKG-One. Experimental results showed that the proposed model had significant advantages over state-of-the-art few-shot link prediction methods. Results also highlighted the valuable insights of the proposed method, which successfully integrated the components within a unified meta-learning framework for drug repurposing.
Affiliation(s)
- Chenglin Yang
- Big Data Institute, Central South University, Changsha, 410083, China
- School of Life Sciences, Central South University, Changsha, 410083, China
- Xianlai Chen
- Big Data Institute, Central South University, Changsha, 410083, China
- Jincai Huang
- Big Data Institute, Central South University, Changsha, 410083, China
- Ying An
- Big Data Institute, Central South University, Changsha, 410083, China
- Zhenyu Huang
- Big Data Institute, Central South University, Changsha, 410083, China
- Yu Sun
- Big Data Institute, Central South University, Changsha, 410083, China
|
19
|
Jeong D, Koo B, Oh M, Kim TB, Kim S. GOAT: Gene-level biomarker discovery from multi-Omics data using graph ATtention neural network for eosinophilic asthma subtype. Bioinformatics 2023; 39:btad582. [PMID: 37740295 PMCID: PMC10547929 DOI: 10.1093/bioinformatics/btad582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 08/21/2023] [Accepted: 09/20/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION Asthma is a heterogeneous disease where various subtypes are established and molecular biomarkers of the subtypes are yet to be discovered. The recent availability of multi-omics data has paved the way to discovering molecular biomarkers for the subtypes. However, multi-omics biomarker discovery is challenging because of the complex interplay between different omics layers. RESULTS We propose a deep attention model named Gene-level biomarker discovery from multi-Omics data using graph ATtention neural network (GOAT) for identifying molecular biomarkers for eosinophilic asthma subtypes with multi-omics data. GOAT identifies genes that discriminate subtypes using a graph neural network by modeling complex interactions among genes as the attention mechanism in the deep learning model. In experiments with multi-omics profiles of the COREA (Cohort for Reality and Evolution of Adult Asthma in Korea) asthma cohort of 300 patients, GOAT outperforms existing models and suggests interpretable biological mechanisms underlying asthma subtypes. Importantly, GOAT identified genes that are distinct only in terms of relationship with other genes through attention. To better understand the role of biomarkers, we further investigated two transcription factors, CTNNB1 and JUN, captured by GOAT. We were successful in showing the role of the transcription factors in eosinophilic asthma pathophysiology in a network propagation and transcriptional network analysis, which were not distinct in terms of gene expression level differences. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/DabinJeong/Multi-omics_biomarker. The preprocessed data underlying this article are accessible in the data folder of the GitHub repository. Raw data are available in the Multi-Omics Platform at http://203.252.206.90:5566/ and are accessible on request.
Affiliation(s)
- Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
- Bonil Koo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
- AIGENDRUG Co., Ltd, Seoul 08826, Republic of Korea
- Minsik Oh
- School of Software Convergence, Myongji University, Seoul 03674, Republic of Korea
- Tae-Bum Kim
- Department of Allergy and Clinical Immunology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Republic of Korea
- AIGENDRUG Co., Ltd, Seoul 08826, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Republic of Korea
|
20
|
Gu J, Bang D, Yi J, Lee S, Kim DK, Kim S. A model-agnostic framework to enhance knowledge graph-based drug combination prediction with drug-drug interaction data and supervised contrastive learning. Brief Bioinform 2023; 24:bbad285. [PMID: 37544660 DOI: 10.1093/bib/bbad285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 07/05/2023] [Accepted: 07/21/2023] [Indexed: 08/08/2023] Open
Abstract
Combination therapies have brought significant advancements to the treatment of various diseases in the medical field. However, searching for effective drug combinations remains a major challenge due to the vast number of possible combinations. Biomedical knowledge graph (KG)-based methods have shown potential in predicting effective combinations for a wide spectrum of diseases, but the lack of credible negative samples has limited the prediction performance of machine learning models. To address this issue, we propose a novel model-agnostic framework that leverages existing drug-drug interaction (DDI) data as a reliable negative dataset and employs supervised contrastive learning (SCL) to transform drug embedding vectors to be more suitable for drug combination prediction. We conducted extensive experiments using various network embedding algorithms, including random walk and graph neural networks, on a biomedical KG. Our framework significantly improved performance metrics compared to the baseline framework. We also provide embedding space visualizations and case studies that demonstrate the effectiveness of our approach. This work highlights the potential of using DDI data and SCL in finding tighter decision boundaries for predicting effective drug combinations.
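The abstract names supervised contrastive learning (SCL) as the step that reshapes drug embeddings, without giving the loss. The sketch below shows one commonly used form of the supervised contrastive loss on unit-normalised vectors: each anchor is pulled toward samples sharing its label and pushed away from all others. It is not the authors' implementation; the function names, the temperature `tau`, the toy two-dimensional embeddings, and the class labels are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, tau=0.1):
    """Mean supervised contrastive loss over anchors that have a positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau for j in others}
        peak = max(sims.values())  # log-sum-exp with max subtraction for stability
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: two tight groups in 2-D (hypothetical values).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(embeddings, [0, 0, 1, 1])    # labels match the geometry
scrambled = supcon_loss(embeddings, [0, 1, 0, 1])  # labels cut across the groups
print(aligned, scrambled)
```

The loss is small when same-label vectors already sit close together and large when labels cut across the geometry, which is the signal a training loop would use to tighten class boundaries.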
Affiliation(s)
- Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- AIGENDRUG Co., Ltd., 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Jungseob Yi
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Sangseon Lee
- Institute of Computer Technology, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Dong Kyu Kim
- PHARMGENSCIENCE Co., Ltd., 216, Dongjak-daero, 06554 Seoul, Republic of Korea
- Sun Kim
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- AIGENDRUG Co., Ltd., 1, Gwanak-ro, 08826 Seoul, Republic of Korea
- Institute of Computer Technology, Seoul National University, 1, Gwanak-ro, 08826 Seoul, Republic of Korea
|