1
Messa L, Testa C, Carelli S, Rey F, Jacchetti E, Cereda C, Raimondi MT, Ceri S, Pinoli P. Non-Negative Matrix Tri-Factorization for Representation Learning in Multi-Omics Datasets with Applications to Drug Repurposing and Selection. Int J Mol Sci 2024; 25:9576. [PMID: 39273521 PMCID: PMC11394968 DOI: 10.3390/ijms25179576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 08/18/2024] [Accepted: 08/20/2024] [Indexed: 09/15/2024]
Abstract
The vast corpus of heterogeneous biomedical data stored in databases, ontologies, and terminologies presents a unique opportunity for drug design. Integrating and fusing these sources is essential to develop data representations that can be analyzed using artificial intelligence methods to generate novel drug candidates or hypotheses. Here, we propose Non-Negative Matrix Tri-Factorization as an invaluable tool for integrating and fusing data, as well as for representation learning. Additionally, we demonstrate how representations learned by Non-Negative Matrix Tri-Factorization can effectively be utilized by traditional artificial intelligence methods. While this approach is domain-agnostic and applicable to any field with vast amounts of structured and semi-structured data, we apply it specifically to computational pharmacology and drug repurposing. This field is poised to benefit significantly from artificial intelligence, particularly in personalized medicine. We conducted extensive experiments to evaluate the performance of the proposed method, yielding exciting results, particularly compared to traditional methods. Novel drug-target predictions have also been validated in the literature, further confirming their validity. Additionally, we tested our method to predict drug synergism, where constructing a classical matrix dataset is challenging. The method demonstrated great flexibility, suggesting its applicability to a wide range of tasks in drug design and discovery.
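The factorization named in this abstract can be illustrated with a minimal sketch. It assumes the plain multiplicative-update form of tri-factorization, X ≈ F S Gᵀ with all factors non-negative and a Frobenius-norm loss; it is not the authors' implementation, and the function name `nmtf`, the toy association matrix `X`, and the ranks `k1` and `k2` are hypothetical choices made for illustration only.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n x m) as F @ S @ G.T.

    F (n x k1) and G (m x k2) act as row and column representations;
    S (k1 x k2) links the two latent spaces. Plain multiplicative
    updates for the Frobenius loss are used (an assumption, see lead-in).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms,
        # so the factors stay non-negative throughout.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Hypothetical binary association matrix (rows: drugs, columns: targets).
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

F, S, G, errors = nmtf(X, k1=2, k2=2)
scores = F @ S @ G.T  # reconstructed association scores
```

Under these assumptions, entries of `scores` that are large where `X` is zero would be the candidate associations to inspect, and the rows of `F` and `G` are the learned representations that a downstream classifier could consume.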
Affiliation(s)
- Letizia Messa
  - Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, 20133 Milan, Italy
- Carolina Testa
  - Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, 20133 Milan, Italy
- Stephana Carelli
  - Center of Functional Genomics and Rare Diseases, Buzzi Children's Hospital, 20154 Milan, Italy
  - Pediatric Clinical Research Center "Fondazione Romeo ed Enrica Invernizzi", Department of Biomedical and Clinical Sciences, Università degli Studi di Milano, 20157 Milan, Italy
- Federica Rey
  - Pediatric Clinical Research Center "Fondazione Romeo ed Enrica Invernizzi", Department of Biomedical and Clinical Sciences, Università degli Studi di Milano, 20157 Milan, Italy
- Emanuela Jacchetti
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, 20133 Milan, Italy
- Cristina Cereda
  - Center of Functional Genomics and Rare Diseases, Buzzi Children's Hospital, 20154 Milan, Italy
- Manuela Teresa Raimondi
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, 20133 Milan, Italy
- Stefano Ceri
  - Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, 20133 Milan, Italy
- Pietro Pinoli
  - Department of Electronics, Information and Bioengineering (DEIB), Politecnico di Milano, 20133 Milan, Italy
2
Yang JJ, Goff A, Wild DJ, Ding Y, Annis A, Kerber R, Foote B, Passi A, Duerksen JL, London S, Puhl AC, Lane TR, Braunstein M, Waddell SJ, Ekins S. Computational drug repositioning identifies niclosamide and tribromsalan as inhibitors of Mycobacterium tuberculosis and Mycobacterium abscessus. Tuberculosis (Edinb) 2024; 146:102500. [PMID: 38432118 PMCID: PMC10978224 DOI: 10.1016/j.tube.2024.102500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/20/2024] [Accepted: 02/24/2024] [Indexed: 03/05/2024]
Abstract
Tuberculosis (TB) is still a major global health challenge, killing over 1.5 million people each year, and hence, there is a need to identify and develop novel treatments for Mycobacterium tuberculosis (M. tuberculosis). The prevalence of infections caused by nontuberculous mycobacteria (NTM) is also increasing and has overtaken TB cases in the United States and much of the developed world. Mycobacterium abscessus (M. abscessus) is one of the most frequently encountered NTM and is difficult to treat. We describe the use of drug-disease association using a semantic knowledge graph approach combined with machine learning models that has enabled the identification of several molecules for testing anti-mycobacterial activity. We established that niclosamide (M. tuberculosis IC90 2.95 μM; M. abscessus IC90 59.1 μM) and tribromsalan (M. tuberculosis IC90 76.92 μM; M. abscessus IC90 147.4 μM) inhibit M. tuberculosis and M. abscessus in vitro. To investigate the mode of action, we determined the transcriptional response of M. tuberculosis and M. abscessus to both compounds in axenic log phase, demonstrating a broad effect on gene expression that differed from known M. tuberculosis inhibitors. Both compounds elicited transcriptional responses indicative of respiratory pathway stress and the dysregulation of fatty acid metabolism.
Affiliation(s)
- Jeremy J Yang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA; Department of Internal Medicine Translational Informatics Division, University of New Mexico, Albuquerque, NM, USA
- Aaron Goff
  - Department of Global Health and Infection, Brighton & Sussex Medical School, University of Sussex, UK
- David J Wild
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA
- Ying Ding
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA; School of Information, Dell Medical School, University of Texas, Austin, TX, USA
- Ayano Annis
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, NC, 27599, USA
- Anurag Passi
  - Department of Pediatrics, UC San Diego, San Diego, CA, USA
- Ana C Puhl
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Thomas R Lane
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Miriam Braunstein
  - Department of Microbiology and Immunology, School of Medicine, University of North Carolina at Chapel Hill, NC, 27599, USA
- Simon J Waddell
  - Department of Global Health and Infection, Brighton & Sussex Medical School, University of Sussex, UK
- Sean Ekins
  - Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
3
Sun G, Dong D, Dong Z, Zhang Q, Fang H, Wang C, Zhang S, Wu S, Dong Y, Wan Y. Drug repositioning: A bibliometric analysis. Front Pharmacol 2022; 13:974849. [PMID: 36225586 PMCID: PMC9549161 DOI: 10.3389/fphar.2022.974849] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/21/2022] [Accepted: 08/12/2022] [Indexed: 11/14/2022]
Abstract
Drug repurposing has become an effective approach to drug discovery, as it offers a new way to explore drugs. Based on the Science Citation Index Expanded (SCI-E) and Social Sciences Citation Index (SSCI) databases of the Web of Science core collection, this study presents a bibliometric analysis of drug repurposing publications from 2010 to 2020. Data were cleaned, mined, and visualized using Derwent Data Analyzer (DDA) software. An overview of the history and development trend of the number of publications, major journals, major countries, major institutions, author keywords, major contributors, and major research fields is provided. There were 2,978 publications included in the study. The findings show that the United States leads in this area of research, followed by China, the United Kingdom, and India. The Chinese Academy of Science published the most research studies, and NIH ranked first on the h-index. The Icahn School of Medicine at Mt Sinai leads in the average number of citations per study. Sci Rep, Drug Discov. Today, and Brief. Bioinform. are the three most productive journals evaluated from three separate perspectives, and pharmacology and pharmacy is the most commonly used subject category. Cheng, FX; Mucke, HAM; and Butte, AJ are among the top 20 most prolific and influential authors. Keyword analysis shows that in recent years, most research has focused on drug discovery/drug development, COVID-19/SARS-CoV-2/coronavirus, molecular docking, virtual screening, cancer, and other research areas. The hotspots have changed in recent years, with COVID-19/SARS-CoV-2/coronavirus being the most popular topic for current drug repurposing research.
Affiliation(s)
- Guojun Sun
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Dashun Dong
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Zuojun Dong
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Qian Zhang
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Hui Fang
  - Institute of Information Resource, Zhejiang University of Technology, Hangzhou, China
- Chaojun Wang
  - Hangzhou Aeronautical Sanatorium for Special Service of Chinese Air Force, Hangzhou, China
- Shaoya Zhang
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Shuaijun Wu
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Yichen Dong
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Macau, China
- Yuehua Wan
  - Institute of Information Resource, Zhejiang University of Technology, Hangzhou, China
4
Computational Methods for Drug Repurposing. Advances in Experimental Medicine and Biology 2022; 1361:119-141. [PMID: 35230686 DOI: 10.1007/978-3-030-91836-1_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/20/2022]
Abstract
The wealth of knowledge and multi-omics data available in drug research has allowed the rise of several computational methods in the drug discovery field, resulting in a novel and exciting strategy called drug repurposing. Drug repurposing consists in finding new applications for existing drugs. Numerous computational methods perform a high-level integration of different knowledge sources to facilitate the discovery of unknown mechanisms. In this chapter, we present a survey of data resources and computational tools available for drug repositioning.
5
Roberti A, Chaffey LE, Greaves DR. NF-κB Signaling and Inflammation-Drug Repurposing to Treat Inflammatory Disorders? Biology 2022; 11:372. [PMID: 35336746 PMCID: PMC8945680 DOI: 10.3390/biology11030372] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 01/17/2022] [Revised: 02/12/2022] [Accepted: 02/15/2022] [Indexed: 12/15/2022]
Abstract
NF-κB is a central mediator of inflammation, response to DNA damage and oxidative stress. As a result of its central role in so many important cellular processes, NF-κB dysregulation has been implicated in the pathology of important human diseases. NF-κB activation causes inappropriate inflammatory responses in diseases including rheumatoid arthritis (RA) and multiple sclerosis (MS). Thus, modulation of NF-κB signaling is being widely investigated as an approach to treat chronic inflammatory diseases, autoimmunity and cancer. The emergence of COVID-19 in late 2019, the subsequent pandemic and the huge clinical burden of patients with life-threatening SARS-CoV-2 pneumonia led to a massive scramble to repurpose existing medicines to treat lung inflammation in a wide range of healthcare systems. These efforts continue and have proven to be controversial. Drug repurposing strategies are a promising alternative to de novo drug development, as they minimize drug development timelines and reduce the risk of failure due to unexpected side effects. Different experimental approaches have been applied to identify existing medicines which inhibit NF-κB that could be repurposed as anti-inflammatory drugs.
Affiliation(s)
- David R. Greaves
  - Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK; (A.R.); (L.E.C.)
6
Selvaraj N, Swaroop AK, Nidamanuri BSS, Kumar R R, Natarajan J, Selvaraj J. Network-based drug repurposing: A critical review. Curr Drug Res Rev 2022; 14:116-131. [PMID: 35156575 DOI: 10.2174/2589977514666220214120403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 11/17/2021] [Accepted: 11/30/2021] [Indexed: 11/22/2022]
Abstract
Developing a new drug for a disease is a tedious, time-consuming, complex, and expensive process. Even when development is completed, the chances of success for a newly developed drug are very low. Recent reports state that repurposing existing drugs can be more efficient than developing new ones: repurposing saves time, reduces expense, and offers a higher success rate. Its main limitation is the difficulty of obtaining the desired pharmacological and characteristic parameters of individual drugs from the vast data available on a huge number of drugs, their effects, and their target mechanisms. This drawback can be overcome by introducing computational methods of analysis, including various types of network analysis that use biological processes and their relationships with drugs to simplify data interpretation. Some of the data sets now available in standard and simplified forms include gene expression, drug-target interactions, protein networks, electronic health records, clinical trial results, and drug adverse event reports. Integrating these data sets and interpretation methods offers a more efficient and easier way to repurpose a specific drug for a desired target and effect. In this review, we briefly discuss various computational biological network analysis methods, such as gene regulatory networks, metabolic networks, protein-protein interaction networks, drug-target interaction networks, drug-disease association networks, drug-drug interaction networks, drug-side effects networks, integrated network-based methods, semantic link networks, and isoform-isoform networks. We also briefly present the limitations, prediction methods, and data sets of the various biological networks used for drug repurposing.
Affiliation(s)
- Nagaraj Selvaraj
  - Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Ooty, Nilgiris, Tamilnadu, India
- Akey Krishna Swaroop
  - Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Ooty, Nilgiris, Tamilnadu, India
- Bala Sai Soujith Nidamanuri
  - Department of Pharmaceutics, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Ooty, Nilgiris, Tamilnadu, India
- Rajesh Kumar R
  - Department of Pharmaceutical Biotechnology, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Ooty, Nilgiris, Tamilnadu, India
- Jawahar Natarajan
  - Department of Pharmaceutics, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Ooty, Nilgiris, Tamilnadu, India
- Jubie Selvaraj
  - Department of Pharmaceutical Chemistry, JSS College of Pharmacy, JSS Academy of Higher Education & Research, Ooty, Nilgiris, Tamilnadu, India
7
Popescu VB, Kanhaiya K, Năstac DI, Czeizler E, Petre I. Network controllability solutions for computational drug repurposing using genetic algorithms. Sci Rep 2022; 12:1437. [PMID: 35082323 PMCID: PMC8791995 DOI: 10.1038/s41598-022-05335-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/03/2021] [Accepted: 12/29/2021] [Indexed: 12/22/2022]
Abstract
Control theory has recently seen impactful applications in network science, especially in connection with applications in network medicine. A key topic of research is that of finding minimal external interventions that offer control over the dynamics of a given network, a problem known as network controllability. We propose in this article a new solution for this problem based on genetic algorithms. We tailor our solution for applications in computational drug repurposing, seeking to maximize its use of FDA-approved drug targets in a given disease-specific protein-protein interaction network. We demonstrate our algorithm on several cancer networks and on several random networks with their edges distributed according to the Erdős-Rényi, the Scale-Free, and the Small World properties. Overall, we show that our new algorithm is more efficient in identifying relevant drug targets in a disease network, advancing the computational solutions needed for new therapeutic and drug repurposing approaches.
Affiliation(s)
- Dumitru Iulian Năstac
  - POLITEHNICA University of Bucharest, Faculty of Electronics, Telecommunications and Information Technology, 061071, Bucharest, Romania
- Eugen Czeizler
  - Computer Science, Åbo Akademi University, 20500, Turku, Finland
  - National Institute for Research and Development in Biological Sciences, 060031, Bucharest, Romania
- Ion Petre
  - Department of Mathematics and Statistics, University of Turku, 20014, Turku, Finland
  - National Institute for Research and Development in Biological Sciences, 060031, Bucharest, Romania
8
Yang JJ, Gessner CR, Duerksen JL, Biber D, Binder JL, Ozturk M, Foote B, McEntire R, Stirling K, Ding Y, Wild DJ. Knowledge graph analytics platform with LINCS and IDG for Parkinson's disease target illumination. BMC Bioinformatics 2022; 23:37. [PMID: 35021991 PMCID: PMC8756622 DOI: 10.1186/s12859-021-04530-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/12/2021] [Accepted: 12/13/2021] [Indexed: 11/12/2022]
Abstract
Background: LINCS, "Library of Integrated Network-based Cellular Signatures", and IDG, "Illuminating the Druggable Genome", are both NIH projects and consortia that have generated rich datasets for the study of the molecular basis of human health and disease. LINCS L1000 expression signatures provide unbiased systems/omics experimental evidence. IDG provides compiled and curated knowledge for illumination and prioritization of novel drug target hypotheses. Together, these resources can support a powerful new approach to identifying novel drug targets for complex diseases, such as Parkinson's disease (PD), which continues to inflict severe harm on human health, and resist traditional research approaches.
Results: Integrating LINCS and IDG, we built the Knowledge Graph Analytics Platform (KGAP) to support an important use case: identification and prioritization of drug target hypotheses for associated diseases. The KGAP approach includes strong semantics interpretable by domain scientists and a robust, high performance implementation of a graph database and related analytical methods. Illustrating the value of our approach, we investigated results from queries relevant to PD. Approved PD drug indications from IDG's resource DrugCentral were used as starting points for evidence paths exploring chemogenomic space via LINCS expression signatures for associated genes, evaluated as target hypotheses by integration with IDG. The KG-analytic scoring function was validated against a gold standard dataset of genes associated with PD as elucidated, published mechanism-of-action drug targets, also from DrugCentral. IDG's resource TIN-X was used to rank and filter KGAP results for novel PD targets, and one, SYNGR3 (Synaptogyrin-3), was manually investigated further as a case study and plausible new drug target for PD.
Conclusions: The synergy of LINCS and IDG, via KG methods, empowers graph analytics methods for the investigation of the molecular basis of complex diseases, and specifically for identification and prioritization of novel drug targets. The KGAP approach enables downstream applications via integration with resources similarly aligned with modern KG methodology. The generality of the approach indicates that KGAP is applicable to many disease areas, in addition to PD, the focus of this paper.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04530-9.
9
Ruan D, Ji S, Yan C, Zhu J, Zhao X, Yang Y, Gao Y, Zou C, Dai Q. Exploring complex and heterogeneous correlations on hypergraph for the prediction of drug-target interactions. Patterns 2021; 2:100390. [PMID: 34950907 PMCID: PMC8672193 DOI: 10.1016/j.patter.2021.100390] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2021] [Revised: 07/23/2021] [Accepted: 10/21/2021] [Indexed: 01/04/2023]
Abstract
The continuous emergence of drug-target interaction data provides an opportunity to construct a biological network for systematically discovering unknown interactions. However, this is challenging due to complex and heterogeneous correlations between drug and target. Here, we describe a heterogeneous hypergraph-based framework for drug-target interaction (HHDTI) predictions by modeling biological networks through a hypergraph, where each vertex represents a drug or a target and a hyperedge indicates existing similar interactions or associations between the connected vertices. The hypergraph is then trained to generate suitably structured embeddings for discovering unknown interactions. Comprehensive experiments performed on four public datasets demonstrate that HHDTI achieves significant and consistently improved predictions compared with state-of-the-art methods. Our analysis indicates that this superior performance is due to the ability to integrate heterogeneous high-order information from the hypergraph learning. These results suggest that HHDTI is a scalable and practical tool for uncovering novel drug-target interactions.
- A hypergraph framework to model high-order correlations in heterogeneous biological networks
- An embedding learning method for drugs and targets using hypergraphs
- High-order correlation between drugs and targets can contribute to DTI predictions
The prediction of drug-target interactions (DTIs) plays a crucial role in drug discovery. In this work, we discover that the high-order correlations in heterogeneous biological networks are essential for DTI predictions. The hypergraph structure is utilized to model the high-order correlations in the biological networks, and embeddings are then generated for the drugs and targets, respectively. Finally, the interaction between them can be predicted according to the similarity of the embeddings. Our proposed method has been evaluated on multiple public datasets, and the improved performance demonstrates that the high-order correlations among drugs and targets contribute significantly to DTI predictions, and that other associations besides DTIs are also useful in this task. Our method can also be used in other scenarios containing complex correlations.
Affiliation(s)
- Ding Ruan
  - School of Automation, Hangzhou Dianzi University, Hangzhou, China
- Shuyi Ji
  - School of Software, KLISS, BNRist, Tsinghua University, Beijing, China
  - Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing, China
- Chenggang Yan
  - School of Automation, Hangzhou Dianzi University, Hangzhou, China
- Junjie Zhu
  - School of Software, KLISS, BNRist, Tsinghua University, Beijing, China
- Xibin Zhao
  - School of Software, KLISS, BNRist, Tsinghua University, Beijing, China
- Yuedong Yang
  - School of Computer Science, Sun Yat-sen University, Guangzhou, China
- Yue Gao
  - School of Software, KLISS, BNRist, Tsinghua University, Beijing, China
  - Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing, China
  - Corresponding author
- Changqing Zou
  - Huawei Vancouver Research Center, Huawei Canada Technologies, Vancouver, Canada
  - Corresponding author
- Qionghai Dai
  - Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing, China
  - Department of Automation, Tsinghua University, Beijing, China
  - Corresponding author
10
A network representation approach for COVID-19 drug recommendation. Methods 2021; 198:3-10. [PMID: 34562584 PMCID: PMC8458160 DOI: 10.1016/j.ymeth.2021.09.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/03/2021] [Revised: 08/30/2021] [Accepted: 09/19/2021] [Indexed: 12/15/2022]
Abstract
Coronavirus disease 2019 (COVID-19) emerged in early December 2019 and has caused over 100 million cases and 2 million deaths around the world. One year after the outbreak began, there is still no definitive, approved medicine against it. Drug repositioning has become one line of scientific research being pursued to develop an effective drug. However, due to the lack of COVID-19 data, there is still no drug repositioning approach specific to COVID-19. In this paper, we propose a framework for COVID-19 drug repositioning. This framework has several advantages: first, a local graph-aggregating representation is used across a heterogeneous network to address the data sparsity problem; second, the multi-hop neighbors of the heterogeneous graph are aggregated to recall as many potential COVID-19 drugs as possible. Our experimental results show that our COVDR framework performs significantly better than baseline methods, and the docking simulation verifies that our three potential drugs have the ability to act against COVID-19.
11
Mathai N, Chen Y, Kirchmair J. Validation strategies for target prediction methods. Brief Bioinform 2021; 21:791-802. [PMID: 31220208 PMCID: PMC7299289 DOI: 10.1093/bib/bbz026] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/26/2018] [Revised: 01/14/2019] [Accepted: 02/17/2019] [Indexed: 12/11/2022]
Abstract
Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode of action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics that are employed to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.
Affiliation(s)
- Neann Mathai
  - Department of Chemistry, University of Bergen, Bergen, Norway; Computational Biology Unit (CBU), University of Bergen, Bergen, Norway; Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
- Ya Chen
  - Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
- Johannes Kirchmair
  - Department of Chemistry, University of Bergen, Bergen, Norway; Computational Biology Unit (CBU), University of Bergen, Bergen, Norway; Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
12
Fecho K, Bizon C, Miller F, Schurman S, Schmitt C, Xue W, Morton K, Wang P, Tropsha A. A Biomedical Knowledge Graph System to Propose Mechanistic Hypotheses for Real-World Environmental Health Observations: Cohort Study and Informatics Application. JMIR Med Inform 2021; 9:e26714. [PMID: 34283031 PMCID: PMC8335603 DOI: 10.2196/26714] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/22/2020] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph-based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. The ROBOKOP user interface allows users to posit questions and explore answer subgraphs. Users can also posit questions through direct Cypher query of the underlying knowledge graph, which currently contains roughly 6 million nodes or biomedical entities and 140 million edges or predicates describing the relationship between nodes, drawn from over 30 curated data sources.
OBJECTIVE: We aimed to apply ROBOKOP to survey data on workplace exposures and immune-mediated diseases from the Environmental Polymorphisms Registry (EPR) within the National Institute of Environmental Health Sciences.
METHODS: We analyzed EPR survey data and identified 45 associations between workplace chemical exposures and immune-mediated diseases, as self-reported by study participants (n=4574), with 20 associations significant at P<.05 after false discovery rate correction. We then used ROBOKOP to (1) validate the associations by determining whether plausible connections exist within the ROBOKOP knowledge graph and (2) propose biological mechanisms that might explain them and serve as hypotheses for subsequent testing. We highlight the following three exemplar associations: carbon monoxide-multiple sclerosis, ammonia-asthma, and isopropanol-allergic disease.
RESULTS: ROBOKOP successfully returned answer sets for three queries that were posed in the context of the driving examples. The answer sets included potential intermediary genes, as well as supporting evidence that might explain the observed associations.
CONCLUSIONS: We demonstrate real-world application of ROBOKOP to generate mechanistic hypotheses for associations between workplace chemical exposures and immune-mediated diseases. We expect that ROBOKOP will find broad application across many biomedical fields and other scientific disciplines due to its generalizability, speed to discovery and generation of mechanistic hypotheses, and open nature.
Collapse
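The abstract above reports associations that remain significant after false discovery rate correction but does not name the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not confirmed by the abstract) and entirely hypothetical p-values, the correction could look like this:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure; the cited study
    may have used a different FDR method.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five exposure-disease associations.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
significant = benjamini_hochberg(p_values)
print(significant)
```

With these illustrative inputs, only the two smallest p-values survive the correction, even though four of the five are below 0.05 before adjustment.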
Affiliation(s)
- Karamarie Fecho: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Copperline Professional Solutions, Pittsboro, NC, United States
- Chris Bizon: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Frederick Miller: National Institute of Environmental Health Sciences, Durham, NC, United States
- Shepherd Schurman: National Institute of Environmental Health Sciences, Durham, NC, United States
- Charles Schmitt: National Institute of Environmental Health Sciences, Durham, NC, United States
- William Xue: National Institute of Environmental Health Sciences, Durham, NC, United States
- Patrick Wang: CoVar Applied Technologies, Durham, NC, United States
- Alexander Tropsha: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States; Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
|
13
|
Wang L, Xie H, Han W, Yang X, Shi L, Dong J, Jiang K, Wu H. Construction of a knowledge graph for diabetes complications from expert-reviewed clinical evidences. Comput Assist Surg (Abingdon) 2021; 25:29-35. [PMID: 33275462 DOI: 10.1080/24699322.2020.1850866] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022] Open
Abstract
A knowledge graph is a structured representation of data that can express entity and relational knowledge. Increasing attention has been paid to the study of clinical knowledge graphs, especially in the field of chronic diseases. However, knowledge graph construction is based mainly on electronic medical records and other data sources, and the authority of the constructed knowledge graph presents some problems. Therefore, with regard to the quality of evidence, this study, in combination with experimental research on systematic evaluation and meta-analysis, presents some new information. On the basis of evidence-based medicine (EBM), the secondary results of systematic evaluations and meta-analyses of social, psychological, and behavioral aspects were extracted as data for the core nodes and edges of a knowledge graph to construct a graph of type 2 diabetes (T2D) and its complications. In this study, relevant lifestyle evidence on risk factors for diabetic retinopathy (DR), diabetic nephropathy (DN), diabetic foot (DF), and diabetic depression (DD), together with the results of several relevant clinical tests, including bariatric surgery, myopia, lipid-lowering drugs, lipid-lowering drug duration, blood glucose control, disease course, glycosylated hemoglobin, fasting blood glucose, hypertension, sex, smoking and other common lifestyle characteristics, was finally extracted. The evidence-based knowledge graph of diabetes mellitus (DM) complications was constructed by extracting relevant diseases, risk factors, risk outcomes, and other diabetes entities, along with the strength of the odds ratio (OR) or relative risk (RR) correlations, from clinical evidence. Moreover, the risk prediction models constructed using a logistic model were incorporated into the knowledge graph to visualize the risk score of DM complications for each user. In short, the EBM-powered construction of the knowledge graph could provide high-quality information to support decisions for the prevention and control of diabetes and its complications.
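To make the edge weights mentioned in this abstract concrete, the following sketch computes an odds ratio and a relative risk from a 2x2 contingency table and attaches them to a knowledge-graph edge. The counts, the node names, and the dictionary layout are hypothetical and are not taken from the cited study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed
# participants develop the complication.
a, b, c, d = 30, 70, 10, 90

# Hypothetical edge record linking a risk factor to a complication.
edge = {
    "source": "smoking",
    "target": "diabetic retinopathy",
    "odds_ratio": odds_ratio(a, b, c, d),
    "relative_risk": relative_risk(a, b, c, d),
}
print(edge)
```

For these illustrative counts the odds ratio is about 3.86 and the relative risk is about 3.0; the two measures diverge more as the outcome becomes more common.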
Affiliation(s)
- Lei Wang: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Huimin Xie: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Wentao Han: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Xiao Yang: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Lili Shi: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Jiancheng Dong: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Kui Jiang: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
- Huiqun Wu: Department of Medical Informatics, Medical School of Nantong University, Nantong, China
|
14
|
Sadeghi SS, Keyvanpour MR. An Analytical Review of Computational Drug Repurposing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:472-488. [PMID: 31403439 DOI: 10.1109/tcbb.2019.2933825] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/10/2023]
Abstract
Drug repurposing is a vital function in pharmaceutical fields and has gained popularity in recent years in both the pharmaceutical industry and the research community. It refers to the process of discovering new uses and indications for existing or failed drugs. It is cost-effective and reliable in contrast to experimental drug discovery, which is a costly, time-consuming, and risky process limited to a relatively small number of targets. Accordingly, a plethora of computational methodologies have been proposed to repurpose drugs on a large scale by utilizing available high-throughput data. The available literature, however, lacks a contemporary and comprehensive analysis of current computational drug repurposing methodologies. In this paper, we present a systematic analysis of computational drug repurposing, which consists of three main sections. First, we categorize the computational drug repurposing methods based on their technical approach and artificial intelligence perspective and discuss the strengths and weaknesses of the various methods. Second, some general criteria are recommended for analyzing our proposed categorization. In the third and final section, a qualitative comparison of the approaches is made as a guide to understanding their relative merits. Further, this systematic analysis can help in the efficient selection and improvement of drug repurposing techniques based on the nature of the computational methods implemented on biological resources.
|
15
|
Shi W, Chen X, Deng L. A Review of Recent Developments and Progress in Computational Drug Repositioning. Curr Pharm Des 2021; 26:3059-3068. [PMID: 31951162 DOI: 10.2174/1381612826666200116145559] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/04/2019] [Accepted: 01/09/2020] [Indexed: 12/27/2022]
Abstract
Computational drug repositioning is an efficient approach towards discovering new indications for existing drugs. In recent years, with the accumulation of online health-related information and the extensive use of biomedical databases, computational drug repositioning approaches have achieved significant progress in drug discovery. In this review, we summarize recent advancements in drug repositioning. First, we describe the available data sources that are conducive to identifying novel indications. Next, we provide a summary of the commonly used computing approaches. For each method, we briefly describe techniques, case studies, and evaluation criteria. Finally, we discuss the limitations of the existing computing approaches.
Affiliation(s)
- Wanwan Shi: School of Computer Science and Engineering, Central South University, Changsha, China
- Xuegong Chen: School of Computer Science and Engineering, Central South University, Changsha, China
- Lei Deng: School of Computer Science and Engineering, Central South University, Changsha, China
|
16
|
Zhou R, Lu Z, Luo H, Xiang J, Zeng M, Li M. NEDD: a network embedding based method for predicting drug-disease associations. BMC Bioinformatics 2020; 21:387. [PMID: 32938396 PMCID: PMC7495830 DOI: 10.1186/s12859-020-03682-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Drug discovery is known for the large amount of money and time it consumes and the high risk it takes. Drug repositioning has, therefore, become a popular approach to save time and cost by finding novel indications for approved drugs. In order to distinguish these novel indications accurately among a great many latent associations between drugs and diseases, it is necessary to exploit abundant heterogeneous information about drugs and diseases.
RESULTS In this article, we propose a meta-path-based computational method called NEDD to predict novel associations between drugs and diseases using heterogeneous information. First, we construct a heterogeneous network as an undirected graph by integrating drug-drug similarity, disease-disease similarity, and known drug-disease associations. NEDD uses meta paths of different lengths to explicitly capture the indirect relationships, or high-order proximity, within drugs and diseases, by which the low-dimensional representation vectors of drugs and diseases are obtained. NEDD then uses a random forest classifier to predict novel associations between drugs and diseases.
CONCLUSIONS The experiments on a gold standard dataset which contains 1933 validated drug-disease associations show that NEDD produces superior prediction results compared with the state-of-the-art approaches.
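The idea of a meta path in a heterogeneous drug-disease network can be illustrated by counting path instances with matrix products. The sketch below is only an illustration under simplifying assumptions (binary similarity links, a toy network with hypothetical values); it does not reproduce NEDD's embedding procedure or its random forest step.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical toy network: 3 drugs and 2 diseases, binary links.
drug_drug = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # drug-drug similarity links
drug_disease = [[1, 0], [0, 0], [0, 1]]         # known drug-disease associations
disease_disease = [[0, 1], [1, 0]]              # disease-disease similarity links

# Entry [i][j] counts instances of the meta path drug -> drug -> disease
# that start at drug i and end at disease j.
drug_drug_disease = matmul(drug_drug, drug_disease)

# Entry [i][j] counts instances of the meta path drug -> disease -> disease.
drug_disease_disease = matmul(drug_disease, disease_disease)


def pair_features(drug, disease):
    """Meta-path counts for one (drug, disease) pair, usable as classifier input."""
    return [drug_drug_disease[drug][disease], drug_disease_disease[drug][disease]]


print(pair_features(1, 0))
```

Drug 1 has no known disease of its own here, yet the drug-drug-disease count links it to disease 0 through its similar neighbour drug 0, which is the kind of indirect relationship the abstract describes.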
Affiliation(s)
- Renyi Zhou: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Zhangli Lu: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Huimin Luo: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China; School of Computer and Information Engineering, Henan University, Kaifeng, 475001, China
- Ju Xiang: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China; Neuroscience Research Center & School of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Min Zeng: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li: Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
|
17
|
Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform 2020; 12:46. [PMID: 33431024 PMCID: PMC7374666 DOI: 10.1186/s13321-020-00450-7] [Citation(s) in RCA: 148] [Impact Index Per Article: 37.0] [Received: 02/19/2020] [Accepted: 07/13/2020] [Indexed: 01/13/2023] Open
Abstract
Drug repositioning is the process of identifying novel therapeutic potentials for existing drugs and discovering therapies for untreated diseases. Drug repositioning, therefore, plays an important role in optimizing the pre-clinical process of developing novel drugs by saving time and cost compared to the traditional de novo drug discovery processes. Since drug repositioning relies on data for existing drugs and diseases, the enormous growth of publicly available large-scale biological, biomedical, and electronic health-related data, along with high-performance computing capabilities, has accelerated the development of computational drug repositioning approaches. Multidisciplinary researchers and scientists have carried out numerous attempts, with different degrees of efficiency and success, to computationally study the potential of repositioning drugs to identify alternative drug indications. This study reviews recent advancements in the field of computational drug repositioning. First, we highlight different drug repositioning strategies and provide an overview of frequently used resources. Second, we summarize computational approaches that are extensively used in drug repositioning studies. Third, we present different computing and experimental models to validate computational methods. Fourth, we address prospective opportunities, including a few target areas. Finally, we discuss challenges and limitations encountered in computational drug repositioning and conclude with an outline of further research directions.
Affiliation(s)
- Tamer N Jarada: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
- Jon G Rokne: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
- Reda Alhajj: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada; Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey
|
18
|
Li X, Rousseau JF, Ding Y, Song M, Lu W. Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin. JMIR Med Inform 2020; 8:e16739. [PMID: 32543442 PMCID: PMC7327595 DOI: 10.2196/16739] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2019] [Revised: 01/08/2020] [Accepted: 03/31/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted considerable attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research.
OBJECTIVE The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution.
METHODS In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P1), Promising Index (P2), Prestige Index (P3), and Collaboration Index (CI).
RESULTS We found that the maxima of P1, P3, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. P1 and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P2 was more sensitive to immediate changes. P3 reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug.
CONCLUSIONS In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development.
Affiliation(s)
- Xin Li: Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China; School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, United States
- Justin F Rousseau: Department of Population Health and Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Ying Ding: School of Information, Dell Medical School, The University of Texas Austin, Austin, TX, United States
- Min Song: Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
- Wei Lu: Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China
|
19
|
Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020; 18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022] Open
Abstract
Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing a more robust use of automated systems. A number of techniques are used to represent knowledge graphs, but often machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph's local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representational learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
Affiliation(s)
- David N. Nicholson: Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, United States
- Casey S. Greene: Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, United States
|
20
|
Xu L, Wei X, Cao J, Yu PS. ICANE: interaction content-aware network embedding via co-embedding of nodes and edges. International Journal of Data Science and Analytics 2020. [DOI: 10.1007/s41060-018-0164-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
21
|
Bellera CL, Alberca LN, Sbaraglini ML, Talevi A. In Silico Drug Repositioning for Chagas Disease. Curr Med Chem 2020; 27:662-675. [PMID: 31622200 DOI: 10.2174/0929867326666191016114839] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/27/2017] [Revised: 09/12/2019] [Accepted: 09/23/2019] [Indexed: 12/18/2022]
Abstract
Chagas disease is an infectious tropical disease included within the group of neglected tropical diseases. Though historically endemic to Latin America, it has lately spread to high-income countries due to human migration. At present, only two drugs, nifurtimox and benznidazole, are approved for its treatment, both with considerable side effects (which often result in treatment interruption) and limited efficacy in the chronic stage of the disease in adults. Drug repositioning involves finding novel therapeutic indications for known drugs, including approved, withdrawn, abandoned and investigational drugs. It is today a broadly applied approach to developing innovative medications, since indication shifts are built on existing safety, ADME and manufacturing information, thus greatly shortening development timeframes. Drug repositioning has been identified as a particularly interesting strategy to search for new therapeutic solutions for neglected and rare conditions, which traditionally present limited commercial interest and are mostly covered by the public sector and not-for-profit initiatives and organizations. Here, we review the applications of computer-aided technologies as systematic approaches to drug repositioning in the field of Chagas disease. In silico screening represents the most explored approach, whereas other rational methods such as network-based and signature-based approaches have still not been applied.
Affiliation(s)
- Carolina L Bellera: Laboratory of Bioactive Research and Development (LIDeB), Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Argentina
- Lucas N Alberca: Laboratory of Bioactive Research and Development (LIDeB), Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Argentina
- María L Sbaraglini: Laboratory of Bioactive Research and Development (LIDeB), Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Argentina
- Alan Talevi: Laboratory of Bioactive Research and Development (LIDeB), Faculty of Exact Sciences, University of La Plata (UNLP), La Plata, Argentina
|
22
|
Hao M, Bryant SH, Wang Y. Open-source chemogenomic data-driven algorithms for predicting drug-target interactions. Brief Bioinform 2020; 20:1465-1474. [PMID: 29420684 DOI: 10.1093/bib/bby010] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/26/2017] [Revised: 01/18/2018] [Indexed: 12/25/2022] Open
Abstract
While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies for drug discovery. Drug repositioning is a potential approach to solving this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome are both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, and they have proved successful in drug discovery. Undoubtedly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial to developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language and compared them by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI prediction is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred.
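The abstract names mean percentile ranking as its evaluation metric without defining it. The sketch below assumes one common formulation: for every held-out true interaction, rank all candidate targets of that drug by predicted score, express the position of the true target as a fraction between 0 (top) and 1 (bottom), and average. The exact definition in the cited review may differ, and all scores and names here are hypothetical.

```python
def mean_percentile_ranking(scores, held_out):
    """Average percentile rank of held-out true interactions (lower is better).

    scores:   dict mapping drug -> dict mapping target -> predicted score
    held_out: list of (drug, target) pairs hidden during training
    Assumes every drug has at least two candidate targets.
    """
    percentiles = []
    for drug, target in held_out:
        # Candidate targets sorted from highest to lowest predicted score.
        ranked = sorted(scores[drug], key=scores[drug].get, reverse=True)
        position = ranked.index(target)  # 0 means ranked first
        percentiles.append(position / (len(ranked) - 1))
    return sum(percentiles) / len(percentiles)


# Hypothetical predicted scores for two drugs over three targets.
scores = {
    "drugA": {"t1": 0.9, "t2": 0.4, "t3": 0.1},
    "drugB": {"t1": 0.2, "t2": 0.7, "t3": 0.5},
}
# Hypothetical held-out true interactions.
held_out = [("drugA", "t1"), ("drugB", "t3")]

mpr = mean_percentile_ranking(scores, held_out)
print(mpr)
```

Under this formulation a value near 0 means true interactions are ranked near the top, while a value near 0.5 is what random scoring would be expected to produce.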
|
23
|
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 172] [Impact Index Per Article: 43.0] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022] Open
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid the costly, laborious, and not-always-deterministic process of determining drug–target interactions (DTIs) by experiments alone. These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of the machine learning methods and databases that have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in predicting DTIs using machine learning approaches are highlighted, and we conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti: Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang: Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor: Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian: Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
|
24
|
Jeong HJ, Kim MH. Utilizing adjacency of colleagues and type correlations for enhanced link prediction. Data Knowl Eng 2020. [DOI: 10.1016/j.datak.2019.101785] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
|
25
|
Kumar R, Harilal S, Gupta SV, Jose J, Thomas Parambi DG, Uddin MS, Shah MA, Mathew B. Exploring the new horizons of drug repurposing: A vital tool for turning hard work into smart work. Eur J Med Chem 2019; 182:111602. [PMID: 31421629 PMCID: PMC7127402 DOI: 10.1016/j.ejmech.2019.111602] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 08/07/2019] [Indexed: 02/07/2023]
Abstract
Drug discovery and development are long and financially taxing processes. On average, it takes 12-15 years and costs 1.2 billion USD for successful drug discovery and approval for clinical use. Many lead molecules are not developed further, and their potential is not tapped to the fullest due to lack of resources or time constraints. For a drug to be approved by the FDA for clinical use, it must have excellent therapeutic potential in the desired target area with minimal toxicities, as supported by both pre-clinical and clinical studies. The targeted clinical evaluations fail to explore other potential therapeutic applications of the candidate drug. Drug repurposing or repositioning is a fast and relatively cheap alternative to the lengthy and expensive de novo drug discovery and development. Drug repositioning utilizes the already available clinical trial data on toxicity and adverse effects while exploring the drug's therapeutic potential for a different disease. This review addresses recent developments and the future scope of the drug repositioning strategy.
Affiliation(s)
- Rajesh Kumar: Department of Pharmacy, Kerala University of Health Sciences, Thrissur, Kerala, India
- Seetha Harilal: Department of Pharmacy, Kerala University of Health Sciences, Thrissur, Kerala, India
- Sheeba Varghese Gupta: Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, 33612, USA
- Jobin Jose: Department of Pharmaceutics, NGSM Institute of Pharmaceutical Science, NITTE Deemed to be University, Mangalore, 575018, India
- Della Grace Thomas Parambi: Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka, Al Jouf, 2014, Saudi Arabia
- Md Sahab Uddin: Department of Pharmacy, Southeast University, Dhaka, Bangladesh; Pharmakon Neuroscience Research Network, Dhaka, Bangladesh
- Muhammad Ajmal Shah: Department of Pharmacognosy, Faculty of Pharmaceutical Sciences, Government College University, Faisalabad, Pakistan
- Bijo Mathew: Division of Drug Design and Medicinal Chemistry Research Lab, Department of Pharmaceutical Chemistry, Ahalia School of Pharmacy, Palakkad, 678557, Kerala, India
|
26
|
Chen Z, Wang X, Gao P, Liu H, Song B. Predicting Disease Related microRNA Based on Similarity and Topology. Cells 2019; 8:cells8111405. [PMID: 31703479 PMCID: PMC6912199 DOI: 10.3390/cells8111405] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/04/2019] [Revised: 10/31/2019] [Accepted: 11/05/2019] [Indexed: 12/19/2022] Open
Abstract
It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease-miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms. The results of fivefold cross validation demonstrated that STIM outperforms many existing methods, particularly in terms of the area under the curve. In addition, the top 30 candidate miRNAs recommended by STIM in a case study of lung neoplasm have been confirmed in previous experiments, which proved the validity of the method.
Affiliation(s)
- Zhihua Chen
- Institute of Computing Science and Technology, Guangzhou University, Guangzhou 510006, China
| | - Xinke Wang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Peng Gao
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Hongju Liu
- College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines
| | - Bosheng Song
- School of Information Science and Engineering, Hunan University, Changsha 410082, China
|
27
|
Park K. A review of computational drug repurposing. Transl Clin Pharmacol 2019; 27:59-63. [PMID: 32055582 PMCID: PMC6989243 DOI: 10.12793/tcp.2019.27.2.59] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Received: 06/11/2019] [Revised: 06/23/2019] [Accepted: 06/24/2019] [Indexed: 12/21/2022] Open
Abstract
Although science and technology have progressed rapidly, de novo drug development has remained a costly and time-consuming process over the past decades. In view of these circumstances, ‘drug repurposing’ (or ‘drug repositioning’) has emerged as an alternative approach that accelerates the drug development process by seeking new indications for already approved drugs rather than discovering de novo drug compounds; it now accounts for 30% of newly marketed drugs in the U.S. In the meantime, the explosive, large-scale growth of molecular, genomic and phenotypic data on pharmacological compounds is enabling the development of a new area of drug repurposing called computational drug repurposing. This review provides an overview of recent progress in the area of computational drug repurposing. First, it summarizes available repositioning strategies, followed by computational methods commonly used. Then, it describes validation techniques for repurposing studies. Finally, it concludes by discussing the remaining challenges in computational repurposing.
Affiliation(s)
- Kyungsoo Park
- Department of Pharmacology, Yonsei University College of Medicine, Seoul 03722, Korea
|
28
|
Gao Z, Fu G, Ouyang C, Tsutsui S, Liu X, Yang J, Gessner C, Foote B, Wild D, Ding Y, Yu Q. edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics 2019; 20:306. [PMID: 31238875 PMCID: PMC6593489 DOI: 10.1186/s12859-019-2914-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/17/2019] [Accepted: 05/24/2019] [Indexed: 11/23/2022] Open
Abstract
Background Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. Results In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. Conclusions We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.
Affiliation(s)
- Zheng Gao
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Gang Fu
- Microsoft Corporation, Seattle, Washington, USA
| | | | - Satoshi Tsutsui
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Xiaozhong Liu
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Jeremy Yang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Microsoft Corporation, Seattle, Washington, USA; School of Medicine, University of New Mexico, Albuquerque, NM, USA
| | - Christopher Gessner
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | | | - David Wild
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA
| | - Ying Ding
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA; Data2Discovery, Inc., Bloomington, IN, USA
| | - Qi Yu
- School of Management, Shanxi Medical University, Taiyuan, Shanxi, China.
|
29
|
García del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. Disease networks and their contribution to disease understanding: A review of their evolution, techniques and data sources. J Biomed Inform 2019; 94:103206. [DOI: 10.1016/j.jbi.2019.103206] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/08/2019] [Revised: 04/14/2019] [Accepted: 05/06/2019] [Indexed: 12/14/2022]
|
30
|
Qian T, Zhu S, Hoshida Y. Use of big data in drug development for precision medicine: an update. Expert Review of Precision Medicine and Drug Development 2019; 4:189-200. [PMID: 31286058 PMCID: PMC6613936 DOI: 10.1080/23808993.2019.1617632] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/14/2019] [Accepted: 05/08/2019] [Indexed: 02/08/2023]
Abstract
INTRODUCTION Big-data-driven drug development resources and methodologies have been evolving with ever-expanding data from large-scale biological experiments, clinical trials, and medical records from participants in data collection initiatives. The enrichment of biological- and clinical-context-specific large-scale data has enabled computational inference more relevant to real-world biomedical research, particularly identification of therapeutic targets and drugs for specific diseases and clinical scenarios. AREAS COVERED Here we review recent progress in the following areas: new big-data-driven approaches to therapeutic target discovery, candidate drug prioritization, inference of clinical toxicity, and machine-learning methods in drug discovery. EXPERT OPINION In the near future, much larger and more complex datasets for precision medicine will be generated, e.g., individual and longitudinal multi-omic datasets and direct-to-consumer datasets. Closer collaboration between experts with different backgrounds will also be required to better translate analytic results into prognosis and treatment in clinical practice. Meanwhile, cloud computing with protected patient privacy is expected to become a more routine analytic practice, filling gaps in data integration as big data becomes more widespread. To conclude, integration of the multitude of data generated for each individual, along with techniques tailored for big-data analytics, may eventually enable us to achieve precision medicine.
Affiliation(s)
- Tongqi Qian
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Shijia Zhu
- Liver Tumor Translational Research Program, Simmons Comprehensive Cancer Center, Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Yujin Hoshida
- Liver Tumor Translational Research Program, Simmons Comprehensive Cancer Center, Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
31
|
Lin CH, Konecki DM, Liu M, Wilson SJ, Nassar H, Wilkins AD, Gleich DF, Lichtarge O. Multimodal network diffusion predicts future disease-gene-chemical associations. Bioinformatics 2019; 35:1536-1543. [PMID: 30304494 PMCID: PMC6499233 DOI: 10.1093/bioinformatics/bty858] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/23/2018] [Revised: 09/14/2018] [Accepted: 10/08/2018] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Precision medicine is an emerging field with hopes to improve patient treatment and reduce morbidity and mortality. To these ends, computational approaches have predicted associations among genes, chemicals and diseases. Such efforts, however, were often limited to using just some available association types. This lowers prediction coverage and, since prior evidence shows that integrating heterogeneous data is likely beneficial, it may limit accuracy. Therefore, we systematically tested whether using more association types improves prediction. RESULTS We study multimodal networks linking diseases, genes and chemicals (drugs) by applying three diffusion algorithms and varying information content. Ten-fold cross-validation shows that these networks are internally consistent, both within and across association types. Also, diffusion methods recovered missing edges, even if all the edges from an entire mode of association were removed. This suggests that information is transferable between these association types. As a realistic validation, time-stamped experiments simulated the predictions of future associations based solely on information known prior to a given date. The results show that many future published results are predictable from current associations. Moreover, in most cases, using more association types increases prediction coverage without significantly decreasing sensitivity and specificity. In case studies, literature-supported validation shows that these predictions mimic human-formulated hypotheses. Overall, this study suggests that diffusion over a more comprehensive multimodal network will generate more useful hypotheses of associations among diseases, genes and chemicals, which may guide the development of precision therapies. AVAILABILITY AND IMPLEMENTATION Code and data are available at https://github.com/LichtargeLab/multimodal-network-diffusion. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
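The abstract above does not name its three diffusion algorithms. As a generic illustration of diffusion over a multimodal network, the following minimal sketch uses random walk with restart, which is an assumption chosen for illustration and not necessarily one of the methods in the paper. The function name, the toy three-node graph, and the restart value are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Diffuse probability mass from seed nodes over an undirected graph.

    adj is a symmetric adjacency matrix (list of lists); seeds is a list of
    node indices. Returns one steady-state score per node.
    """
    n = len(adj)
    # Column sums are used to turn the adjacency matrix into transition weights.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    start = [0.0] * n
    for s in seeds:
        start[s] = 1.0 / len(seeds)
    scores = start[:]
    for _ in range(max_iter):
        updated = []
        for i in range(n):
            # Mass flowing into node i from each neighbour j.
            inflow = sum(
                adj[i][j] / col_sums[j] * scores[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            updated.append((1 - restart) * inflow + restart * start[i])
        if sum(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical toy network: node 0 = disease, node 1 = gene, node 2 = chemical,
# linked in a chain disease - gene - chemical.
toy_adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
rwr_scores = random_walk_with_restart(toy_adj, seeds=[0])
```

Here the chemical has no direct link to the disease, yet it still receives a non-zero score through the shared gene, which is the sense in which diffusion can rank unobserved associations.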
Affiliation(s)
- Chih-Hsu Lin
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
| | - Daniel M Konecki
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
| | - Meng Liu
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Stephen J Wilson
- Department of Biochemistry and Molecular Biology, Houston, TX, USA
| | - Huda Nassar
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Angela D Wilkins
- Departments of Molecular and Human Genetics, and Pharmacology, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
| | - David F Gleich
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Olivier Lichtarge
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, USA
- Department of Biochemistry and Molecular Biology, Houston, TX, USA
- Departments of Molecular and Human Genetics, and Pharmacology, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
|
32
|
Kanza S, Frey JG. A new wave of innovation in Semantic web tools for drug discovery. Expert Opin Drug Discov 2019; 14:433-444. [DOI: 10.1080/17460441.2019.1586880] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Affiliation(s)
- Samantha Kanza
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
| | - Jeremy Graham Frey
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
|
33
|
Polamreddy P, Gattu N. The drug repurposing landscape from 2012 to 2017: evolution, challenges, and possible solutions. Drug Discov Today 2019; 24:789-795. [DOI: 10.1016/j.drudis.2018.11.022] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 09/07/2018] [Revised: 11/15/2018] [Accepted: 11/27/2018] [Indexed: 01/13/2023]
|
34
|
Inferring Drug-Protein-Side Effect Relationships from Biomedical Text. Genes (Basel) 2019; 10:159. [PMID: 30791472 PMCID: PMC6409686 DOI: 10.3390/genes10020159] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/24/2019] [Revised: 02/13/2019] [Accepted: 02/14/2019] [Indexed: 11/16/2022] Open
Abstract
Background: Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects. Objective: The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to free text of biomedical studies, and to develop ranking metrics that could identify the most-likely paths. Materials and Methods: We extracted three types of relationships—drug-protein, protein-protein, and protein–side effect—from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed whole drug-protein–side effect paths. For each path, we calculated its ranking score by a new ranking function that combines corpus- and ontology-based semantic similarity as well as co-occurrence frequency. Results: We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS by P@5-P@20. In addition, we confirmed that the paths are novel hypotheses that are worth investigating further. Discussion: The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by using various techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity. Conclusion: It is not easy to reveal the biomedical mechanisms of side effects due to a huge number of possible paths. However, we automatically generated predictable paths using the proposed approach, which could provide meaningful information to biomedical researchers to generate plausible hypotheses for the understanding of such mechanisms.
|
35
|
Abstract
Recent advances in technology have led to the exponential growth of scientific literature in biomedical sciences. This rapid increase in information has surpassed the threshold for manual curation efforts, necessitating the use of text mining approaches in the field of life sciences. One such application of text mining is in fostering in silico drug discovery such as drug target screening, pharmacogenomics, adverse drug event detection, etc. This chapter serves as an introduction to the applications of various text mining approaches in drug discovery. It is divided into two parts with the first half as an overview of text mining in the biosciences. The second half of the chapter reviews strategies and methods for four unique applications of text mining in drug discovery.
Affiliation(s)
- Si Zheng
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Shazia Dharssi
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Meng Wu
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jiao Li
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.
|
36
|
Abstract
Drugs modulate disease states through their actions on targets in the body. Determining these targets aids the focused development of new treatments, and helps to better characterize those already employed. One means of accomplishing this is through the deployment of in silico methodologies, harnessing computational analytical and predictive power to produce educated hypotheses for experimental verification. Here, we provide an overview of the current state of the art, describe some of the well-established methods in detail, and reflect on how they, and emerging technologies promoting the incorporation of complex and heterogeneous data-sets, can be employed to improve our understanding of (poly)pharmacology.
Affiliation(s)
- Ryan Byrne
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland.
|
37
|
Tripartite Network-Based Repurposing Method Using Deep Learning to Compute Similarities for Drug-Target Prediction. Methods Mol Biol 2019; 1903:317-328. [PMID: 30547451 DOI: 10.1007/978-1-4939-8955-3_19] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/26/2022]
Abstract
The drug discovery process is conventionally regarded as resource intensive and complex. Therefore, research effort has been put into a process called drug repositioning with the use of computational methods. Similarity-based methods are common in predicting drug-target association or the interaction between drugs and targets based on various features the drugs and targets have. Heterogeneous network topology involving many biomedical entities interactions has yet to be used in drug-target association. Deep learning can disclose features of vertices in a large network, which can be incorporated with heterogeneous network topology in order to assist similarity-based solutions to provide more flexibility for drug-target prediction. Here we describe a similarity-based drug-target prediction method that utilizes a topology-based similarity measure and two inference methods based on the similarities. We used DeepWalk, a deep learning method, to calculate the vertex similarities based on Linked Tripartite Network (LTN), which is a heterogeneous network created from different biomedical-linked datasets. The similarities are further used to feed to the inference methods, drug-based similarity inference (DBSI) and target-based similarity inference (TBSI), to obtain the predicted drug-target associations. Our previous experiments have shown that by utilizing deep learning and heterogeneous network topology, the proposed method can provide more promising results than current topology-based similarity computation methods.
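The drug-based similarity inference (DBSI) step mentioned above can be illustrated with a small sketch. It assumes the commonly used form of DBSI, in which a drug-target pair is scored by the similarity-weighted average of the known links of the other drugs. The similarity values here are hand-written placeholders rather than DeepWalk output, and excluding the drug's own row from the average is an assumption of this sketch. Target-based inference (TBSI) would follow the same pattern with the roles of drugs and targets swapped.

```python
def dbsi_scores(drug_sim, known):
    """Score every drug-target pair by drug-based similarity inference.

    drug_sim[i][j] is the similarity between drugs i and j.
    known[j][t] is 1 if drug j is known to act on target t, else 0.
    """
    n_drugs = len(known)
    n_targets = len(known[0])
    scores = [[0.0] * n_targets for _ in range(n_drugs)]
    for i in range(n_drugs):
        # Assumption: a drug does not vote for itself.
        neighbours = [j for j in range(n_drugs) if j != i]
        total = sum(drug_sim[i][j] for j in neighbours)
        if total == 0:
            continue  # no similar drugs, so the scores stay at zero
        for t in range(n_targets):
            weighted = sum(drug_sim[i][j] * known[j][t] for j in neighbours)
            scores[i][t] = weighted / total
    return scores


# Hypothetical toy data: three drugs, two targets.
toy_drug_sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
toy_known = [
    [0, 0],  # drug 0 has no known targets
    [1, 0],  # drug 1 acts on target 0
    [0, 1],  # drug 2 acts on target 1
]
dbsi = dbsi_scores(toy_drug_sim, toy_known)
```

In this toy case drug 0 is far more similar to drug 1 than to drug 2, so target 0 is ranked above target 1 as a candidate for drug 0.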
|
38
|
Tian Z, Teng Z, Cheng S, Guo M. Computational drug repositioning using meta-path-based semantic network analysis. BMC Systems Biology 2018; 12:134. [PMID: 30598084 PMCID: PMC6311940 DOI: 10.1186/s12918-018-0658-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/01/2023]
Abstract
BACKGROUND Drug repositioning is a promising and efficient way to discover new indications for existing drugs, which holds great potential for precision medicine in the post-genomic era. Many network-based approaches have been proposed for drug repositioning based on similarity networks, which integrate multiple sources of information on drugs and diseases. However, these methods may simply treat all nodes as being of the same type and neglect the semantic meanings of different meta-paths in the heterogeneous network. Therefore, it is urgent to develop a rational method to infer new indications for approved drugs. RESULTS In this study, we proposed a novel methodology named HeteSim_DrugDisease (HSDD) for drug repositioning prediction. Firstly, we build the drug-drug similarity network and disease-disease similarity network by integrating the information of drugs and diseases. Secondly, a drug-disease heterogeneous network is constructed, which combines the drug similarity network, the disease similarity network and the known drug-disease association network. Finally, HSDD predicts novel drug-disease associations based on the HeteSim scores of different meta-paths. The experimental results show that HSDD performs significantly better than the existing state-of-the-art approaches. HSDD achieves an AUC score of 0.8994 in the leave-one-out cross validation experiment. Moreover, case studies for selected drugs further illustrate the practical usefulness of HSDD. CONCLUSIONS HSDD can be an effective and feasible way to infer the associations between drugs and diseases using meta-path-based semantic network analysis.
Affiliation(s)
- Zhen Tian
- School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, People's Republic of China
| | - Zhixia Teng
- School of information and computer engineering, Northeast Forestry, Harbin, 150001, People's Republic of China
| | - Shuang Cheng
- Institute of Materials, China Academy of Engineering Physics, Jiang You, 621907, Sichuan, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, People's Republic of China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, 100044, China.
|
39
|
Olayan RS, Ashoor H, Bajic VB. DDR: efficient computational method to predict drug-target interactions using graph mining and machine learning approaches. Bioinformatics 2018; 34:1164-1173. [PMID: 29186331 PMCID: PMC5998943 DOI: 10.1093/bioinformatics/btx731] [Citation(s) in RCA: 107] [Impact Index Per Article: 17.8] [Received: 08/03/2017] [Accepted: 11/23/2017] [Indexed: 02/06/2023] Open
Abstract
Motivation Computationally finding drug–target interactions (DTIs) is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, the current DTI prediction methods suffer from a high false-positive prediction rate. Results We developed DDR, a novel method that improves the DTI prediction accuracy. DDR is based on the use of a heterogeneous graph that contains known DTIs with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine different similarities. Before fusion, DDR performs a pre-processing step where a subset of similarities is selected in a heuristic process to obtain an optimized combination of similarities. Then, DDR applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using five repeats of 10-fold cross-validation, three testing setups, and the weighted average of area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs by 31% when the drugs are new, by 23% when targets are new and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verify as correct 22 out of the top 25 DDR novel predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs. Availability and implementation The data and code are provided at https://bitbucket.org/RSO24/ddr/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rawan S Olayan
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| | - Haitham Ashoor
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
|
40
|
Xu L, Wei X, Cao J, Yu PS. Multi-task network embedding. International Journal of Data Science and Analytics 2018. [DOI: 10.1007/s41060-018-0166-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
|
41
|
La MK, Sedykh A, Fourches D, Muratov E, Tropsha A. Predicting Adverse Drug Effects from Literature- and Database-Mined Assertions. Drug Saf 2018; 41:1059-1072. [PMID: 29876834 PMCID: PMC6212308 DOI: 10.1007/s40264-018-0688-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
Abstract
INTRODUCTION Given that adverse drug effects (ADEs) have led to post-market patient harm and subsequent drug withdrawal, failure of candidate agents in the drug development process, and other negative outcomes, it is essential to attempt to forecast ADEs and other relevant drug-target-effect relationships as early as possible. Current pharmacologic data sources, providing multiple complementary perspectives on the drug-target-effect paradigm, can be integrated to facilitate the inference of relationships between these entities. OBJECTIVE This study aims to identify both existing and unknown relationships between chemicals (C), protein targets (T), and ADEs (E) based on evidence in the literature. MATERIALS AND METHODS Cheminformatics and data mining approaches were employed to integrate and analyze publicly available clinical pharmacology data and literature assertions interrelating drugs, targets, and ADEs. Based on these assertions, a C-T-E relationship knowledge base was developed. Known pairwise relationships between chemicals, targets, and ADEs were collected from several pharmacological and biomedical data sources. These relationships were curated and integrated according to Swanson's paradigm to form C-T-E triangles. Missing C-E edges were then inferred as C-E relationships. RESULTS Unreported associations between drugs, targets, and ADEs were inferred, and inferences were prioritized as testable hypotheses. Several C-E inferences, including testosterone → myocardial infarction, were identified using inferences based on the literature sources published prior to confirmatory case reports. Timestamping approaches confirmed the predictive ability of this inference strategy on a larger scale. CONCLUSIONS The presented workflow, based on free-access databases and an association-based inference scheme, provided novel C-E relationships that have been validated post hoc in case reports. With refinement of prioritization schemes for the generated C-E inferences, this workflow may provide an effective computational method for the early detection of potential drug candidate ADEs that can be followed by targeted experimental investigations.
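The triangle-closing inference described above (known C-T and T-E links implying a candidate C-E link) can be sketched in a few lines. This is a minimal illustration of Swanson-style inference, not the authors' workflow. The entity names are placeholders, and counting supporting targets as a priority signal is an assumption of this sketch rather than the paper's prioritization scheme.

```python
from collections import defaultdict


def infer_chemical_effect(chem_target, target_effect, known_chem_effect):
    """Infer unreported chemical-effect pairs via shared targets.

    Each argument is a set of (source, destination) pairs. Returns a dict
    mapping each inferred (chemical, effect) pair to its supporting targets.
    """
    effects_by_target = defaultdict(set)
    for target, effect in target_effect:
        effects_by_target[target].add(effect)

    support = defaultdict(set)
    for chem, target in chem_target:
        for effect in effects_by_target.get(target, ()):
            # Keep only pairs that are not already reported.
            if (chem, effect) not in known_chem_effect:
                support[(chem, effect)].add(target)
    return dict(support)


# Hypothetical placeholder assertions, not real pharmacological data.
toy_chem_target = {
    ("chem_A", "target_X"),
    ("chem_A", "target_Y"),
    ("chem_B", "target_Y"),
}
toy_target_effect = {
    ("target_X", "effect_1"),
    ("target_Y", "effect_1"),
    ("target_Y", "effect_2"),
}
toy_known = {("chem_B", "effect_2")}

inferred = infer_chemical_effect(toy_chem_target, toy_target_effect, toy_known)
# Rank candidates by how many distinct targets support them.
ranked = sorted(inferred, key=lambda pair: (-len(inferred[pair]), pair))
```

Pairs supported by more than one intermediate target, such as the first entry of `ranked` here, would be natural candidates to examine first.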
Affiliation(s)
- Mary K La
- Division of Practice Advancement and Clinical Education, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
| | - Alexander Sedykh
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
- Sciome LLC, 2 Davis Drive, Research Triangle Park, NC, 27709, USA
| | - Denis Fourches
- Department of Chemistry, North Carolina State University, Raleigh, NC, 27695, USA
| | - Eugene Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA.
|
42
|
Van Vleet TR, Liguori MJ, Lynch JJ, Rao M, Warder S. Screening Strategies and Methods for Better Off-Target Liability Prediction and Identification of Small-Molecule Pharmaceuticals. SLAS Discovery 2018; 24:1-24. [PMID: 30196745 DOI: 10.1177/2472555218799713] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 01/09/2023]
Abstract
Pharmaceutical discovery and development is a long and expensive process that, unfortunately, still results in a low success rate, with drug safety continuing to be a major impediment. Improved safety screening strategies and methods are needed to more effectively fill this critical gap. Recent advances in informatics are now making it possible to manage bigger data sets and integrate multiple sources of screening data in a manner that can potentially improve the selection of higher-quality drug candidates. Integrated screening paradigms have become the norm in Pharma, both in discovery screening and in the identification of off-target toxicity mechanisms during later-stage development. Furthermore, advances in computational methods are making in silico screens more relevant and suggest that they may represent a feasible option for augmenting the current screening paradigm. This paper outlines several fundamental methods of the current drug screening processes across Pharma and emerging techniques/technologies that promise to improve molecule selection. In addition, the authors discuss integrated screening strategies and provide examples of advanced screening paradigms.
Affiliation(s)
- Terry R Van Vleet
- Department of Investigative Toxicology and Pathology, AbbVie, N Chicago, IL, USA
- Michael J Liguori
- Department of Investigative Toxicology and Pathology, AbbVie, N Chicago, IL, USA
- James J Lynch
- Department of Integrated Science and Technology, AbbVie, N Chicago, IL, USA
- Mohan Rao
- Department of Investigative Toxicology and Pathology, AbbVie, N Chicago, IL, USA
- Scott Warder
- Department of Target Enabling Science and Technology, AbbVie, N Chicago, IL, USA
|
43
|
Luechtefeld T, Hartung T. Computational approaches to chemical hazard assessment. ALTEX-ALTERNATIVES TO ANIMAL EXPERIMENTATION 2018; 34:459-478. [PMID: 29101769 PMCID: PMC5848496 DOI: 10.14573/altex.1710141] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/14/2017] [Indexed: 01/10/2023]
Abstract
Computational prediction of toxicity has reached new heights as a result of decades of growth in the magnitude and diversity of biological data. Public packages for statistics and machine learning make model creation faster. New theory in machine learning and cheminformatics enables integration of chemical structure, toxicogenomics, simulated and physical data in the prediction of chemical health hazards, and other toxicological information. Our earlier publications have characterized a toxicological dataset of unprecedented scale resulting from the European REACH legislation (Registration Evaluation Authorisation and Restriction of Chemicals). These publications dove into potential use cases for regulatory data and some models for exploiting this data. This article analyzes the options for the identification and categorization of chemicals, moves on to the derivation of descriptive features for chemicals, discusses different kinds of targets modeled in computational toxicology, and ends with a high-level perspective of the algorithms used to create computational toxicology models.
Affiliation(s)
- Thomas Luechtefeld
- Johns Hopkins Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
- Thomas Hartung
- Johns Hopkins Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA; CAAT-Europe, University of Konstanz, Konstanz, Germany
|
44
|
|
45
|
Xue H, Li J, Xie H, Wang Y. Review of Drug Repositioning Approaches and Resources. Int J Biol Sci 2018; 14:1232-1244. [PMID: 30123072 PMCID: PMC6097480 DOI: 10.7150/ijbs.24612] [Citation(s) in RCA: 327] [Impact Index Per Article: 54.5] [Received: 12/28/2017] [Accepted: 06/12/2018] [Indexed: 12/23/2022] Open
Abstract
Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Different from traditional drug development strategies, the strategy is efficient, economical and riskless. There are usually three kinds of approaches: computational approaches, biological experimental approaches, and mixed approaches, all of which are widely used in drug repositioning. In this paper, we reviewed computational approaches and highlighted their characteristics to provide references for researchers to develop more powerful approaches. At the same time, the important findings obtained using these approaches are listed. Furthermore, we summarized 76 important resources about drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.
Affiliation(s)
- Hanqing Xue
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
- Haozhe Xie
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
|
46
|
Zong N, Kim H, Ngo V, Harismendy O. Deep mining heterogeneous networks of biomedical linked data to predict novel drug-target associations. Bioinformatics 2018; 33:2337-2344. [PMID: 28430977 DOI: 10.1093/bioinformatics/btx160] [Citation(s) in RCA: 98] [Impact Index Per Article: 16.3] [Received: 12/23/2016] [Accepted: 03/21/2017] [Indexed: 12/20/2022] Open
Abstract
Motivation: A heterogeneous network topology possessing abundant interactions between biomedical entities has yet to be utilized in similarity-based methods for predicting drug-target associations based on the array of varying features of drugs and their targets. Deep learning reveals features of vertices of a large network that can be adapted in accommodating the similarity-based solutions to provide a flexible method of drug-target prediction. Results: We propose a similarity-based drug-target prediction method that enhances existing association discovery methods by using a topology-based similarity measure. DeepWalk, a deep learning method, is adopted in this study to calculate the similarities within Linked Tripartite Network (LTN), a heterogeneous network generated from biomedical linked datasets. This proposed method shows promising results for drug-target association prediction: 98.96% AUC ROC score with a 10-fold cross-validation and 99.25% AUC ROC score with a Monte Carlo cross-validation with LTN. By utilizing DeepWalk, we demonstrate that: (i) this method outperforms other existing topology-based similarity computation methods, (ii) the performance is better for tripartite than with bipartite networks and (iii) the measure of similarity using network topology outperforms the ones derived from chemical structure (drugs) or genomic sequence (targets). Our proposed methodology proves to be capable of providing a promising solution for drug-target prediction based on topological similarity with a heterogeneous network, and may be readily re-purposed and adapted in the existing of similarity-based methodologies. Availability and Implementation: The proposed method has been developed in JAVA and it is available, along with the data, at the following URL: https://github.com/zongnansu1982/drug-target-prediction. Contact: nazong@ucsd.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nansu Zong
- Department of Biomedical Informatics, School of Medicine, UC, San Diego, CA 92093, USA
- Hyeoneui Kim
- Department of Biomedical Informatics, School of Medicine, UC, San Diego, CA 92093, USA
- Victoria Ngo
- Betty Irene Moore School of Nursing, UC Davis, Sacramento, CA 95817, USA
- Olivier Harismendy
- Department of Biomedical Informatics, School of Medicine, UC, San Diego, CA 92093, USA; Moores Cancer Center, UC, San Diego, CA 92093, USA
|
47
|
Talevi A. Drug repositioning: current approaches and their implications in the precision medicine era. EXPERT REVIEW OF PRECISION MEDICINE AND DRUG DEVELOPMENT 2018. [DOI: 10.1080/23808993.2018.1424535] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]
Affiliation(s)
- Alan Talevi
- Laboratory of Research and Development of Bioactive Compounds – Medicinal Chemistry, Department of Biological Sciences, Faculty of Exact Sciences, University of La Plata, La Plata, Argentina
|
48
|
Abstract
Following the elucidation of the human genome, chemogenomics emerged in the beginning of the twenty-first century as an interdisciplinary research field with the aim to accelerate target and drug discovery by making best usage of the genomic data and the data linkable to it. What started as a systematization approach within protein target families now encompasses all types of chemical compounds and gene products. A key objective of chemogenomics is the establishment, extension, analysis, and prediction of a comprehensive SAR matrix which by application will enable further systematization in drug discovery. Herein we outline future perspectives of chemogenomics including the extension to new molecular modalities, or the potential extension beyond the pharma to the agro and nutrition sectors, and the importance for environmental protection. The focus is on computational sciences with potential applications for compound library design, virtual screening, hit assessment, analysis of phenotypic screens, lead finding and optimization, and systems biology-based prediction of toxicology and translational research.
Affiliation(s)
- Edgar Jacoby
- Janssen Research & Development, Beerse, Belgium
- J B Brown
- Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
|
49
|
Sam E, Athri P. Web-based drug repurposing tools: a survey. Brief Bioinform 2017; 20:299-316. [DOI: 10.1093/bib/bbx125] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/03/2017] [Indexed: 12/15/2022] Open
Affiliation(s)
- Elizabeth Sam
- Department of Computer Science & Engineering, Amrita University, Bengaluru, India
- Prashanth Athri
- Department of Computer Science & Engineering, Amrita University, Bengaluru, India
|
50
|
Zhang J, Tang J, Ma C, Tong H, Jing Y, Li J, Luyten W, Moens MF. Fast and Flexible Top-k Similarity Search on Large Networks. ACM T INFORM SYST 2017. [DOI: 10.1145/3086695] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 10/19/2022]
Abstract
Similarity search is a fundamental problem in network analysis and can be applied in many applications, such as collaborator recommendation in coauthor networks, friend recommendation in social networks, and relation prediction in medical information networks. In this article, we propose a sampling-based method using random paths to estimate the similarities based on both common neighbors and structural contexts efficiently in very large homogeneous or heterogeneous information networks. We give a theoretical guarantee that the sampling size depends on the error-bound ε, the confidence level (1-δ), and the path length T of each random walk. We perform an extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return top-k similar vertices for any vertex in a network 300× faster than the state-of-the-art methods. We develop a prototype system of recommending similar authors to demonstrate the effectiveness of our method.
Affiliation(s)
- Jing Zhang
- Tsinghua University, Renmin University of China
- Jie Tang
- Tsinghua University, Beijing, China
- Cong Ma
- Tsinghua University, Beijing, China
- Yu Jing
- Tsinghua University, Beijing, China
|