1
Li Y, Yang Y, Tong Z, Wang Y, Mi Q, Bai M, Liang G, Li B, Shu K. A comparative benchmarking and evaluation framework for heterogeneous network-based drug repositioning methods. Brief Bioinform 2024; 25:bbae172. [PMID: 38647153 PMCID: PMC11033846 DOI: 10.1093/bib/bbae172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 02/25/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
Computational drug repositioning, which involves identifying new indications for existing drugs, is an increasingly attractive research area due to its advantages in reducing both overall cost and development time. As a result, a growing number of computational drug repositioning methods have emerged. Heterogeneous network-based drug repositioning methods have been shown to outperform other approaches. However, there is a dearth of systematic evaluation studies of these methods, encompassing performance, scalability and usability, as well as a standardized process for evaluating new methods. Additionally, previous studies have only compared several methods, with conflicting results. In this context, we conducted a systematic benchmarking study of 28 heterogeneous network-based drug repositioning methods on 11 existing datasets. We developed a comprehensive framework to evaluate their performance, scalability and usability. Our study revealed that methods such as HGIMC, ITRPCA and BNNR exhibit the best overall performance, as they rely on matrix completion or factorization. HINGRL, MLMC, ITRPCA and HGIMC demonstrate the best performance, while NMFDR, GROBMC and SCPMF display superior scalability. For usability, HGIMC, DRHGCN and BNNR are the top performers. Building on these findings, we developed an online tool called HN-DREP (http://hn-drep.lyhbio.com/) to facilitate researchers in viewing all the detailed evaluation results and selecting the appropriate method. HN-DREP also provides an external drug repositioning prediction service for a specific disease or drug by integrating predictions from all methods. Furthermore, we have released a Snakemake workflow named HN-DRES (https://github.com/lyhbio/HN-DRES) to facilitate benchmarking and support the extension of new methods into the field.
Affiliation(s)
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Yinqi Yang
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Zhuohao Tong
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Yu Wang
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Qin Mi
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Mingze Bai
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, P. R. China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, P. R. China
- Kunxian Shu
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China
2
Kim Y, Ahn I, Cho HN, Gwon H, Kang HJ, Seo H, Choi H, Kim KP, Jun TJ, Kim YH. RIDAB: Electronic medical record-integrated real world data platform for predicting and summarizing interactions in biomedical research from heterogeneous data resources. Comput Methods Programs Biomed 2022; 221:106866. [PMID: 35594580 DOI: 10.1016/j.cmpb.2022.106866] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2021] [Revised: 04/27/2022] [Accepted: 05/07/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE With the advent of bioinformatics, biological databases have been constructed to computerize data. Biological systems can be described as interactions and relationships between the elements constituting the systems, and they are organized in various biomedical open databases. These open databases have been used in approaches to predict functional interactions such as protein-protein interactions (PPI), drug-drug interactions (DDI) and disease-disease relationships (DDR). However, just combining interaction data has limited effectiveness in predicting the complex relationships occurring in a whole context. Each contributing source contains information on each element in a specific field of knowledge, but there is a lack of interdisciplinary insight in combining them.
METHODS In this study, we propose the RWD Integrated platform for Discovering Associations in Biomedical research (RIDAB) to predict interactions between biomedical entities. RIDAB is built as a graph network that predicts the interactions of target entities. Biomedical open databases are combined with electronic medical records (EMRs), representing a biomedical network and real-world data (RWD), respectively. Integrating databases from different domains required mapping their vocabularies. In addition, an appropriate network structure and graph embedding method had to be selected to fit the tasks.
RESULTS The feasibility of the platform was evaluated using node similarity and link prediction for the drug repositioning task, a commonly used task for biomedical networks. In addition, we compared the US Food and Drug Administration (FDA)-approved repositioned drugs with the predicted results. By integrating the EMR database with biomedical networks, the platform showed an increased F1 score in predicting repositioned drugs, from 45.62% to 57.26%, compared to platforms based on biomedical networks alone.
CONCLUSIONS This study demonstrates that the elements of biomedical research findings can be reflected by integrating EMR data with open-source biomedical networks. In addition, it shows the feasibility of using the established platform to represent the integration of biomedical networks and to reflect the relationships between real-world networks.
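For readers unfamiliar with the metric reported in the abstract above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition only; the function name and the counts are hypothetical and are not taken from the RIDAB paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = f1_score(tp=50, fp=30, fn=40)
print(f"F1 = {score:.4f}")
```

The same value can be obtained directly as 2·TP / (2·TP + FP + FN), which avoids computing precision and recall separately.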
Affiliation(s)
- Yunha Kim
  - Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Imjin Ahn
  - Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Ha Na Cho
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Hansle Gwon
  - Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Hee Jun Kang
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Hyeram Seo
  - Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Heejung Choi
  - Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Kyu-Pyo Kim
  - Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
- Tae Joon Jun
  - Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul 05505, Republic of Korea
- Young-Hak Kim
  - Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Republic of Korea
3
Wang L, Tan Y, Yang X, Kuang L, Ping P. Review on predicting pairwise relationships between human microbes, drugs and diseases: from biological data to computational models. Brief Bioinform 2022; 23:6553604. [PMID: 35325024 DOI: 10.1093/bib/bbac080] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/14/2021] [Revised: 02/14/2022] [Accepted: 02/15/2022] [Indexed: 12/11/2022] Open
Abstract
In recent years, with the rapid development of techniques in bioinformatics and life science, a considerable quantity of biomedical data has been accumulated, based on which researchers have developed various computational approaches to discover potential associations between human microbes, drugs and diseases. This paper provides a comprehensive overview of recent advances in predicting potential correlations between microbes, drugs and diseases, from biological data to computational models. First, we introduced in detail the widely used datasets relevant to identifying potential relationships between microbes, drugs and diseases. We then divided a series of representative computational models into five major categories (network, matrix factorization, matrix completion, regularization and artificial neural network) for in-depth discussion and comparison. Finally, we analysed possible challenges and opportunities in this research area and outlined some suggestions for further improving predictive performance.
Affiliation(s)
- Lei Wang
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Yaqin Tan
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Xiaoyu Yang
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China; Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Linai Kuang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, Hunan, China
- Pengyao Ping
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, 410022, Hunan, China
4
Wang W, Zhang X, Dai DQ. springD2A: capturing uncertainty in disease-drug association prediction with model integration. Bioinformatics 2022; 38:1353-1360. [PMID: 34864881 DOI: 10.1093/bioinformatics/btab820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Revised: 11/23/2021] [Accepted: 11/30/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Drug repositioning, which aims to find new indications for existing drugs, has been an efficient strategy for drug discovery. When only confirmed disease-drug associations are available as positive pairs, previous studies usually construct a negative set from the unknown disease-drug pairs, for which it is not known whether the drug and disease are associated, to train a model for disease-drug association prediction (drug repositioning). Drugs and diseases in these negative pairs can potentially be associated, but most studies have ignored this possibility.
RESULTS We present a method, springD2A, to capture the uncertainty in the negative pairs, and to discriminate between positive and unknown pairs because the former are more reliable. In springD2A, we introduce a spring-like penalty for the loss of negative pairs, which is strong if they are too close in a unit sphere, but mild if they are at a moderate distance. We also design a sequential sampling scheme in which the probability of an unknown disease-drug pair being sampled as negative is proportional to its score predicted as positive. Multiple models are learned during sequential sampling, and we adopt parameter- and feature-based ensemble schemes to boost performance. Experiments show springD2A is an effective tool for drug repositioning.
AVAILABILITY AND IMPLEMENTATION A Python implementation of springD2A and the datasets used in this study are available at https://github.com/wangyuanhao/springD2A.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
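The score-weighted sampling of negatives described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: the function name, the dictionary layout and the choice to sample without replacement are assumptions made here for illustration, and the sketch only shows drawing unknown pairs with probability proportional to a predicted score.

```python
import random


def sample_negatives(scores, k, seed=None):
    """Draw k distinct unknown pairs, each draw weighted by predicted score.

    scores: dict mapping an unknown (disease, drug) pair to a non-negative
            score (hypothetical layout, not taken from the paper).
    """
    rng = random.Random(seed)
    # Pairs with a zero score can never be drawn, so drop them up front.
    pool = {pair: s for pair, s in scores.items() if s > 0}
    if k > len(pool):
        raise ValueError("not enough pairs with a positive score")
    chosen = []
    for _ in range(k):
        pairs = list(pool)
        weights = [pool[p] for p in pairs]
        pick = rng.choices(pairs, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement (an assumption)
    return chosen


# Hypothetical scores for three unknown pairs.
scores = {("D1", "drugA"): 0.9, ("D1", "drugB"): 0.1, ("D2", "drugA"): 0.0}
negatives = sample_negatives(scores, k=2, seed=0)
print(negatives)
```

Because pairs are removed once drawn, the weights are renormalized over the remaining pool at each step, so strict proportionality holds only for the first draw.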
Affiliation(s)
- Weiwen Wang
  - Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou 510000, China
- Xiwen Zhang
  - Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou 510000, China
- Dao-Qing Dai
  - Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou 510000, China
5
Wang X, Yan R, Wang Y. Computational identification of human ubiquitination sites using convolutional and recurrent neural networks. Mol Omics 2021; 17:948-955. [PMID: 34515266 DOI: 10.1039/d0mo00183j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/11/2023]
Abstract
Ubiquitination is a very important protein post-translational modification in humans, which is closely related to many human diseases such as cancers. Although some methods have been elegantly proposed to predict human ubiquitination sites, the accuracy of these methods is generally not very satisfactory. In order to improve the prediction accuracy of human ubiquitination sites, we propose a new ensemble method HUbiPred, which takes the binary encoding and physicochemical properties of amino acids as training features, and integrates two intensively trained convolutional neural networks and two recurrent neural networks to build the model. Finally, HUbiPred achieves AUC values of 0.852 and 0.844 in five-fold cross-validation and independent tests, respectively, which greatly improves the prediction accuracy compared to previous predictors. We also analyze the physicochemical properties of amino acids around ubiquitination sites, study the important roles of architectures (i.e., convolution, long short-term memory (LSTM) and fully connected hidden layers) in the networks for prediction performance, and also predict potential ubiquitination sites in humans using HUbiPred. The training and test datasets, predicted human ubiquitination sites, and source code of HUbiPred are publicly available at https://github.com/amituofo-xf/HUbiPred.
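The abstract above names binary encoding of amino acids as one of the training features. A minimal sketch follows, assuming this refers to the common one-hot scheme over the 20 standard residues; the function name, the all-zero handling of unknown symbols and the example window are hypothetical and are not taken from the HUbiPred source code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def binary_encode(window):
    """One-hot encode each residue of a sequence window.

    Symbols outside the 20 standard residues (for example a padding 'X')
    are encoded as all-zero rows; this padding rule is an assumption.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in window.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


# Hypothetical four-residue window; real windows are centred on a lysine.
features = binary_encode("MKXA")
print(len(features), len(features[0]))
```

Each window of length n becomes an n x 20 binary matrix, a shape that convolutional or recurrent layers can consume directly.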
Affiliation(s)
- Xiaofeng Wang
  - College of Mathematics and Computer Sciences, Shanxi Normal University, Linfen 041004, China
- Renxiang Yan
  - School of Biological Sciences and Engineering, Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou University, Fuzhou 350002, China
- Yongji Wang
  - College of Life Sciences, Shanxi Normal University, Linfen 041000, China