51
|
Mahmud SMH, Chen W, Meng H, Jahan H, Liu Y, Hasan SMM. Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting. Anal Biochem 2019; 589:113507. [PMID: 31734254 DOI: 10.1016/j.ab.2019.113507] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2019] [Revised: 11/05/2019] [Accepted: 11/08/2019] [Indexed: 12/29/2022]
Abstract
Accurate identification of drug-target interaction (DTI) is a crucial and challenging task in the drug discovery process, having enormous benefit to the patients and pharmaceutical company. The traditional wet-lab experiments of DTI is expensive, time-consuming, and labor-intensive. Therefore, many computational techniques have been established for this purpose; although a huge number of interactions are still undiscovered. Here, we present pdti-EssB, a new computational model for identification of DTI using protein sequence and drug molecular structure. More specifically, each drug molecule is transformed as the molecular substructure fingerprint. For a protein sequence, different descriptors are utilized to represent its evolutionary, sequence, and structural information. Besides, our proposed method uses data balancing techniques to handle the imbalance problem and applies a novel feature eliminator to extract the best optimal features for accurate prediction. In this paper, four classes of DTI benchmark datasets are used to construct a predictive model with XGBoost. Here, the auROC is utilized as an evaluation metric to compare the performance of pdti-EssB method with recent methods, applying five-fold cross-validation. Finally, the experimental results indicate that our proposed method is able to outperform other approaches in predicting DTI, and introduces new drug-target interaction samples based on prediction probability scores. pdti-EssB webserver is available online at http://pdtiessb-uestc.com/.
Collapse
Affiliation(s)
- S M Hasan Mahmud
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| | - Wenyu Chen
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| | - Han Meng
- School of Political Science and Public Administration, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| | - Hosney Jahan
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
| | - Yongsheng Liu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
| | - S M Mamun Hasan
- Department of Internal Medicine, Rangpur Medical College, Rangpur, 5400, Bangladesh.
| |
Collapse
|
52
|
Pliakos K, Vens C. Network inference with ensembles of bi-clustering trees. BMC Bioinformatics 2019; 20:525. [PMID: 31660848 PMCID: PMC6819564 DOI: 10.1186/s12859-019-3104-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Accepted: 09/20/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks. Examples include drug protein interaction or gene regulatory networks. Studying and elucidating such networks can lead to the comprehension of complex biological processes. However, usually we have only partial knowledge of those networks and the experimental identification of all the existing associations between biological entities is very time consuming and particularly expensive. Many computational approaches have been proposed over the years for network inference, nonetheless, efficiency and accuracy are still persisting open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending the traditional tree-ensemble models to the global network setting. The proposed approach addresses the network inference problem as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modelled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels in our setting represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network). RESULTS We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF) to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach to currently used tree-ensemble based approaches as well as other approaches from the literature. We demonstrated the effectiveness of our approach in different interaction prediction (network inference) settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied our proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model in predicting non-reported interactions. CONCLUSIONS Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Since our approach is based on tree-ensembles it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability and interpretability.
Collapse
Affiliation(s)
- Konstantinos Pliakos
- KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Faculty of Medicine, Kortrijk, Belgium. .,ITEC, imec research group at KU Leuven, Kortrijk, Belgium.
| | - Celine Vens
- KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Faculty of Medicine, Kortrijk, Belgium.,ITEC, imec research group at KU Leuven, Kortrijk, Belgium
| |
Collapse
|
53
|
Wu Z, Lei T, Shen C, Wang Z, Cao D, Hou T. ADMET Evaluation in Drug Discovery. 19. Reliable Prediction of Human Cytochrome P450 Inhibition Using Artificial Intelligence Approaches. J Chem Inf Model 2019; 59:4587-4601. [DOI: 10.1021/acs.jcim.9b00801] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
| | | | | | | | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
| | | |
Collapse
|
54
|
A Multi-Label Learning Framework for Drug Repurposing. Pharmaceutics 2019; 11:pharmaceutics11090466. [PMID: 31505805 PMCID: PMC6781509 DOI: 10.3390/pharmaceutics11090466] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2019] [Revised: 08/22/2019] [Accepted: 09/05/2019] [Indexed: 01/10/2023] Open
Abstract
Drug repurposing plays an important role in screening old drugs for new therapeutic efficacy. The existing methods commonly treat prediction of drug-target interaction as a problem of binary classification, in which a large number of randomly sampled drug-target pairs accounting for over 50% of the entire training dataset are necessarily required. Such a large number of negative examples that do not come from experimental observations inevitably decrease the credibility of predictions. In this study, we propose a multi-label learning framework to find new uses for old drugs and discover new drugs for known target genes. In the framework, each drug is treated as a class label and its target genes are treated as the class-specific training data to train a supervised learning model of l2-regularized logistic regression. As such, the inter-drug associations are explicitly modelled into the framework and all the class-specific training data come from experimental observations. In addition, the data constraint is less demanding, for instance, the chemical substructures of a drug are no longer needed and the novel target genes are inferred only from the underlying patterns of the known genes targeted by the drug. Stratified multi-label cross-validation shows that 84.9% of known target genes have at least one drug correctly recognized, and the proposed framework correctly recognizes 86.73% of the independent test drug-target interactions (DTIs) from DrugBank. These results show that the proposed framework could generalize well in the large drug/class space without the information of drug chemical structures and target protein structures. Furthermore, we use the trained model to predict new drugs for the known target genes, identify new genes for the old drugs, and infer new associations between old drugs and new disease phenotypes via the OMIM database. Gene ontology (GO) enrichment analyses and the disease associations reported in recent literature provide supporting evidences to the computational results, which potentially shed light on new clinical therapies for new and/or old disease phenotypes.
Collapse
|
55
|
Gao YL, Cui Z, Liu JX, Wang J, Zheng CH. NPCMF: Nearest Profile-based Collaborative Matrix Factorization method for predicting miRNA-disease associations. BMC Bioinformatics 2019; 20:353. [PMID: 31234797 PMCID: PMC6591872 DOI: 10.1186/s12859-019-2956-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2019] [Accepted: 06/17/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Predicting meaningful miRNA-disease associations (MDAs) is costly. Therefore, an increasing number of researchers are beginning to focus on methods to predict potential MDAs. Thus, prediction methods with improved accuracy are under development. An efficient computational method is proposed to be crucial for predicting novel MDAs. For improved experimental productivity, large biological datasets are used by researchers. Although there are many effective and feasible methods to predict potential MDAs, the possibility remains that these methods are flawed. RESULTS A simple and effective method, known as Nearest Profile-based Collaborative Matrix Factorization (NPCMF), is proposed to identify novel MDAs. The nearest profile is introduced to our method to achieve the highest AUC value compared with other advanced methods. For some miRNAs and diseases without any association, we use the nearest neighbour information to complete the prediction. CONCLUSIONS To evaluate the performance of our method, five-fold cross-validation is used to calculate the AUC value. At the same time, three disease cases, gastric neoplasms, rectal neoplasms and colonic neoplasms, are used to predict novel MDAs on a gold-standard dataset. We predict the vast majority of known MDAs and some novel MDAs. Finally, the prediction accuracy of our method is determined to be better than that of other existing methods. Thus, the proposed prediction model can obtain reliable experimental results.
Collapse
Affiliation(s)
- Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, China
| | - Zhen Cui
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China. .,Co-Innovation Center for Information Supply and Assurance Technology, Anhui University, Hefei, China.
| | - Juan Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, China
| | - Chun-Hou Zheng
- Co-Innovation Center for Information Supply and Assurance Technology, Anhui University, Hefei, China
| |
Collapse
|
56
|
Wu M, Luo Y, Liang F. Accelerate Training of Restricted Boltzmann Machines via Iterative Conditional Maximum Likelihood Estimation. STATISTICS AND ITS INTERFACE 2019; 12:377-385. [PMID: 33859774 PMCID: PMC8046342 DOI: 10.4310/18-sii552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Restricted Boltzmann machines (RBMs) have become a popular tool of feature coding or extraction for unsupervised learning in recent years. However, there still lacks an efficient algorithm for training the RBM due to that its likelihood function contains an intractable normalizing constant. The existing algorithms, such as contrastive divergence and its variants, approximate the gradient of the likelihood function using Markov chain Monte Carlo. However, the approximation is time consuming and, moreover, the approximation error often impedes the convergence of the training algorithm. This paper proposes a fast algorithm for training RBMs by treating the hidden states as missing data and then estimating the parameters of the RBM via an iterative conditional maximum likelihood estimation approach, which avoids the issue of intractable normalizing constants. The numerical results indicate that the proposed algorithm can provide a drastic improvement over the contrastive divergence algorithm in RBM training. This paper also presents an extension of the proposed algorithm for how to cope with missing data in RBM training and illustrates its application using an example about drug-target interaction prediction.
Collapse
Affiliation(s)
- Mingqi Wu
- Shell, 150 N Dairy Ashford Rd Houston, Texas 77079, USA
| | - Ye Luo
- Faculty of Business and Economics University of Hong Kong Hong Kong, China
| | - Faming Liang
- Department of Statistics Purdue University West Lafayette, IN 47907, USA
| |
Collapse
|
57
|
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 2019; 93:103159. [PMID: 30926470 DOI: 10.1016/j.jbi.2019.103159] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/22/2022]
Abstract
Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods of interaction prediction help limit the search space for these experiments. These computational methods can be divided into ligand based approaches, docking approaches and chemogenomic approaches. In this review, we aim to describe the various feature based chemogenomic methods for drug target interaction prediction. It provides a comprehensive overview of the various techniques, datasets, tools and metrics. The feature based methods have been categorized, explained and compared. A novel framework for drug target interaction prediction has also been proposed that aims to improve the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature based methods of drug target interaction.
Collapse
Affiliation(s)
- Kanica Sachdev
- Computer Science and Engineering Department, SMVDU, J&K, India.
| | | |
Collapse
|
58
|
CFSBoost: Cumulative feature subspace boosting for drug-target interaction prediction. J Theor Biol 2019; 464:1-8. [DOI: 10.1016/j.jtbi.2018.12.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Revised: 12/13/2018] [Accepted: 12/18/2018] [Indexed: 01/12/2023]
|
59
|
Cui Z, Gao YL, Liu JX, Wang J, Shang J, Dai LY. The computational prediction of drug-disease interactions using the dual-network L 2,1-CMF method. BMC Bioinformatics 2019; 20:5. [PMID: 30611214 PMCID: PMC6320570 DOI: 10.1186/s12859-018-2575-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 12/10/2018] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Predicting drug-disease interactions (DDIs) is time-consuming and expensive. Improving the accuracy of prediction results is necessary, and it is crucial to develop a novel computing technology to predict new DDIs. The existing methods mostly use the construction of heterogeneous networks to predict new DDIs. However, the number of known interacting drug-disease pairs is small, so there will be many errors in this heterogeneous network that will interfere with the final results. RESULTS A novel method, known as the dual-network L2,1-collaborative matrix factorization, is proposed to predict novel DDIs. The Gaussian interaction profile kernels and L2,1-norm are introduced in our method to achieve better results than other advanced methods. The network similarities of drugs and diseases with their chemical and semantic similarities are combined in this method. CONCLUSIONS Cross validation is used to evaluate our method, and simulation experiments are used to predict new interactions using two different datasets. Finally, our prediction accuracy is better than other existing methods. This proves that our method is feasible and effective.
Collapse
Affiliation(s)
- Zhen Cui
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
| | - Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, China
| | - Jin-Xing Liu
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
| | - Juan Wang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
| | - Junliang Shang
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
| | - Ling-Yun Dai
- School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826 China
| |
Collapse
|
60
|
Ezzat A, Wu M, Li X, Kwoh CK. Computational Prediction of Drug-Target Interactions via Ensemble Learning. Methods Mol Biol 2019; 1903:239-254. [PMID: 30547446 DOI: 10.1007/978-1-4939-8955-3_14] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Therapeutic effects of drugs are mediated via interactions between them and their intended targets. As such, prediction of drug-target interactions is of great importance. Drug-target interaction prediction is especially relevant in the case of drug repositioning where attempts are made to repurpose old drugs for new indications. While experimental wet-lab techniques exist for predicting such interactions, they are tedious and time-consuming. On the other hand, computational methods also exist for predicting interactions, and they do so with reasonable accuracy. In addition, computational methods can help guide their wet-lab counterparts by recommending interactions for further validation. In this chapter, a computational method for predicting drug-target interactions is presented. Specifically, we describe a machine learning method that utilizes ensemble learning to perform predictions. We also mention details pertaining to the preparation of the data required for the prediction effort and demonstrate how to evaluate and improve prediction performance.
Collapse
Affiliation(s)
- Ali Ezzat
- Biomedical Informatics Lab, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore.
| | - Min Wu
- Data Analytics Department, Institute for Infocomm Research, A-Star, Singapore, Singapore
| | - Xiaoli Li
- Data Analytics Department, Institute for Infocomm Research, A-Star, Singapore, Singapore
| | - Chee-Keong Kwoh
- Division of Software and Information Systems, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| |
Collapse
|
61
|
Sharma A, Rani R. BE-DTI': Ensemble framework for drug target interaction prediction using dimensionality reduction and active learning. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 165:151-162. [PMID: 30337070 DOI: 10.1016/j.cmpb.2018.08.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/12/2018] [Revised: 08/03/2018] [Accepted: 08/17/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Drug-target interaction prediction plays an intrinsic role in the drug discovery process. Prediction of novel drugs and targets helps in identifying optimal drug therapies for various stringent diseases. Computational prediction of drug-target interactions can help to identify potential drug-target pairs and speed-up the process of drug repositioning. In our present, work we have focused on machine learning algorithms for predicting drug-target interactions from the pool of existing drug-target data. The key idea is to train the classifier using existing DTI so as to predict new or unknown DTI. However, there are various challenges such as class imbalance and high dimensional nature of data that need to be addressed before developing optimal drug-target interaction model. METHODS In this paper, we propose a bagging based ensemble framework named BE-DTI' for drug-target interaction prediction using dimensionality reduction and active learning to deal with class-imbalanced data. Active learning helps to improve under-sampling bagging based ensembles. Dimensionality reduction is used to deal with high dimensional data. RESULTS Results show that the proposed technique outperforms the other five competing methods in 10-fold cross-validation experiments in terms of AUC=0.927, Sensitivity=0.886, Specificity=0.864, and G-mean=0.874. CONCLUSION Missing interactions and new interactions are predicted using the proposed framework. Some of the known interactions are removed from the original dataset and their interactions are recalculated to check the accuracy of the proposed framework. Moreover, validation of the proposed approach is performed using the external dataset. All these results show that structurally similar drugs tend to interact with similar targets.
Collapse
Affiliation(s)
- Aman Sharma
- Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Punjab, Patiala, India.
| | - Rinkle Rani
- Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Punjab, Patiala, India.
| |
Collapse
|
62
|
Ibrahim SJA, Thangamani M. Prediction of Novel Drugs and Diseases for Hepatocellular Carcinoma Based on Multi-Source Simulated Annealing Based Random Walk. J Med Syst 2018; 42:188. [PMID: 30173379 DOI: 10.1007/s10916-018-1038-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2018] [Accepted: 08/20/2018] [Indexed: 01/09/2023]
Abstract
Computational techniques for foreseeing drug-disease associations by means of incorporating gene expression as well as biological network give high intuitions to the composite associations amongst targets, drugs, disease genes in addition to the diseases at a system level. Hepatocellular Carcinoma (HCC) is a malevolent tumor containing a greater rate of sickness as well as mortality. In the present work, an Integrative framework is presented with the aim of resolving this problem, for identifying new Drugs for HCC dependent upon Multi-Source Random Walk (PD-MRW), in which score the complete drugs by means of building the drug-drug similarity network. On the other hand, the collection of clinical phenotypes as well as drug side effects in combination with patient-specific genetic info. As a result, the formation of disease-drug networks that denotes the prescriptions, which are allotted to treat those diseases that are not concentrated by means of PD-MRW model. With the aim of overcoming this issue, this research offers an integrative framework for foreseeing new drugs as well as diseases for HCC dependent upon Multi-Source Simulated Annealing based Random Walk (PDD-MSSARW). Primarily, build a Gene-Gene Weighted Interaction Network (GWIN), dependent upon the gene expression as well as protein interaction network. After that, construct a drug-drug similarity network, dependent upon multi-source random walk in GWIN, disease-drug similarity network with the help of Similarity Weighted Bipartite Graph Network (SWBGN) that is build up in which the nodes are drugs as well as association among one node to another node that explains the disease diagnoses. Lastly, dependent upon the known drugs for HCC, score the entire drugs in the similarity networks. The sturdiness of the likelihoods, their overlap with those stated in Comparative Toxicogenomics Database (CTD) as well as kinds of literature, and their enhanced KEGG pathway illustrate PDD-MSSARW method be capable of efficiently find out novel drug signs.
Collapse
Affiliation(s)
| | - M Thangamani
- Kongu Engineering College, Perundurai, Tamilnadu, India
| |
Collapse
|
63
|
Chen R, Liu X, Jin S, Lin J, Liu J. Machine Learning for Drug-Target Interaction Prediction. Molecules 2018; 23:E2208. [PMID: 30200333 PMCID: PMC6225477 DOI: 10.3390/molecules23092208] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 08/27/2018] [Accepted: 08/27/2018] [Indexed: 12/18/2022] Open
Abstract
Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.
Collapse
Affiliation(s)
- Ruolan Chen
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Xiangrong Liu
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Shuting Jin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Jiawei Lin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Juan Liu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen 361005, China.
| |
Collapse
|
64
|
Ding P, Yin R, Luo J, Kwoh CK. Ensemble Prediction of Synergistic Drug Combinations Incorporating Biological, Chemical, Pharmacological, and Network Knowledge. IEEE J Biomed Health Inform 2018; 23:1336-1345. [PMID: 29994408 DOI: 10.1109/jbhi.2018.2852274] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Combinatorial therapy may reduce drug side effects and improve drug efficacy, making combination therapy a promising strategy to treat complex diseases. However, in the existing computational methods, the natural properties and network knowledge of drugs have not been adequately and simultaneously considered, making it difficult to identify effective drug combinations. Computational methods that incorporate multiple sources of information (biological, chemical, pharmacological, and network knowledge) offer more opportunities to screen synergistic drug combinations. Therefore, we developed a novel Ensemble Prediction framework of Synergistic Drug Combinations (EPSDC) to accurately and efficiently predict drug combinations by integrating information from multiple-sources. EPSDC constructs feature vector of drug pair by concatenating different types of drug similarities, and then uses these groups in a feature-based base predictor. Next, transductive learning is applied on heterogeneous drug-target networks to achieve a network-based score for the drug pair. Finally, two types of ensemble rules are introduced to combine the feature-based score and the network-based score, and then potential drug combinations are prioritized. To demonstrate the effect of the ensemble rule, comprehensive experiments were conducted to compare single models and ensemble models. The experimental results indicated that our method outperformed the state-of-the-art method in five-fold cross validation and de novo prediction tests on the two benchmark datasets. We further analyzed the effect of maximum length of the meta-path and the impacts of different types of features. Moreover, the practical usefulness of our method was confirmed in the predicted novel drug combinations. The source code of EPSDC is available at https://github.com/KDDing/EPSDC.
Collapse
|
65
|
Drug-Target Interaction Prediction in Drug Repositioning Based on Deep Semi-Supervised Learning. COMPUTATIONAL INTELLIGENCE AND ITS APPLICATIONS 2018. [DOI: 10.1007/978-3-319-89743-1_27] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
|
66
|
iDTI-ESBoost: Identification of Drug Target Interaction Using Evolutionary and Structural Features with Boosting. Sci Rep 2017; 7:17731. [PMID: 29255285 PMCID: PMC5735173 DOI: 10.1038/s41598-017-18025-2] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Accepted: 12/05/2017] [Indexed: 02/07/2023] Open
Abstract
Prediction of new drug-target interactions is critically important as it can lead the researchers to find new uses for old drugs and to disclose their therapeutic profiles or side effects. However, experimental prediction of drug-target interactions is expensive and time-consuming. As a result, computational methods for predictioning new drug-target interactions have gained a tremendous interest in recent times. Here we present iDTI-ESBoost, a prediction model for identification of drug-target interactions using evolutionary and structural features. Our proposed method uses a novel data balancing and boosting technique to predict drug-target interaction. On four benchmark datasets taken from a gold standard data, iDTI-ESBoost outperforms the state-of-the-art methods in terms of area under receiver operating characteristic (auROC) curve. iDTI-ESBoost also outperforms the latest and the best-performing method found in the literature in terms of area under precision recall (auPR) curve. This is significant as auPR curves are argued as suitable metric for comparison for imbalanced datasets similar to the one studied here. Our reported results show the effectiveness of the classifier, balancing methods and the novel features incorporated in iDTI-ESBoost. iDTI-ESBoost is a novel prediction method that has for the first time exploited the structural features along with the evolutionary features to predict drug-protein interactions. We believe the excellent performance of iDTI-ESBoost both in terms of auROC and auPR would motivate the researchers and practitioners to use it to predict drug-target interactions. To facilitate that, iDTI-ESBoost is implemented and made publicly available at: http://farshidrayhan.pythonanywhere.com/iDTI-ESBoost/.
Collapse
|
67
|
Cheng T, Hao M, Takeda T, Bryant SH, Wang Y. Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review. AAPS J 2017; 19:1264-1275. [PMID: 28577120 PMCID: PMC11097213 DOI: 10.1208/s12248-017-0092-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2017] [Accepted: 04/25/2017] [Indexed: 11/30/2022] Open
Abstract
The prediction of drug-target interactions (DTIs) is of extraordinary significance to modern drug discovery in terms of suggesting new drug candidates and repositioning old drugs. Despite technological advances, large-scale experimental determination of DTIs is still expensive and laborious. Effective and low-cost computational alternatives remain in strong need. Meanwhile, open-access resources have been rapidly growing with massive amount of bioactivity data becoming available, creating unprecedented opportunities for the development of novel in silico models for large-scale DTI prediction. In this work, we review the state-of-the-art computational approaches for identifying DTIs from a data-centric perspective: what the underlying data are and how they are utilized in each study. We also summarize popular public data resources and online tools for DTI prediction. It is found that various types of data were employed including properties of chemical structures, drug therapeutic effects and side effects, drug-target binding, drug-drug interactions, bioactivity data of drug molecules across multiple biological targets, and drug-induced gene expressions. More often, the heterogeneous data were integrated to offer better performance. However, challenges remain such as handling data imbalance, incorporating negative samples and quantitative bioactivity data, as well as maintaining cross-links among different data sources, which are essential for large-scale and automated information integration.
Collapse
Affiliation(s)
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Takako Takeda
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
| |
Collapse
|
68
|
Schönbach C, Verma C, Bond PJ, Ranganathan S. Bioinformatics and systems biology research update from the 15 th International Conference on Bioinformatics (InCoB2016). BMC Bioinformatics 2016; 17:524. [PMID: 28155668 PMCID: PMC5259976 DOI: 10.1186/s12859-016-1409-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
The International Conference on Bioinformatics (InCoB) has been publishing peer-reviewed conference papers in BMC Bioinformatics since 2006. Of the 44 articles accepted for publication in supplement issues of BMC Bioinformatics, BMC Genomics, BMC Medical Genomics and BMC Systems Biology, 24 articles with a bioinformatics or systems biology focus are reviewed in this editorial. InCoB2017 is scheduled to be held in Shenzen, China, September 20-22, 2017.
Collapse
Affiliation(s)
- Christian Schönbach
- International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811 Japan
| | - Chandra Verma
- Bioinformatics Institute, Agency for Science, Technology and Research (A∗STAR), Singapore, 138671 Singapore
| | - Peter J. Bond
- Bioinformatics Institute, Agency for Science, Technology and Research (A∗STAR), Singapore, 138671 Singapore
| | - Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109 Australia
| |
Collapse
|