151
|
Shi JY, Zhang AQ, Zhang SW, Mao KT, Yiu SM. A unified solution for different scenarios of predicting drug-target interactions via triple matrix factorization. BMC SYSTEMS BIOLOGY 2018; 12:136. [PMID: 30598094 PMCID: PMC6311903 DOI: 10.1186/s12918-018-0663-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Background During the identification of potential candidates, computational prediction of drug-target interactions (DTIs) is important to subsequent expensive validation in wet-lab. DTI screening considers four scenarios, depending on whether the drug is an existing or a new drug and whether the target is an existing or a new target. However, existing approaches have the following limitations. First, only a few of them can address the most difficult scenario (i.e., predicting interactions between new drugs and new targets). More importantly, none of the existing approaches could provide the explicit information for understanding the mechanism of forming interactions, such as the drug-target feature pairs contributing to the interactions. Results In this paper, we propose a Triple Matrix Factorization-based model (TMF) to tackle these problems. Compared with former state-of-the-art predictive methods, TMF demonstrates its significant superiority by assessing the predictions on four benchmark datasets over four kinds of screening scenarios. Also, it exhibits its outperformance by validating predicted novel interactions. More importantly, by using PubChem fingerprints of chemical structures as drug features and occurring frequencies of amino acid trimer as protein features, TMF shows its ability to find out the features determining interactions, including dominant feature pairs, frequently occurring substructures, and conserved triplet of amino acids. Conclusions Our TMF provides a unified framework of DTI prediction for all the screening scenarios. It also presents a new insight for the underlying mechanism of DTIs by indicating dominant features, which play important roles in the forming of DTI. Electronic supplementary material The online version of this article (10.1186/s12918-018-0663-x) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi'An, China.
| | - An-Qi Zhang
- School of Life Sciences, Northwestern Polytechnical University, Xi'An, China
| | - Shao-Wu Zhang
- School of Automations, Northwestern Polytechnical University, Xi'An, China
| | - Kui-Tao Mao
- School of Computer Science, Northwestern Polytechnical University, Xi'An, China
| | - Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| |
Collapse
|
152
|
Shi H, Liu S, Chen J, Li X, Ma Q, Yu B. Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure. Genomics 2018; 111:1839-1852. [PMID: 30550813 DOI: 10.1016/j.ygeno.2018.12.007] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2018] [Revised: 12/06/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023]
Abstract
The identification of drug-target interactions has great significance for pharmaceutical scientific research. Since traditional experimental methods identifying drug-target interactions is costly and time-consuming, the use of machine learning methods to predict potential drug-target interactions has attracted widespread attention. This paper presents a novel drug-target interactions prediction method called LRF-DTIs. Firstly, the pseudo-position specific scoring matrix (PsePSSM) and FP2 molecular fingerprinting were used to extract the features of drug-target. Secondly, using Lasso to reduce the dimension of the extracted feature information and then the Synthetic Minority Oversampling Technique (SMOTE) method was used to deal with unbalanced data. Finally, the processed feature vectors were input into a random forest (RF) classifier to predict drug-target interactions. Through 10 trials of 5-fold cross-validation, the overall prediction accuracies on the enzyme, ion channel (IC), G-protein-coupled receptor (GPCR) and nuclear receptor (NR) datasets reached 98.09%, 97.32%, 95.69%, and 94.88%, respectively, and compared with other prediction methods. In addition, we have tested and verified that our method not only could be applied to predict the new interactions but also could obtain a satisfactory result on the new dataset. All the experimental results indicate that our method can significantly improve the prediction accuracy of drug-target interactions and play a vital role in the new drug research and target protein development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LRF-DTIs/ for academic use.
Collapse
Affiliation(s)
- Han Shi
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Simin Liu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Junqi Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Xuan Li
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; School of Life Sciences, University of Science and Technology of China, Hefei 230027, China.
| |
Collapse
|
153
|
Wang M, Tang C, Chen J. Drug-Target Interaction Prediction via Dual Laplacian Graph Regularized Matrix Completion. BIOMED RESEARCH INTERNATIONAL 2018; 2018:1425608. [PMID: 30627536 PMCID: PMC6304580 DOI: 10.1155/2018/1425608] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/25/2018] [Revised: 09/03/2018] [Accepted: 10/24/2018] [Indexed: 01/16/2023]
Abstract
Drug-target interactions play an important role for biomedical drug discovery and development. However, it is expensive and time-consuming to accomplish this task by experimental determination. Therefore, developing computational techniques for drug-target interaction prediction is urgent and has practical significance. In this work, we propose an effective computational model of dual Laplacian graph regularized matrix completion, referred to as DLGRMC briefly, to infer the unknown drug-target interactions. Specifically, DLGRMC transforms the task of drug-target interaction prediction into a matrix completion problem, in which the potential interactions between drugs and targets can be obtained based on the prediction scores after the matrix completion procedure. In DLGRMC, the drug pairwise chemical structure similarities and the target pairwise genomic sequence similarities are fully exploited to serve the matrix completion by using a dual Laplacian graph regularization term; i.e., drugs with similar chemical structure are more likely to have interactions with similar targets and targets with similar genomic sequence similarity are more likely to have interactions with similar drugs. In addition, during the matrix completion process, an indicator matrix with binary values which indicates the indices of the observed drug-target interactions is deployed to preserve the experimental confirmed interactions. Furthermore, we develop an alternative iterative strategy to solve the constrained matrix completion problem based on Augmented Lagrange Multiplier algorithm. We evaluate DLGRMC on five benchmark datasets and the results show that DLGRMC outperforms several state-of-the-art approaches in terms of 10-fold cross validation based AUPR values and PR curves. In addition, case studies also demonstrate that DLGRMC can successfully predict most of the experimental validated drug-target interactions.
Collapse
Affiliation(s)
- Minhui Wang
- Department of Pharmacy, People's Hospital of Lian'shui County, Huai'an 223300, China
| | - Chang Tang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
| | - Jiajia Chen
- Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huai'an 223002, China
| |
Collapse
|
154
|
|
155
|
Wang A, Lim H, Cheng SY, Xie L. ANTENNA, a Multi-Rank, Multi-Layered Recommender System for Inferring Reliable Drug-Gene-Disease Associations: Repurposing Diazoxide as a Targeted Anti-Cancer Therapy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1960-1967. [PMID: 29993812 PMCID: PMC6139288 DOI: 10.1109/tcbb.2018.2812189] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Existing drug discovery processes follow a reductionist model of "one-drug-one-gene-one-disease," which is inadequate to tackle complex diseases involving multiple malfunctioned genes. The availability of big omics data offers opportunities to transform drug discovery process into a new paradigm of systems pharmacology that focuses on designing drugs to target molecular interaction networks instead of a single gene. Here, we develop a reliable multi-rank, multi-layered recommender system, ANTENNA, to mine large-scale chemical genomics and disease association data for prediction of novel drug-gene-disease associations. ANTENNA integrates a novel tri-factorization based dual-regularized weighted and imputed One Class Collaborative Filtering (OCCF) algorithm, tREMAP, with a statistical framework based on Random Walk with Restart and assess the reliability of specific predictions. In the benchmark, tREMAP clearly outperforms the single-rank OCCF. We apply ANTENNA to a real-world problem: repurposing old drugs for new clinical indications without effective treatments. We discover that FDA-approved drug diazoxide can inhibit multiple kinase genes responsible for many diseases including cancer and kill triple negative breast cancer (TNBC) cells efficiently [Formula: see text]. TNBC is a deadly disease without effective targeted therapies. Our finding demonstrates the power of big data analytics in drug discovery and developing a targeted therapy for TNBC.
Collapse
|
156
|
Cheng Z, Zhou S, Wang Y, Liu H, Guan J, Chen YPP. Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1832-1843. [PMID: 28113437 DOI: 10.1109/tcbb.2016.2570211] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Prediction of compound-protein interactions (CPIs) is to find new compound-protein pairs where a protein is targeted by at least a compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use the unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that leans from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models. More information can be found at http://admis.fudan.edu.cn/projects/pucpi.html.
Collapse
|
157
|
Stock M, Pahikkala T, Airola A, Waegeman W, De Baets B. Algebraic shortcuts for leave-one-out cross-validation in supervised network inference. Brief Bioinform 2018; 21:262-271. [PMID: 30329015 DOI: 10.1093/bib/bby095] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2018] [Revised: 08/21/2018] [Accepted: 09/06/2018] [Indexed: 12/20/2022] Open
Abstract
Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might dramatically differ between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings.In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models. The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package: https://github.com/aatapa/RLScore.
Collapse
Affiliation(s)
- Michiel Stock
- Department of Data Analysis and Mathematical Modelling, Ghent University, Belgium
| | - Tapio Pahikkala
- Department of Future Technologies, University of Turku, Finland
| | - Antti Airola
- Department of Future Technologies, University of Turku, Finland
| | - Willem Waegeman
- Department of Data Analysis and Mathematical Modelling, Ghent University, Belgium
| | - Bernard De Baets
- Department of Data Analysis and Mathematical Modelling, Ghent University, Belgium
| |
Collapse
|
158
|
Hao M, Bryant SH, Wang Y. A new chemoinformatics approach with improved strategies for effective predictions of potential drugs. J Cheminform 2018; 10:50. [PMID: 30311095 PMCID: PMC6755712 DOI: 10.1186/s13321-018-0303-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2018] [Accepted: 10/02/2018] [Indexed: 12/24/2022] Open
Abstract
Background Fast and accurate identification of potential drug candidates against therapeutic targets (i.e., drug–target interactions, DTIs) is a fundamental step in the early drug discovery process. However, experimental determination of DTIs is time-consuming and costly, especially for testing the associations between the entire chemical and genomic spaces. Therefore, computationally efficient algorithms with accurate predictions are required to achieve such a challenging task. In this work, we design a new chemoinformatics approach derived from neighbor-based collaborative filtering (NBCF) to infer potential drug candidates for targets of interest. One of the fundamental steps of NBCF in the application of DTI predictions is to accurately measure the similarity between drugs solely based on the DTI profiles of known knowledge. However, commonly used similarity calculation methods such as COSINE may be noise-prone due to the extremely sparse property of the DTI bipartite network, which decreases the model performance of NBCF. We herein propose three strategies to remedy such a dilemma, which include: (1) adopting a positive pointwise mutual information (PPMI)-based similarity metric, which is noise-immune to some extent; (2) performing low-rank approximation of the original prediction scores; (3) incorporating auxiliary (complementary) information to produce the final predictions. Results We test the proposed methods in three benchmark datasets and the results indicate that our strategies are helpful to improve the NBCF performance for DTI predictions. Comparing to the prior algorithm, our methods exhibit better results assessed by a recall-based evaluation metric. Conclusions A new chemoinformatics approach with improved strategies was successfully developed to predict potential DTIs. Among them, the model based on the sparsity resistant PPMI similarity metric exhibits the best performance, which may be helpful to researchers for identifying potential drugs against therapeutic targets of interest, and can also be applied to related research such as identifying candidate disease genes.
Collapse
Affiliation(s)
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
| |
Collapse
|
159
|
Playe B, Azencott CA, Stoven V. Efficient multi-task chemogenomics for drug specificity prediction. PLoS One 2018; 13:e0204999. [PMID: 30286165 PMCID: PMC6171913 DOI: 10.1371/journal.pone.0204999] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2018] [Accepted: 09/18/2018] [Indexed: 01/10/2023] Open
Abstract
Adverse drug reactions, also called side effects, range from mild to fatal clinical events and significantly affect the quality of care. Among other causes, side effects occur when drugs bind to proteins other than their intended target. As experimentally testing drug specificity against the entire proteome is out of reach, we investigate the application of chemogenomics approaches. We formulate the study of drug specificity as a problem of predicting interactions between drugs and proteins at the proteome scale. We build several benchmark datasets, and propose NN-MT, a multi-task Support Vector Machine (SVM) algorithm that is trained on a limited number of data points, in order to solve the computational issues or proteome-wide SVM for chemogenomics. We compare NN-MT to different state-of-the-art methods, and show that its prediction performances are similar or better, at an efficient calculation cost. Compared to its competitors, the proposed method is particularly efficient to predict (protein, ligand) interactions in the difficult double-orphan case, i.e. when no interactions are previously known for the protein nor for the ligand. The NN-MT algorithm appears to be a good default method providing state-of-the-art or better performances, in a wide range of prediction scenario that are considered in the present study: proteome-wide prediction, protein family prediction, test (protein, ligand) pairs dissimilar to pairs in the train set, and orphan cases.
Collapse
Affiliation(s)
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
- Institut Curie F-75248, Paris, France
- INSERM U900, F-75248, Paris, France
- * E-mail:
| | - Chloé-Agathe Azencott
- Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
- Institut Curie F-75248, Paris, France
- INSERM U900, F-75248, Paris, France
| | - Véronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, Paris, France
- Institut Curie F-75248, Paris, France
- INSERM U900, F-75248, Paris, France
| |
Collapse
|
160
|
Sharma A, Rani R. BE-DTI': Ensemble framework for drug target interaction prediction using dimensionality reduction and active learning. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 165:151-162. [PMID: 30337070 DOI: 10.1016/j.cmpb.2018.08.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/12/2018] [Revised: 08/03/2018] [Accepted: 08/17/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Drug-target interaction prediction plays an intrinsic role in the drug discovery process. Prediction of novel drugs and targets helps in identifying optimal drug therapies for various stringent diseases. Computational prediction of drug-target interactions can help to identify potential drug-target pairs and speed-up the process of drug repositioning. In our present, work we have focused on machine learning algorithms for predicting drug-target interactions from the pool of existing drug-target data. The key idea is to train the classifier using existing DTI so as to predict new or unknown DTI. However, there are various challenges such as class imbalance and high dimensional nature of data that need to be addressed before developing optimal drug-target interaction model. METHODS In this paper, we propose a bagging based ensemble framework named BE-DTI' for drug-target interaction prediction using dimensionality reduction and active learning to deal with class-imbalanced data. Active learning helps to improve under-sampling bagging based ensembles. Dimensionality reduction is used to deal with high dimensional data. RESULTS Results show that the proposed technique outperforms the other five competing methods in 10-fold cross-validation experiments in terms of AUC=0.927, Sensitivity=0.886, Specificity=0.864, and G-mean=0.874. CONCLUSION Missing interactions and new interactions are predicted using the proposed framework. Some of the known interactions are removed from the original dataset and their interactions are recalculated to check the accuracy of the proposed framework. Moreover, validation of the proposed approach is performed using the external dataset. All these results show that structurally similar drugs tend to interact with similar targets.
Collapse
Affiliation(s)
- Aman Sharma
- Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Punjab, Patiala, India.
| | - Rinkle Rani
- Computer Science and Engineering Department, Thapar Institute of Engineering & Technology, Punjab, Patiala, India.
| |
Collapse
|
161
|
Abstract
Motivation The identification of novel drug-target (DT) interactions is a substantial part of the drug discovery process. Most of the computational methods that have been proposed to predict DT interactions have focused on binary classification, where the goal is to determine whether a DT pair interacts or not. However, protein-ligand interactions assume a continuum of binding strength values, also called binding affinity and predicting this value still remains a challenge. The increase in the affinity data available in DT knowledge-bases allows the use of advanced learning techniques such as deep learning architectures in the prediction of binding affinities. In this study, we propose a deep-learning based model that uses only sequence information of both targets and drugs to predict DT interaction binding affinities. The few studies that focus on DT binding affinity prediction use either 3D structures of protein-ligand complexes or 2D features of compounds. One novel approach used in this work is the modeling of protein sequences and compound 1D representations with convolutional neural networks (CNNs). Results The results show that the proposed deep learning based model that uses the 1D representations of targets and drugs is an effective approach for drug target binding affinity prediction. The model in which high-level representations of a drug and a target are constructed via CNNs achieved the best Concordance Index (CI) performance in one of our larger benchmark datasets, outperforming the KronRLS algorithm and SimBoost, a state-of-the-art method for DT binding affinity prediction. Availability and implementation https://github.com/hkmztrk/DeepDTA. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Hakime Öztürk
- Department of Computer Engineering, Bogazici University, Istanbul, Turkey
| | - Arzucan Özgür
- Department of Computer Engineering, Bogazici University, Istanbul, Turkey
| | - Elif Ozkirimli
- Department of Chemical Engineering, Bogazici University, Istanbul, Turkey
| |
Collapse
|
162
|
Chen R, Liu X, Jin S, Lin J, Liu J. Machine Learning for Drug-Target Interaction Prediction. Molecules 2018; 23:E2208. [PMID: 30200333 PMCID: PMC6225477 DOI: 10.3390/molecules23092208] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 08/27/2018] [Accepted: 08/27/2018] [Indexed: 12/18/2022] Open
Abstract
Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.
Collapse
Affiliation(s)
- Ruolan Chen
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Xiangrong Liu
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Shuting Jin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Jiawei Lin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Juan Liu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen 361005, China.
| |
Collapse
|
163
|
Wang C, Kurgan L. Review and comparative assessment of similarity-based methods for prediction of drug–protein interactions in the druggable human proteome. Brief Bioinform 2018; 20:2066-2087. [DOI: 10.1093/bib/bby069] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 07/10/2018] [Indexed: 12/18/2022] Open
Abstract
AbstractDrug–protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them based on three types of similarities and their combinations that they use. We discuss and compare key aspects of these methods including source databases, internal databases and their predictive models. Using our novel benchmark database, we perform comparative empirical analysis of predictive performance of seven types of representative predictors that utilize each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level and we are the first to also include evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer similarities, and that the model combining all three types of similarities secures area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarities between sequences of the drug targets and several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.
Collapse
Affiliation(s)
- Chen Wang
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Lukasz Kurgan
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
Collapse
|
164
|
Airola A, Pahikkala T. Fast Kronecker Product Kernel Methods via Generalized Vec Trick. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3374-3387. [PMID: 28783645 DOI: 10.1109/tnnls.2017.2727545] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Kronecker product kernel provides the standard approach in the kernel methods' literature for learning from graph data, where edges are labeled and both start and end vertices have their own feature representations. The methods allow generalization to such new edges, whose start and end vertices do not appear in the training data, a setting known as zero-shot or zero-data learning. Such a setting occurs in numerous applications, including drug-target interaction prediction, collaborative filtering, and information retrieval. Efficient training algorithms based on the so-called vec trick that makes use of the special structure of the Kronecker product are known for the case where the training data are a complete bipartite graph. In this paper, we generalize these results to noncomplete training graphs. This allows us to derive a general framework for training Kronecker product kernel methods, as specific examples we implement Kronecker ridge regression and support vector machine algorithms. Experimental results demonstrate that the proposed approach leads to accurate models, while allowing order of magnitude improvements in training and prediction time.
Collapse
|
165
|
Xue H, Li J, Xie H, Wang Y. Review of Drug Repositioning Approaches and Resources. Int J Biol Sci 2018; 14:1232-1244. [PMID: 30123072 PMCID: PMC6097480 DOI: 10.7150/ijbs.24612] [Citation(s) in RCA: 327] [Impact Index Per Article: 54.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2017] [Accepted: 06/12/2018] [Indexed: 12/23/2022] Open
Abstract
Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Different from traditional drug development strategies, the strategy is efficient, economical and riskless. There are usually three kinds of approaches: computational approaches, biological experimental approaches, and mixed approaches, all of which are widely used in drug repositioning. In this paper, we reviewed computational approaches and highlighted their characteristics to provide references for researchers to develop more powerful approaches. At the same time, the important findings obtained using these approaches are listed. Furthermore, we summarized 76 important resources about drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.
Collapse
Affiliation(s)
- Hanqing Xue
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| | - Haozhe Xie
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| |
Collapse
|
166
|
|
167
|
Stock M, Pahikkala T, Airola A, De Baets B, Waegeman W. A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression. Neural Comput 2018; 30:2245-2283. [PMID: 29894652 DOI: 10.1162/neco_a_01096] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
Collapse
Affiliation(s)
- Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Tapio Pahikkala
- Department of Future Technologies, University of Turku, 20520 Turku, Finland
| | - Antti Airola
- Department of Future Technologies, University of Turku, 20520 Turku, Finland
| | - Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| |
Collapse
|
168
|
Huang G, Li J, Zhao C. Computational Prediction and Analysis of Associations between Small Molecules and Binding-Associated S-Nitrosylation Sites. Molecules 2018; 23:molecules23040954. [PMID: 29671802 PMCID: PMC6017196 DOI: 10.3390/molecules23040954] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2018] [Revised: 03/30/2018] [Accepted: 04/09/2018] [Indexed: 01/12/2023] Open
Abstract
Interactions between drugs and proteins occupy a central position during the process of drug discovery and development. Numerous methods have recently been developed for identifying drug–target interactions, but few have been devoted to finding interactions between post-translationally modified proteins and drugs. We presented a machine learning-based method for identifying associations between small molecules and binding-associated S-nitrosylated (SNO-) proteins. Namely, small molecules were encoded by molecular fingerprint, SNO-proteins were encoded by the information entropy-based method, and the random forest was used to train a classifier. Ten-fold and leave-one-out cross validations achieved, respectively, 0.7235 and 0.7490 of the area under a receiver operating characteristic curve. Computational analysis of similarity suggested that SNO-proteins associated with the same drug shared statistically significant similarity, and vice versa. This method and finding are useful to identify drug–SNO associations and further facilitate the discovery and development of SNO-associated drugs.
Collapse
Affiliation(s)
- Guohua Huang
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China.
- College of Information Engineering, Shaoyang University, Shaoyang 422000, China.
| | - Jincheng Li
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China.
- College of Information Engineering, Shaoyang University, Shaoyang 422000, China.
| | - Chenglin Zhao
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China.
- College of Information Engineering, Shaoyang University, Shaoyang 422000, China.
| |
Collapse
|
169
|
Tauler R, Parastar H. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists. Angew Chem Int Ed Engl 2018; 61:e201801134. [DOI: 10.1002/anie.201801134] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2018] [Indexed: 11/08/2022]
Affiliation(s)
- Roma Tauler
- IDAEA-CSIC Environmental Chemistry Jordi Girona 18-26 08034 Barcelona SPAIN
| | | |
Collapse
|
170
|
Ezzat A, Wu M, Li XL, Kwoh CK. Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey. Brief Bioinform 2018; 20:1337-1357. [DOI: 10.1093/bib/bby002] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2017] [Revised: 12/21/2017] [Indexed: 01/18/2023] Open
Abstract
Abstract
Computational prediction of drug–target interactions (DTIs) has become an essential task in the drug discovery process. It narrows down the search space for interactions by suggesting potential interaction candidates for validation via wet-lab experiments that are well known to be expensive and time-consuming. In this article, we aim to provide a comprehensive overview and empirical evaluation on the computational DTI prediction techniques, to act as a guide and reference for our fellow researchers. Specifically, we first describe the data used in such computational DTI prediction efforts. We then categorize and elaborate the state-of-the-art methods for predicting DTIs. Next, an empirical comparison is performed to demonstrate the prediction performance of some representative methods under different scenarios. We also present interesting findings from our evaluation study, discussing the advantages and disadvantages of each method. Finally, we highlight potential avenues for further enhancement of DTI prediction performance as well as related research directions.
Collapse
|
171
|
Abstract
Background Matrix factorization is a well established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets. Our focus in this paper is matrix tri-factorization, a popular method that is not limited by the assumption of standard matrix factorization about data residing in one latent space. Matrix tri-factorization solves this by inferring a separate latent space for each dimension in a data matrix, and a latent mapping of interactions between the inferred spaces, making the approach particularly suitable for biomedical data mining. Results We developed a block-wise approach for latent factor learning in matrix tri-factorization. The approach partitions a data matrix into disjoint submatrices that are treated independently and fed into a parallel factorization system. An appealing property of the proposed approach is its mathematical equivalence with serial matrix tri-factorization. In a study on large biomedical datasets we show that our approach scales well on multi-processor and multi-GPU architectures. On a four-GPU system we demonstrate that our approach can be more than 100-times faster than its single-processor counterpart. Conclusions A general approach for scaling non-negative matrix tri-factorization is proposed. The approach is especially useful parallel matrix factorization implemented in a multi-GPU environment. We expect the new approach will be useful in emerging procedures for latent factor analysis, notably for data integration, where many large data matrices need to be collectively factorized.
Collapse
|
172
|
Nourani E, Khunjush F, Sevilgen FE. Virus–human protein–protein interaction prediction using Bayesian matrix factorization and projection techniques. Biocybern Biomed Eng 2018. [DOI: 10.1016/j.bbe.2018.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
|
173
|
iDTI-ESBoost: Identification of Drug Target Interaction Using Evolutionary and Structural Features with Boosting. Sci Rep 2017; 7:17731. [PMID: 29255285 PMCID: PMC5735173 DOI: 10.1038/s41598-017-18025-2] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Accepted: 12/05/2017] [Indexed: 02/07/2023] Open
Abstract
Prediction of new drug-target interactions is critically important as it can lead the researchers to find new uses for old drugs and to disclose their therapeutic profiles or side effects. However, experimental prediction of drug-target interactions is expensive and time-consuming. As a result, computational methods for predictioning new drug-target interactions have gained a tremendous interest in recent times. Here we present iDTI-ESBoost, a prediction model for identification of drug-target interactions using evolutionary and structural features. Our proposed method uses a novel data balancing and boosting technique to predict drug-target interaction. On four benchmark datasets taken from a gold standard data, iDTI-ESBoost outperforms the state-of-the-art methods in terms of area under receiver operating characteristic (auROC) curve. iDTI-ESBoost also outperforms the latest and the best-performing method found in the literature in terms of area under precision recall (auPR) curve. This is significant as auPR curves are argued as suitable metric for comparison for imbalanced datasets similar to the one studied here. Our reported results show the effectiveness of the classifier, balancing methods and the novel features incorporated in iDTI-ESBoost. iDTI-ESBoost is a novel prediction method that has for the first time exploited the structural features along with the evolutionary features to predict drug-protein interactions. We believe the excellent performance of iDTI-ESBoost both in terms of auROC and auPR would motivate the researchers and practitioners to use it to predict drug-target interactions. To facilitate that, iDTI-ESBoost is implemented and made publicly available at: http://farshidrayhan.pythonanywhere.com/iDTI-ESBoost/.
Collapse
|
174
|
Peska L, Buza K, Koller J. Drug-target interaction prediction: A Bayesian ranking approach. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017; 152:15-21. [PMID: 29054256 DOI: 10.1016/j.cmpb.2017.09.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/03/2017] [Revised: 07/28/2017] [Accepted: 09/05/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE In silico prediction of drug-target interactions (DTI) could provide valuable information and speed-up the process of drug repositioning - finding novel usage for existing drugs. In our work, we focus on machine learning algorithms supporting drug-centric repositioning approach, which aims to find novel usage for existing or abandoned drugs. We aim at proposing a per-drug ranking-based method, which reflects the needs of drug-centric repositioning research better than conventional drug-target prediction approaches. METHODS We propose Bayesian Ranking Prediction of Drug-Target Interactions (BRDTI). The method is based on Bayesian Personalized Ranking matrix factorization (BPR) which has been shown to be an excellent approach for various preference learning tasks, however, it has not been used for DTI prediction previously. In order to successfully deal with DTI challenges, we extended BPR by proposing: (i) the incorporation of target bias, (ii) a technique to handle new drugs and (iii) content alignment to take structural similarities of drugs and targets into account. RESULTS Evaluation on five benchmark datasets shows that BRDTI outperforms several state-of-the-art approaches in terms of per-drug nDCG and AUC. BRDTI results w.r.t. nDCG are 0.929, 0.953, 0.948, 0.897 and 0.690 for G-Protein Coupled Receptors (GPCR), Ion Channels (IC), Nuclear Receptors (NR), Enzymes (E) and Kinase (K) datasets respectively. Additionally, BRDTI significantly outperformed other methods (BLM-NII, WNN-GIP, NetLapRLS and CMF) w.r.t. nDCG in 17 out of 20 cases. Furthermore, BRDTI was also shown to be able to predict novel drug-target interactions not contained in the original datasets. The average recall at top-10 predicted targets for each drug was 0.762, 0.560, 1.000 and 0.404 for GPCR, IC, NR, and E datasets respectively. CONCLUSIONS Based on the evaluation, we can conclude that BRDTI is an appropriate choice for researchers looking for an in silico DTI prediction technique to be used in drug-centric repositioning scenarios. BRDTI Software and supplementary materials are available online at www.ksi.mff.cuni.cz/∼peska/BRDTI.
Collapse
Affiliation(s)
- Ladislav Peska
- Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic; Brain Imaging Centre, Hungarian Academy of Sciences, Budapest, Hungary.
| | - Krisztian Buza
- Brain Imaging Centre, Hungarian Academy of Sciences, Budapest, Hungary; Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
| | - Júlia Koller
- Institute of Genomic Medicine and Rare Disorders, Semmelweis University, Budapest, Hungary
| |
Collapse
|
175
|
Ding Y, Tang J, Guo F. Identification of drug-target interactions via multiple information integration. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.08.045] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
|
176
|
Zhang W, Chen Y, Li D. Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information. Molecules 2017; 22:molecules22122056. [PMID: 29186828 PMCID: PMC6149680 DOI: 10.3390/molecules22122056] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2017] [Revised: 11/19/2017] [Accepted: 11/20/2017] [Indexed: 11/16/2022] Open
Abstract
Interactions between drugs and target proteins provide important information for the drug discovery. Currently, experiments identified only a small number of drug-target interactions. Therefore, the development of computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. Firstly, we calculate drug-drug linear neighborhood similarity in the feature spaces, by considering how to reconstruct data points from neighbors. Then, we take similarities as the manifold of drugs, and assume the manifold unchanged in the interaction space. At last, we predict unobserved interactions between known drugs and targets by using drug-drug linear neighborhood similarity and known drug-target interactions. The experiments show that LPLNI can utilize only known drug-target interactions to make high-accuracy predictions on four benchmark datasets. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) can produce improved performances, better than other state-of-the-art methods. The known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross validation and the case study.
Collapse
Affiliation(s)
- Wen Zhang
- School of Computer, Wuhan University, Wuhan 430072, China.
| | - Yanlin Chen
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China.
| | - Dingfang Li
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China.
| |
Collapse
|
177
|
Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017; 18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information are also vital for reaching top performance. METHOD We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. RESULTS VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirm their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-ML-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units yielding significantly improved computational time. CONCLUSION In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.
Collapse
Affiliation(s)
- Bence Bolgár
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
| | - Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
| |
Collapse
|
178
|
Buza K, Peška L. Drug–target interaction prediction with Bipartite Local Models and hubness-aware regression. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.04.055] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
|
179
|
Drug-target interaction prediction using ensemble learning and dimensionality reduction. Methods 2017; 129:81-88. [DOI: 10.1016/j.ymeth.2017.05.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2017] [Revised: 04/03/2017] [Accepted: 05/18/2017] [Indexed: 11/23/2022] Open
|
180
|
Wang L, You ZH, Chen X, Xia SX, Liu F, Yan X, Zhou Y, Song KJ. A Computational-Based Method for Predicting Drug-Target Interactions by Using Stacked Autoencoder Deep Neural Network. J Comput Biol 2017; 25:361-373. [PMID: 28891684 DOI: 10.1089/cmb.2017.0135] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Identifying the interaction between drugs and target proteins is an important area of drug research, which provides a broad prospect for low-risk and faster drug development. However, due to the limitations of traditional experiments when revealing drug-protein interactions (DTIs), the screening of targets not only takes a lot of time and money but also has high false-positive and false-negative rates. Therefore, it is imperative to develop effective automatic computational methods to accurately predict DTIs in the postgenome era. In this article, we propose a new computational method for predicting DTIs from drug molecular structure and protein sequence by using the stacked autoencoder of deep learning, which can adequately extract the raw data information. The proposed method has the advantage that it can automatically mine the hidden information from protein sequences and generate highly representative features through iterations of multiple layers. The feature descriptors are then constructed by combining the molecular substructure fingerprint information, and fed into the rotation forest for accurate prediction. The experimental results of fivefold cross-validation indicate that the proposed method achieves superior performance on gold standard data sets (enzymes, ion channels, GPCRs [G-protein-coupled receptors], and nuclear receptors) with accuracy of 0.9414, 0.9116, 0.8669, and 0.8056, respectively. We further comprehensively explore the performance of the proposed method by comparing it with other feature extraction algorithms, state-of-the-art classifiers, and other excellent methods on the same data set. The excellent comparison results demonstrate that the proposed method is highly competitive when predicting drug-target interactions.
Collapse
Affiliation(s)
- Lei Wang
- 1 School of Computer Science and Technology, China University of Mining and Technology , Xuzhou, China .,2 College of Information Science and Engineering, Zaozhuang University , Zaozhuang, China
| | - Zhu-Hong You
- 3 Xinjiang Technical Institutes of Physics and Chemistry , Chinese Academy of Science, Urumqi, China
| | - Xing Chen
- 4 School of Information and Control Engineering, China University of Mining and Technology , Xuzhou, China
| | - Shi-Xiong Xia
- 1 School of Computer Science and Technology, China University of Mining and Technology , Xuzhou, China
| | - Feng Liu
- 5 China National Coal Association , Beijing, China
| | - Xin Yan
- 6 School of Foreign Languages, Zaozhuang University , Zaozhuang, China
| | - Yong Zhou
- 1 School of Computer Science and Technology, China University of Mining and Technology , Xuzhou, China
| | - Ke-Jian Song
- 7 School of Information Engineering, JiangXi University of Science and Technology , Ganzhou, China
| |
Collapse
|
181
|
In silico prediction of drug-target interaction networks based on drug chemical structure and protein sequences. Sci Rep 2017; 7:11174. [PMID: 28894115 PMCID: PMC5593914 DOI: 10.1038/s41598-017-10724-0] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2017] [Accepted: 08/14/2017] [Indexed: 01/09/2023] Open
Abstract
Analysis of drug–target interactions (DTIs) is of great importance in developing new drug candidates for known protein targets or discovering new targets for old drugs. However, the experimental approaches for identifying DTIs are expensive, laborious and challenging. In this study, we report a novel computational method for predicting DTIs using the highly discriminative information of drug-target interactions and our newly developed discriminative vector machine (DVM) classifier. More specifically, each target protein sequence is transformed as the position-specific scoring matrix (PSSM), in which the evolutionary information is retained; then the local binary pattern (LBP) operator is used to calculate the LBP histogram descriptor. For a drug molecule, a novel fingerprint representation is utilized to describe its chemical structure information representing existence of certain functional groups or fragments. When applying the proposed method to the four datasets (Enzyme, GPCR, Ion Channel and Nuclear Receptor) for predicting DTIs, we obtained good average accuracies of 93.16%, 89.37%, 91.73% and 92.22%, respectively. Furthermore, we compared the performance of the proposed model with that of the state-of-the-art SVM model and other previous methods. The achieved results demonstrate that our method is effective and robust and can be taken as a useful tool for predicting DTIs.
Collapse
|
182
|
Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 2017; 32:i18-i27. [PMID: 27307615 PMCID: PMC4908328 DOI: 10.1093/bioinformatics/btw244] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability:http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact:zhusf@fudan.edu.cn Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Qingjun Yuan
- School of Computer Science, Fudan University, Shanghai, China Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
| | - Junning Gao
- School of Computer Science, Fudan University, Shanghai, China Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
| | - Dongliang Wu
- School of Computer Science, Fudan University, Shanghai, China Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
| | - Shihua Zhang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan Department of Computer Science, Aalto University, Finland
| | - Shanfeng Zhu
- School of Computer Science, Fudan University, Shanghai, China Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China Centre for Computational System Biology, Fudan University, Shanghai, China
| |
Collapse
|
183
|
An Ameliorated Prediction of Drug-Target Interactions Based on Multi-Scale Discrete Wavelet Transform and Network Features. Int J Mol Sci 2017; 18:ijms18081781. [PMID: 28813000 PMCID: PMC5578170 DOI: 10.3390/ijms18081781] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2017] [Revised: 08/08/2017] [Accepted: 08/14/2017] [Indexed: 11/17/2022] Open
Abstract
The prediction of drug–target interactions (DTIs) via computational technology plays a crucial role in reducing the experimental cost. A variety of state-of-the-art methods have been proposed to improve the accuracy of DTI predictions. In this paper, we propose a kind of drug–target interactions predictor adopting multi-scale discrete wavelet transform and network features (named as DAWN) in order to solve the DTIs prediction problem. We encode the drug molecule by a substructure fingerprint with a dictionary of substructure patterns. Simultaneously, we apply the discrete wavelet transform (DWT) to extract features from target sequences. Then, we concatenate and normalize the target, drug, and network features to construct feature vectors. The prediction model is obtained by feeding these feature vectors into the support vector machine (SVM) classifier. Extensive experimental results show that the prediction ability of DAWN has a compatibility among other DTI prediction schemes. The prediction areas under the precision–recall curves (AUPRs) of four datasets are 0.895 (Enzyme), 0.921 (Ion Channel), 0.786 (guanosine-binding protein coupled receptor, GPCR), and 0.603 (Nuclear Receptor), respectively.
Collapse
|
184
|
Peng L, Zhu W, Liao B, Duan Y, Chen M, Chen Y, Yang J. Screening drug-target interactions with positive-unlabeled learning. Sci Rep 2017; 7:8087. [PMID: 28808275 PMCID: PMC5556112 DOI: 10.1038/s41598-017-08079-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2017] [Accepted: 07/04/2017] [Indexed: 02/03/2023] Open
Abstract
Identifying drug-target interaction (DTI) candidates is crucial for drug repositioning. However, usually only positive DTIs are deposited in known databases, which challenges computational methods to predict novel DTIs due to the lack of negative samples. To overcome this dilemma, researchers usually randomly select negative samples from unlabeled drug-target pairs, which introduces a lot of false-positives. In this study, a negative sample extraction method named NDTISE is first developed to screen strong negative DTI examples based on positive-unlabeled learning. A novel DTI screening framework, PUDTI, is then designed to infer new drug repositioning candidates by integrating NDTISE, probabilities that remaining ambiguous samples belong to the positive and negative classes, and an SVM-based optimization model. We investigated the effectiveness of NDTISE on a DTI data provided by NCPIS. NDTISE is much better than random selection and slightly outperforms NCPIS. We then compared PUDTI with 6 state-of-the-art methods on 4 classes of DTI datasets from human enzymes, ion channels, GPCRs and nuclear receptors. PUDTI achieved the highest AUC among the 7 methods on all 4 datasets. Finally, we validated a few top predicted DTIs through mining independent drug databases and literatures. In conclusion, PUDTI provides an effective pre-filtering method for new drug design.
Collapse
Affiliation(s)
- Lihong Peng
- Key Laboratory for Embedded and Network Computing of Hunan Province, College of Information Science and Engineering, Hunan University, Changsha Hunan, 410082, China
- College of Information Engineering, Changsha Medical University, Changsha Hunan, 410219, China
| | - Wen Zhu
- Key Laboratory for Embedded and Network Computing of Hunan Province, College of Information Science and Engineering, Hunan University, Changsha Hunan, 410082, China
| | - Bo Liao
- Key Laboratory for Embedded and Network Computing of Hunan Province, College of Information Science and Engineering, Hunan University, Changsha Hunan, 410082, China.
| | - Yu Duan
- Hunan University of Commerce, Changsha Hunan, 410205, China
| | - Min Chen
- Key Laboratory for Embedded and Network Computing of Hunan Province, College of Information Science and Engineering, Hunan University, Changsha Hunan, 410082, China
| | - Yi Chen
- College of Drug, Changsha Medical University, Changsha Hunan, 410219, China
| | - Jialiang Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, NY, 10029, USA
| |
Collapse
|
185
|
Computational-experimental approach to drug-target interaction mapping: A case study on kinase inhibitors. PLoS Comput Biol 2017; 13:e1005678. [PMID: 28787438 PMCID: PMC5560747 DOI: 10.1371/journal.pcbi.1005678] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2017] [Revised: 08/17/2017] [Accepted: 07/11/2017] [Indexed: 01/09/2023] Open
Abstract
Due to relatively high costs and labor required for experimental profiling of the full target space of chemical compounds, various machine learning models have been proposed as cost-effective means to advance this process in terms of predicting the most potent compound-target interactions for subsequent verification. However, most of the model predictions lack direct experimental validation in the laboratory, making their practical benefits for drug discovery or repurposing applications largely unknown. Here, we therefore introduce and carefully test a systematic computational-experimental framework for the prediction and pre-clinical verification of drug-target interactions using a well-established kernel-based regression algorithm as the prediction model. To evaluate its performance, we first predicted unmeasured binding affinities in a large-scale kinase inhibitor profiling study, and then experimentally tested 100 compound-kinase pairs. The relatively high correlation of 0.77 (p < 0.0001) between the predicted and measured bioactivities supports the potential of the model for filling the experimental gaps in existing compound-target interaction maps. Further, we subjected the model to a more challenging task of predicting target interactions for such a new candidate drug compound that lacks prior binding profile information. As a specific case study, we used tivozanib, an investigational VEGF receptor inhibitor with currently unknown off-target profile. Among 7 kinases with high predicted affinity, we experimentally validated 4 new off-targets of tivozanib, namely the Src-family kinases FRK and FYN A, the non-receptor tyrosine kinase ABL1, and the serine/threonine kinase SLK. Our sub-sequent experimental validation protocol effectively avoids any possible information leakage between the training and validation data, and therefore enables rigorous model validation for practical applications. These results demonstrate that the kernel-based modeling approach offers practical benefits for probing novel insights into the mode of action of investigational compounds, and for the identification of new target selectivities for drug repurposing applications.
Collapse
|
186
|
Meng FR, You ZH, Chen X, Zhou Y, An JY. Prediction of Drug-Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures. Molecules 2017; 22:molecules22071119. [PMID: 28678206 PMCID: PMC6152073 DOI: 10.3390/molecules22071119] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2017] [Revised: 06/27/2017] [Accepted: 07/03/2017] [Indexed: 11/16/2022] Open
Abstract
Knowledge of drug–target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, there are unavoidable shortcomings; including the time-consuming and expensive nature of the experimental method to predict DTI. Therefore, it motivates us to develop an effective computational method to predict DTI based on protein sequence. In the paper, we proposed a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence) to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). In order to evaluate the prediction capacity of the PDTPS, the experiment was carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets by using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracy of 97.73%, 93.12%, 86.78%, and 87.78% on enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. Furthermore, in order to further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on enzyme and ion channel datasets, and other exiting methods on four datasets. The promising comparison results further demonstrate that the efficiency and robust of the proposed PDTPS method. This makes it a useful tool and suitable for predicting DTI, as well as other bioinformatics tasks.
Collapse
Affiliation(s)
- Fan-Rong Meng
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China.
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China.
| | - Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 21116, China.
| | - Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China.
| | - Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China.
| |
Collapse
|
187
|
Ezzat A, Zhao P, Wu M, Li XL, Kwoh CK. Drug-Target Interaction Prediction with Graph Regularized Matrix Factorization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:646-656. [PMID: 26890921 DOI: 10.1109/tcbb.2016.2530062] [Citation(s) in RCA: 168] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/14/2023]
Abstract
Experimental determination of drug-target interactions is expensive and time-consuming. Therefore, there is a continuous demand for more accurate predictions of interactions using computational techniques. Algorithms have been devised to infer novel interactions on a global scale where the input to these algorithms is a drug-target network (i.e., a bipartite graph where edges connect pairs of drugs and targets that are known to interact). However, these algorithms had difficulty predicting interactions involving new drugs or targets for which there are no known interactions (i.e., "orphan" nodes in the network). Since data usually lie on or near to low-dimensional non-linear manifolds, we propose two matrix factorization methods that use graph regularization in order to learn such manifolds. In addition, considering that many of the non-occurring edges in the network are actually unknown or missing cases, we developed a preprocessing step to enhance predictions in the "new drug" and "new target" cases by adding edges with intermediate interaction likelihood scores. In our cross validation experiments, our methods achieved better results than three other state-of-the-art methods in most cases. Finally, we simulated some "new drug" and "new target" cases and found that GRMF predicted the left-out interactions reasonably well.
Collapse
|
188
|
Lotfi Shahreza M, Ghadiri N, Mousavi SR, Varshosaz J, Green JR. Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning. J Biomed Inform 2017; 68:167-183. [DOI: 10.1016/j.jbi.2017.03.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2016] [Revised: 02/09/2017] [Accepted: 03/10/2017] [Indexed: 12/14/2022]
|
189
|
Chaudhari R, Tan Z, Huang B, Zhang S. Computational polypharmacology: a new paradigm for drug discovery. Expert Opin Drug Discov 2017; 12:279-291. [PMID: 28067061 PMCID: PMC7241838 DOI: 10.1080/17460441.2017.1280024] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
INTRODUCTION Over the past couple of years, the cost of drug development has sharply increased along with the high rate of clinical trial failures. Such increase in expenses is partially due to the inability of the "one drug - one target" approach to predict drug side effects and toxicities. To tackle this issue, an alternative approach, known as polypharmacology, is being adopted to study small molecule interactions with multiple targets. Apart from developing more potent and effective drugs, this approach allows for studies of off-target activities and the facilitation of drug repositioning. Although exhaustive polypharmacology studies in-vitro or in-vivo are not practical, computational methods of predicting unknown targets or side effects are being developed. Areas covered: This article describes various computational approaches that have been developed to study polypharmacology profiles of small molecules. It also provides a brief description of the algorithms used in these state-of-the-art methods. Expert opinion: Recent success in computational prediction of multi-targeting drugs has established polypharmacology as a promising alternative approach to tackle some of the daunting complications in drug discovery. This will not only help discover more effective agents, but also present tremendous opportunities to study novel target pharmacology and facilitate drug repositioning efforts in the pharmaceutical industry.
Collapse
Affiliation(s)
- Rajan Chaudhari
- Integrated Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
| | - Zhi Tan
- Integrated Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- The University of Texas Graduate School of Biomedical Sciences, Houston, TX 77030
| | - Beibei Huang
- Integrated Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
| | - Shuxing Zhang
- Integrated Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030
- The University of Texas Graduate School of Biomedical Sciences, Houston, TX 77030
| |
Collapse
|
190
|
SELF-BLM: Prediction of drug-target interactions via self-training SVM. PLoS One 2017; 12:e0171839. [PMID: 28192537 PMCID: PMC5305209 DOI: 10.1371/journal.pone.0171839] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2016] [Accepted: 01/26/2017] [Indexed: 01/08/2023] Open
Abstract
Predicting drug-target interactions is important for the development of novel drugs and the repositioning of drugs. To predict such interactions, there are a number of methods based on drug and target protein similarity. Although these methods, such as the bipartite local model (BLM), show promise, they often categorize unknown interactions as negative interaction. Therefore, these methods are not ideal for finding potential drug-target interactions that have not yet been validated as positive interactions. Thus, here we propose a method that integrates machine learning techniques, such as self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions. The method first categorizes unlabeled interactions and negative interactions among unknown interactions using a clustering method. Then, using the BLM method and self-training SVM, the unlabeled interactions are self-trained and final local classification models are constructed. When applied to four classes of proteins that include enzymes, G-protein coupled receptors (GPCRs), ion channels, and nuclear receptors, SELF-BLM showed the best performance for predicting not only known interactions but also potential interactions in three protein classes compare to other related studies. The implemented software and supporting data are available at https://github.com/GIST-CSBL/SELF-BLM.
Collapse
|
191
|
Liu H, Song Y, Guan J, Luo L, Zhuang Z. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinformatics 2016; 17:539. [PMID: 28155639 PMCID: PMC5259862 DOI: 10.1186/s12859-016-1336-7] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Background Since traditional drug research and development is often time-consuming and high-risk, there is an increasing interest in establishing new medical indications for approved drugs, referred to as drug repositioning, which provides a relatively low-cost and high-efficiency approach for drug discovery. With the explosive growth of large-scale biochemical and phenotypic data, drug repositioning holds great potential for precision medicine in the post-genomic era. It is urgent to develop rational and systematic approaches to predict new indications for approved drugs on a large scale. Results In this paper, we propose the two-pass random walks with restart on a heterogenous network, TP-NRWRH for short, to predict new indications for approved drugs. Rather than random walk on bipartite network, we integrated the drug-drug similarity network, disease-disease similarity network and known drug-disease association network into one heterogenous network, on which the two-pass random walks with restart is implemented. We have conducted performance evaluation on two datasets of drug-disease associations, and the results show that our method has higher performance than six existing methods. A case study on the Alzheimer’s disease showed that nine of top 10 predicted drugs have been approved or investigational for neurodegenerative diseases. The experimental results show that our method achieves state-of-the-art performance in predicting new indications for approved drugs. Conclusions We proposed a two-pass random walk with restart on the drug-disease heterogeneous network, referred to as TP-NRWRH, to predict new indications for approved drugs. Performance evaluation on two independent datasets showed that TP-NRWRH achieved higher performance than six existing methods on 10-fold cross validations. The case study on the Alzheimer’s disease showed that nine of top 10 predicted drugs have been approved or are investigational for neurodegenerative diseases. The results show that our method achieves state-of-the-art performance in predicting new indications for approved drugs.
Collapse
Affiliation(s)
- Hui Liu
- Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China.,Changzhou University, Jiangsu, 213164, China
| | - Yinglong Song
- Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai, 200433, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
| | - Libo Luo
- Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China.
| | - Ziheng Zhuang
- Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China. .,Changzhou University, Jiangsu, 213164, China.
| |
Collapse
|
192
|
Ezzat A, Wu M, Li XL, Kwoh CK. Drug-target interaction prediction via class imbalance-aware ensemble learning. BMC Bioinformatics 2016; 17:509. [PMID: 28155697 PMCID: PMC5259867 DOI: 10.1186/s12859-016-1377-y] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers with the purpose of predicting new undiscovered interactions. However, a key challenge regarding this data that has not yet been addressed by these methods, namely class imbalance, is potentially degrading the prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than that of non-interacting drug-target pairs. This imbalance ratio between interacting and non-interacting drug-target pairs is referred to as the between-class imbalance. Between-class imbalance degrades prediction performance due to the bias in prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, there are multiple types of drug-target interactions in the data with some types having relatively fewer members (or are less represented) than others. This variation in representation of the different interaction types leads to another kind of imbalance referred to as the within-class imbalance. In within-class imbalance, prediction results are biased towards the better represented interaction types, leading to more prediction errors in the less represented interaction types. RESULTS We propose an ensemble learning method that incorporates techniques to address the issues of between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over 4 state-of-the-art methods. In addition, we simulated cases for new drugs and targets to see how our method would perform in predicting their interactions. New drugs and targets are those for which no prior interactions are known. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully. CONCLUSIONS Our proposed method has improved the prediction performance over the existing work, thus proving the importance of addressing problems pertaining to class imbalance in the data.
Collapse
Affiliation(s)
- Ali Ezzat
- School of Computer Science & Engineering, Nanyang Technological University, Nanyang Ave., Singapore, 639798, Singapore
| | - Min Wu
- Institute for Infocomm Research (I2R), A*Star, Fusionopolis Way, Singapore, 138632, Singapore
| | - Xiao-Li Li
- Institute for Infocomm Research (I2R), A*Star, Fusionopolis Way, Singapore, 138632, Singapore.
| | - Chee-Keong Kwoh
- School of Computer Science & Engineering, Nanyang Technological University, Nanyang Ave., Singapore, 639798, Singapore
| |
Collapse
|
193
|
Lim H, Gray P, Xie L, Poleksic A. Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem. Sci Rep 2016; 6:38860. [PMID: 27958331 PMCID: PMC5153628 DOI: 10.1038/srep38860] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2016] [Accepted: 11/15/2016] [Indexed: 12/18/2022] Open
Abstract
Conventional one-drug-one-gene approach has been of limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
Collapse
Affiliation(s)
- Hansaim Lim
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
| | - Paul Gray
- Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50614, United States
| | - Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States.,Ph.D. Program in Computer Science, Biochemistry and Biology, The Graduate Center, The City University of New York, New York, New York 10065, United States
| | - Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50614, United States
| |
Collapse
|
194
|
Abstract
Systems pharmacology aims to holistically understand mechanisms of drug actions to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven. It integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosions in genomics and other omics data, as well as the tremendous advances in big data technologies, have already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict a drug response phenotype, which is dependent on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaborations between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and semantic webs. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable.
Collapse
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065; .,The Graduate Center, The City University of New York, New York, NY 10016
| | - Eli J Draizen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894; .,Program in Bioinformatics, Boston University, Boston, Massachusetts 02215
| | - Philip E Bourne
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894; .,Office of the Director, National Institutes of Health, Bethesda, Maryland 20894
| |
Collapse
|
195
|
Lim H, Poleksic A, Yao Y, Tong H, He D, Zhuang L, Meng P, Xie L. Large-Scale Off-Target Identification Using Fast and Accurate Dual Regularized One-Class Collaborative Filtering and Its Application to Drug Repurposing. PLoS Comput Biol 2016; 12:e1005135. [PMID: 27716836 PMCID: PMC5055357 DOI: 10.1371/journal.pcbi.1005135] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2016] [Accepted: 09/08/2016] [Indexed: 12/19/2022] Open
Abstract
Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and providing new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one that we are proposing here or computationally too intensive, thereby limiting their capability for large-scale off-target identification. In addition, the performances of most machine learning based algorithms have been mainly evaluated to predict off-target interactions in the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in terms of detecting off-targets across gene families on a proteome scale. Here, we are presenting a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested in a reliable, extensive, and cross-gene family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable. It can screen a dataset of 200 thousands chemicals against 20 thousands proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidences. Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP. High-throughput techniques have generated vast amounts of diverse omics and phenotypic data. However, these sets of data have not yet been fully explored to improve the effectiveness and efficiency of drug discovery, a process which has traditionally adopted a one-drug-one-gene paradigm. Consequently, the cost of bringing a drug to market is astounding and the failure rate is daunting. The failure of the target-based drug discovery is in large part due to the fact that a drug rarely interacts only with its intended receptor, but also generally binds to other receptors. To rationally design potent and safe therapeutics, we need to identify all the possible cellular proteins interacting with a drug in an organism. Existing experimental techniques are not sufficient to address this problem, and will benefit from computational modeling. However, it is a daunting task to reliably screen millions of chemicals against hundreds of thousands of proteins. Here, we introduce a fast and accurate method REMAP for large-scale predictions of drug-target interactions. REMAP outperforms state-of-the-art algorithms in terms of both speed and accuracy, and has been successfully applied to drug repurposing. Thus, REMAP may have broad applications in drug discovery.
Collapse
Affiliation(s)
- Hansaim Lim
- The Graduate Center, The City University of New York, New York, New York, United States
| | - Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa, United States
| | - Yuan Yao
- Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu, China
| | - Hanghang Tong
- School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona, United States
| | - Di He
- The Graduate Center, The City University of New York, New York, New York, United States
| | - Luke Zhuang
- Academy for Information Technology, Union County Vocational-Technical Schools, Scotch Plains, New Jersey, United States
| | - Patrick Meng
- High Technology High School, Lincroft, New Jersey, United States
| | - Lei Xie
- The Graduate Center, The City University of New York, New York, New York, United States
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States
- * E-mail:
| |
Collapse
|
196
|
Ammad-ud-din M, Khan SA, Malani D, Murumägi A, Kallioniemi O, Aittokallio T, Kaski S. Drug response prediction by inferring pathway-response associations with kernelized Bayesian matrix factorization. Bioinformatics 2016; 32:i455-i463. [DOI: 10.1093/bioinformatics/btw433] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
197
|
|
198
|
Shi JY, Li JX, Lu HM. Predicting existing targets for new drugs base on strategies for missing interactions. BMC Bioinformatics 2016; 17 Suppl 8:282. [PMID: 27585458 PMCID: PMC5009565 DOI: 10.1186/s12859-016-1118-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Background There has been paid more and more attention to supervised classification models in the area of predicting drug-target interactions (DTIs). However, in terms of classification, unavoidable missing DTIs in data would cause three issues which have not yet been addressed appropriately by former approaches. Directly labeled as negatives (non-DTIs), missing DTIs increase the confusion of positives (DTIs) and negatives, aggravate the imbalance between few positives and many negatives, and are usually discriminated as highly-scored false positives, which influence the existing measures sharply. Results Under the framework of local classification model (LCM), this work focuses on the scenario of predicting how possibly a new drug interacts with known targets. To address the first two issues, two strategies, Spy and Super-target, are introduced accordingly and further integrated to form a two-layer LCM. In the bottom layer, Spy-based local classifiers for protein targets are built by positives, as well as reliable negatives identified among unlabeled drug-target pairs. In the top layer, regular local classifiers specific to super-targets are built with more positives generated by grouping similar targets and their interactions. Furthermore, to handle the third issue, an additional performance measure, Coverage, is presented for assessing DTI prediction. The experiments based on benchmark datasets are finally performed under five-fold cross validation of drugs to evaluate this approach. The main findings are concluded as follows. (1) Both two individual strategies and their combination are effective to missing DTIs, and the combination wins the best. (2) Having the advantages of less confusing decision boundary at the bottom layer and less biased decision boundary at the top layer, our two-layer LCM outperforms two former approaches. (3) Coverage is more robust to missing interactions than other measures and is able to evaluate how far one needs to go down the list of targets to cover all the proper targets of a drug. Conclusions Proposing two strategies and one performance measure, this work has addressed the issues derived from missing interactions, which cause confusing and biased decision boundaries in classifiers, as well as the inappropriate measure of predicting performance, in the scenario of predicting interactions between new drugs and known targets.
Collapse
Affiliation(s)
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Jia-Xin Li
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Hui-Meng Lu
- School of Life Sciences, Northwestern Polytechnical University, Xi'an, 710072, China
| |
Collapse
|
199
|
Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol Inform 2016; 35:615-621. [PMID: 27464907 PMCID: PMC5129546 DOI: 10.1002/minf.201600073] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2016] [Accepted: 07/06/2016] [Indexed: 01/19/2023]
Abstract
The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for “Big Data” in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the “Big Data” using advanced machine‐learning methods, and their applications in polypharmacology prediction and target de‐convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi‐party or multi‐party data sharing. Data sharing is important in the context of the recent trend of “open innovation” in pharmaceutical industry, which has led to not only more information sharing among academics and pharma industries but also the so‐called “precompetitive” collaboration between pharma companies. At the end we highlight the importance of education in “Big Data” for further progress of this area.
Collapse
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany.,BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
| | - Ola Engkvist
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn Strasse 15, Dortmund, 44227, Germany
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
| |
Collapse
|
200
|
Sridhar D, Fakhraei S, Getoor L. A probabilistic approach for collective similarity-based drug–drug interaction prediction. Bioinformatics 2016; 32:3175-3182. [DOI: 10.1093/bioinformatics/btw342] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2016] [Accepted: 05/22/2016] [Indexed: 01/09/2023] Open
|