1.
Muniyappan S, Rayan AXA, Varrieth GT. DTiGNN: Learning drug-target embedding from a heterogeneous biological network based on a two-level attention-based graph neural network. Math Biosci Eng 2023; 20:9530-9571. [PMID: 37161255; DOI: 10.3934/mbe.2023419] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
Abstract
MOTIVATION Exploring drug-target interactions (DTIs) through in vitro experiments demands substantial human, financial and data resources. In silico approaches have been recommended for predicting DTIs to reduce time and cost. During drug development, the therapeutic effect of a drug on a particular disease can be analyzed by identifying how the drug binds to the target involved in that disease. Hence, DTI prediction plays a major role in drug discovery. Many computational methods have been developed for DTI prediction. However, existing methods are limited in their ability to capture interactions via multiple semantics between drug and target nodes in a heterogeneous biological network (HBN). METHODS In this paper, we propose the DTiGNN framework for identifying unknown drug-target pairs. DTiGNN first calculates the similarity between drugs and between targets from multiple perspectives. The features of drugs and targets from each perspective are then learned separately using a novel method termed information entropy-based random walk. Next, the features learned from the different perspectives are integrated into a single drug similarity network and a single target similarity network by a multi-view convolutional neural network. The HBN is then constructed from the integrated similarity networks, drug interactions, drug-disease associations, protein interactions and protein-disease associations. Next, a novel embedding algorithm, the meta-graph guided graph neural network, is used to learn the embeddings of drugs and targets. Finally, after the samples are balanced using oversampling techniques, a convolutional neural network is employed to infer new DTIs. RESULTS DTiGNN was applied to various datasets and showed better performance in terms of the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR), with scores of 0.98 and 0.99, respectively. In total, 23,739 new DTI pairs were predicted.
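The AUC quoted above has a direct interpretation that can be computed without drawing a curve: it is the probability that a randomly chosen known interaction receives a higher score than a randomly chosen non-interacting pair (the Mann-Whitney formulation). The following is a minimal Python sketch of that computation, assuming binary labels and real-valued scores; the function name `roc_auc` and the toy data are illustrative and are not taken from DTiGNN.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the Mann-Whitney U statistic over all pos/neg pairs.

    A positive that outscores a negative counts as a win; a tie counts as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy drug-target pairs: 1 = known interaction, 0 = no known interaction.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print(roc_auc(labels, scores))
```

In the toy data, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of pairs; for large candidate sets, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.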
Affiliation(s)
- Saranya Muniyappan
- Computer Science and Engineering, CEG Campus, Anna University, Tamil Nadu, India
2.
Drug-target interaction prediction via an ensemble of weighted nearest neighbors with interaction recovery. Appl Intell 2022. [DOI: 10.1007/s10489-021-02495-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
3.
Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021; 14:311-330. [PMID: 34731411; DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]
Abstract
Accurate prediction of drug-target interactions (DTIs), which is widely used in drug discovery and drug repositioning, is regarded as a key challenge in drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, the pseudo-position specific scoring matrix, pseudo amino acid composition and SPIDER3 are used to extract different types of feature information from the target protein, and path-based fingerprint features are extracted for each drug. Then, the synthetic minority oversampling technique (SMOTE) and the light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to a deep-stacked ensemble classifier composed of a gated recurrent unit (GRU), deep neural network (DNN), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under five-fold cross-validation, the proposed method achieves higher prediction accuracy on the gold standard dataset than existing methods. To further evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has better predictive ability than the other methods and provides novel insights for DTI prediction. In summary, in DeepStack-DTIs, PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information into numerical features; the SMOTE algorithm balances the dataset; and the LightGBM feature selection algorithm removes redundant and irrelevant features to select the optimal feature subset, which is input into the deep-stacked ensemble classifier to predict drug-target interactions.
The experimental results show that DeepStack-DTIs can significantly improve the prediction accuracy of drug-target interactions.
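DeepStack-DTIs (and several other methods in this list) balance the classes with SMOTE before training. The core idea is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal, dependency-free illustration of that idea; the function name `smote` and its parameters are hypothetical, and it is not the authors' pipeline code.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, other), j) for j, other in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class. In practice, the `SMOTE` class in the imbalanced-learn package implements this step with more options (for example, `k_neighbors` and `sampling_strategy`).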
Affiliation(s)
- Yan Zhang
- College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, 266061, China; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Cheng Chen
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Qinqin Wei
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Haiming Gu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China
4.
Chen C, Shi H, Jiang Z, Salhi A, Chen R, Cui X, Yu B. DNN-DTIs: Improved drug-target interactions prediction using XGBoost feature selection and deep neural network. Comput Biol Med 2021; 136:104676. [PMID: 34375902; DOI: 10.1016/j.compbiomed.2021.104676] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 03/13/2021] [Revised: 07/18/2021] [Accepted: 07/19/2021] [Indexed: 02/03/2023]
Abstract
Analysis and prediction of drug-target interactions (DTIs) play an important role in understanding drug mechanisms, as well as in drug repositioning and design. Machine learning (ML)-based methods for DTI prediction can mitigate the shortcomings of time-consuming and labor-intensive experimental approaches while providing new ideas and insights for drug design. We propose a novel pipeline for predicting drug-target interactions, called DNN-DTIs. First, the target information is characterized by a number of features, namely pseudo-amino acid composition, pseudo position-specific scoring matrix, conjoint triad composition, transition and distribution, Moreau-Broto autocorrelation, and structural features. The drug compounds are then encoded using substructure fingerprints. Next, eXtreme gradient boosting (XGBoost) is used to select a subset of important, non-redundant features. An optimally balanced set of sample vectors is obtained by applying the synthetic minority oversampling technique (SMOTE). Finally, the DTI predictor DNN-DTIs is built on a deep neural network (DNN) using a layer-by-layer learning scheme. Experimental results indicate that DNN-DTIs outperforms other state-of-the-art predictors, with ACC values of 98.78%, 98.60%, 97.98%, 98.24% and 98.00% on the Enzyme, Ion Channel (IC), GPCR, Nuclear Receptor (NR) and Kuang's datasets, respectively. This accurate prediction performance makes DNN-DTIs a strong choice for studying DTIs, especially for drug repositioning.
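Of the target descriptors listed for DNN-DTIs, the conjoint triad is one of the simplest to reproduce: each amino acid is mapped to one of seven physicochemical classes, and the frequencies of all 7 × 7 × 7 = 343 class triads along the sequence form the feature vector. The sketch below assumes the commonly used 7-class grouping and a plain frequency normalisation; both are assumptions, and the exact grouping and normalisation used in DNN-DTIs may differ.

```python
from itertools import product
from typing import Dict, List, Tuple

# Seven amino-acid classes commonly used for conjoint-triad features
# (grouped by dipole and side-chain volume). Treat this grouping as an assumption.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS: Dict[str, int] = {
    aa: idx for idx, group in enumerate(CLASSES) for aa in group
}
# All 7**3 = 343 class triads, in a fixed order.
TRIADS: List[Tuple[int, int, int]] = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def conjoint_triad(sequence: str) -> List[float]:
    """Return a 343-dim vector of class-triad frequencies for a protein.

    Windows containing residues outside the 20 standard amino acids
    (e.g. 'X') are skipped; counts are divided by the number of valid windows.
    """
    counts = [0] * len(TRIADS)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    total = 0
    # Slide a window of three consecutive residues along the sequence.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if None in (a, b, c):
            continue
        counts[TRIAD_INDEX[(a, b, c)]] += 1
        total += 1
    return [n / total if total else 0.0 for n in counts]
```

The resulting fixed-length vector can be concatenated with the other target descriptors and the drug fingerprint to form one feature vector per drug-target pair.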
Affiliation(s)
- Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Han Shi
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Adil Salhi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Ruixin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Xuefeng Cui
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
5.
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241; DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
Therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on therapeutic protein targets, while knowledge of off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods use to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact and relationships of these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit, and often combine, three types of similarity that consider drug structures, drug profiles and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of current resources and predictive tools. Finally, we provide several recommendations for the future development of the related databases and methods.
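Two of the ideas surveyed here, drug-structure similarity and similarity-based inference, can be illustrated in a few lines. The sketch below computes the Tanimoto coefficient between binary fingerprints (represented as sets of on-bit indices) and scores a (drug, target) pair as a similarity-weighted vote of the known drugs. The helper names and the voting rule are hypothetical simplifications, not any specific predictor from the survey.

```python
from typing import Dict, FrozenSet, Set

Fingerprint = FrozenSet[int]  # indices of the bits that are set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def neighbour_score(query: Fingerprint, target: str,
                    fingerprints: Dict[str, Fingerprint],
                    interactions: Dict[str, Set[str]]) -> float:
    """Similarity-weighted vote of known drugs for one (query drug, target) pair.

    Returns sum(sim * interacts) / sum(sim) over all drugs with fingerprints,
    where interacts is 1 if that drug is known to bind the target, else 0.
    """
    num = den = 0.0
    for drug, fp in fingerprints.items():
        sim = tanimoto(query, fp)
        num += sim * (target in interactions.get(drug, set()))
        den += sim
    return num / den if den else 0.0
```

Real predictors in this family differ mainly in which similarities they use (structure, profile, sequence), how they combine them, and how they aggregate neighbours; this sketch fixes one simple choice for each.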
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
6.
Ding Y, Tang J, Guo F. The Computational Models of Drug-target Interaction Prediction. Protein Pept Lett 2020; 27:348-358. [PMID: 30968771; DOI: 10.2174/0929866526666190410124110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/04/2019] [Revised: 02/22/2019] [Accepted: 04/02/2019] [Indexed: 12/19/2022]
Abstract
The identification of drug-target interactions (DTIs) is an important process in drug discovery and medical research. However, traditional experimental methods for DTI identification are still time-consuming, extremely expensive and challenging. Over the past ten years, various computational methods have been developed to identify potential DTIs. In this paper, methods for identifying DTIs are summarized, with a focus on several state-of-the-art computational methods, including network-based and machine learning-based methods. In particular, among machine learning-based methods, supervised and semi-supervised models differ fundamentally in how they handle negative samples. Although these computational models have achieved significant improvements in DTI identification, network-based and machine learning-based methods each have their own disadvantages. These computational methods are evaluated on four benchmark data sets using the area under the precision-recall curve (AUPR).
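Because known DTIs are a small fraction of all drug-target pairs, AUPR is more sensitive than AUC to how well the few true interactions are ranked. A common estimator of AUPR is average precision: rank the pairs by score and average the precision observed at the rank of each true interaction. The following minimal sketch assumes binary labels; the function name is illustrative, and other AUPR estimators (for example, trapezoidal interpolation of the precision-recall curve) give slightly different values.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate the area under the precision-recall curve as average precision.

    Pairs are ranked by descending score; the precision at the rank of every
    true interaction is accumulated and averaged over all positives.
    Ties are broken by input order (Python's sort is stable), which a full
    implementation should handle more carefully.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos
```

For the ranking 1, 1, 0, 1, 0, 0 (best score first), the precisions at the three hits are 1, 1 and 3/4, so the estimate is 2.75/3.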
Affiliation(s)
- Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
7.
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972; PMCID: PMC7820849; DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 161] [Impact Index Per Article: 40.3] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022] [Open access]
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. Novel and efficient prediction approaches are needed to avoid relying solely on costly, laborious and not-always-deterministic experiments to determine drug–target interactions (DTIs). These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for DTI prediction, followed by a comprehensive catalog of the machine learning methods and databases that have been proposed and used to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, we highlight the challenges one may face in predicting DTIs with machine learning approaches, and we conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian
- Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
8.
Li Y, Huang YA, You ZH, Li LP, Wang Z. Drug-Target Interaction Prediction Based on Drug Fingerprint Information and Protein Sequence. Molecules 2019; 24:molecules24162999. [PMID: 31430892; PMCID: PMC6719962; DOI: 10.3390/molecules24162999] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/11/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 01/09/2023] [Open access]
Abstract
The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods based on clinical trials for discovering DTIs are time-consuming, expensive and challenging. Therefore, as a complement to them, developing new computational methods for predicting novel DTIs is of great significance for saving cost and shortening the development period. In this paper, we present a novel computational model for predicting DTIs that uses the sequence information of proteins and a rotation forest classifier. Specifically, all target protein sequences are first converted to position-specific scoring matrices (PSSMs) to retain evolutionary information. We then use local phase quantization (LPQ) descriptors to extract evolutionary information from the PSSMs. Substructure fingerprint information, in turn, is used to extract the features of each drug. We finally combine the drug and protein features to represent each drug-target pair and use a rotation forest classifier to calculate interaction-possibility scores for global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20% and 71.67% on four datasets (enzyme, ion channel, G protein-coupled receptors (GPCR) and nuclear receptor), respectively. In addition, we compared the prediction performance of the rotation forest classifier with that of another popular classifier, the support vector machine, on the same datasets. Several previously proposed methods were also implemented on the same datasets for performance comparison. The comparison results demonstrate the superiority of the proposed method over the others. We anticipate that the proposed method can be used as an effective tool for large-scale prediction of drug-target interactions, given protein sequences and drug fingerprints.
Affiliation(s)
- Yang Li
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Yu-An Huang
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Zhu-Hong You
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Li-Ping Li
- School of Information Engineering, Xijing University, Xi'an 710123, China
- Zheng Wang
- School of Information Engineering, Xijing University, Xi'an 710123, China
9.
Xia LY, Yang ZY, Zhang H, Liang Y. Improved Prediction of Drug-Target Interactions Using Self-Paced Learning with Collaborative Matrix Factorization. J Chem Inf Model 2019; 59:3340-3351. [PMID: 31260620; DOI: 10.1021/acs.jcim.9b00408] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 02/06/2023]
Abstract
Identifying drug-target interactions (DTIs) plays an important role in drug discovery, the study of drug side effects, and drug repositioning. However, in vivo or biochemical experimental methods for identifying new DTIs are extremely expensive and time-consuming. Recently, various in silico methods, such as ligand-based and docking approaches, have been developed for DTI prediction, but these traditional computational methods have several limitations. This work uses a chemogenomic approach to efficiently identify potential DTI candidates: self-paced learning with collaborative matrix factorization based on weighted low-rank approximation (SPLCMF), which integrates multiple networks related to drugs and targets into regularized least-squares and focuses on learning a low-dimensional vector representation of features. The SPLCMF framework uses soft weighting to introduce samples into training from easy to complex, which more faithfully reflects the latent importance of samples during training. Experimental results on synthetic data and five benchmark data sets show that SPLCMF outperforms other existing state-of-the-art approaches. These results indicate that SPLCMF can be a useful tool for predicting unknown DTIs and may provide new insights into drug discovery, drug side-effect prediction, and the repositioning of existing drugs.
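The easy-to-complex sample selection described for SPLCMF comes from self-paced learning. One standard form is linear soft weighting, in which each sample receives the weight v = max(0, 1 - loss/lambda), and the age parameter lambda grows during training so that harder samples are gradually admitted. The sketch below applies this scheme to a toy model that fits a single scalar; it only illustrates the weighting idea, the function names are hypothetical, and SPLCMF's exact regularizer and matrix-factorization model are not reproduced.

```python
from typing import List, Sequence


def soft_spl_weights(losses: Sequence[float], lam: float) -> List[float]:
    """Linear soft self-paced weights: v_i = max(0, 1 - loss_i / lam).

    Easy samples (small loss) get weights near 1; samples whose loss
    reaches the age parameter lam get weight 0 and are left out for now.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [max(0.0, 1.0 - loss / lam) for loss in losses]


def self_paced_mean(values: Sequence[float], lam: float = 1.0,
                    growth: float = 1.5, rounds: int = 5) -> float:
    """Toy self-paced loop that fits a single scalar (a weighted mean).

    Alternates between (1) recomputing sample weights from squared residuals
    and (2) refitting the model with those weights, enlarging lam each round
    so that harder samples are gradually admitted.
    """
    estimate = sorted(values)[len(values) // 2]  # start from an (upper) median
    for _ in range(rounds):
        losses = [(v - estimate) ** 2 for v in values]
        weights = soft_spl_weights(losses, lam)
        total = sum(weights)
        if total > 0:
            estimate = sum(w * v for w, v in zip(weights, values)) / total
        lam *= growth
    return estimate
```

With the values [1.0, 1.1, 0.9, 1.0, 10.0], the plain mean is 2.8, whereas the self-paced estimate stays at about 1.0, because the outlier's squared loss (81) never falls below lambda within the five rounds.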
Affiliation(s)
- Liang-Yong Xia
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China 999078
- Zi-Yi Yang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China 999078
- Hui Zhang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China 999078
- Yong Liang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China 999078; State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China 999078
10.
Shi H, Liu S, Chen J, Li X, Ma Q, Yu B. Predicting drug-target interactions using Lasso with random forest based on evolutionary information and chemical structure. Genomics 2018; 111:1839-1852. [PMID: 30550813; DOI: 10.1016/j.ygeno.2018.12.007] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 08/09/2018] [Revised: 12/06/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023]
Abstract
The identification of drug-target interactions is of great significance for pharmaceutical research. Because traditional experimental methods for identifying drug-target interactions are costly and time-consuming, the use of machine learning methods to predict potential drug-target interactions has attracted widespread attention. This paper presents a novel drug-target interaction prediction method called LRF-DTIs. First, the pseudo-position specific scoring matrix (PsePSSM) and FP2 molecular fingerprints were used to extract the features of targets and drugs, respectively. Second, Lasso was used to reduce the dimensionality of the extracted features, and the Synthetic Minority Oversampling Technique (SMOTE) was then used to handle the imbalanced data. Finally, the processed feature vectors were input into a random forest (RF) classifier to predict drug-target interactions. Over 10 trials of 5-fold cross-validation, the overall prediction accuracies on the enzyme, ion channel (IC), G-protein-coupled receptor (GPCR) and nuclear receptor (NR) datasets reached 98.09%, 97.32%, 95.69% and 94.88%, respectively, and were compared with those of other prediction methods. In addition, we verified that our method can not only predict new interactions but also achieve satisfactory results on a new dataset. All the experimental results indicate that our method can significantly improve the prediction accuracy of drug-target interactions and can play a vital role in new drug research and target protein development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LRF-DTIs/ for academic use.
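LRF-DTIs uses Lasso to shrink the feature vector before balancing and classification. Lasso drives the coefficients of uninformative features exactly to zero, so the surviving non-zero coefficients define the selected subset. The sketch below solves the Lasso objective (1/2n)·||y - Xw||^2 + alpha·||w||_1 by cyclic coordinate descent with soft-thresholding, in pure Python. It assumes centred features and target (no intercept), and treating the interaction label as a regression target is an assumption; the function names and the value of alpha are illustrative, not the settings used in the paper.

```python
from typing import List, Sequence


def soft_threshold(rho: float, threshold: float) -> float:
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             alpha: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are already centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant-zero feature: leave its weight at 0
            # Correlation of feature j with the residual that excludes w_j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n))
            new_wj = soft_threshold(rho, n * alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta:
                # Keep the residual consistent with the updated weight.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def select_features(w: Sequence[float], tol: float = 1e-8) -> List[int]:
    """Indices of features whose Lasso coefficient is (numerically) non-zero."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]
```

Larger values of alpha zero out more coefficients, which is how the dimensionality reduction is controlled. scikit-learn's `Lasso` estimator minimises the same objective and is the usual choice for real feature matrices.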
Affiliation(s)
- Han Shi
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Simin Liu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Junqi Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Xuan Li
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China; School of Life Sciences, University of Science and Technology of China, Hefei 230027, China.
11.
Hao M, Bryant SH, Wang Y. A new chemoinformatics approach with improved strategies for effective predictions of potential drugs. J Cheminform 2018; 10:50. [PMID: 30311095; PMCID: PMC6755712; DOI: 10.1186/s13321-018-0303-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/02/2018] [Accepted: 10/02/2018] [Indexed: 12/24/2022] [Open access]
Abstract
Background Fast and accurate identification of potential drug candidates against therapeutic targets (i.e., drug–target interactions, DTIs) is a fundamental step in the early drug discovery process. However, experimental determination of DTIs is time-consuming and costly, especially when testing associations across the entire chemical and genomic spaces. Therefore, computationally efficient algorithms with accurate predictions are required for this challenging task. In this work, we design a new chemoinformatics approach derived from neighbor-based collaborative filtering (NBCF) to infer potential drug candidates for targets of interest. A fundamental step in applying NBCF to DTI prediction is to accurately measure the similarity between drugs based solely on the known DTI profiles. However, commonly used similarity measures such as cosine similarity may be noise-prone because the DTI bipartite network is extremely sparse, which degrades the performance of NBCF. We propose three strategies to address this problem: (1) adopting a positive pointwise mutual information (PPMI)-based similarity metric, which is robust to noise to some extent; (2) performing a low-rank approximation of the original prediction scores; and (3) incorporating auxiliary (complementary) information to produce the final predictions. Results We test the proposed methods on three benchmark datasets, and the results indicate that our strategies help improve NBCF performance for DTI prediction. Compared with the prior algorithm, our methods achieve better results as assessed by a recall-based evaluation metric. Conclusions A new chemoinformatics approach with improved strategies was successfully developed to predict potential DTIs.
Among these, the model based on the sparsity-resistant PPMI similarity metric performs best; it may help researchers identify potential drugs against therapeutic targets of interest and can also be applied to related problems such as identifying candidate disease genes.
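The PPMI strategy can be sketched on a small binary drug-target matrix. The code below assumes that PPMI is computed from the drug-target co-occurrence matrix, that drug-drug similarity is the cosine between PPMI rows, and that NBCF scores a pair as the similarity-weighted average of the other drugs' interactions with that target. These are simplifying assumptions for illustration, and the function names are hypothetical; the paper's low-rank approximation and auxiliary-information steps are not included.

```python
import math
from typing import List

Matrix = List[List[float]]


def ppmi(Y: Matrix) -> Matrix:
    """Positive pointwise mutual information of a binary drug-target matrix."""
    total = sum(map(sum, Y))
    row = [sum(r) for r in Y]
    col = [sum(c) for c in zip(*Y)]
    out = []
    for i, r in enumerate(Y):
        out_row = []
        for j, y in enumerate(r):
            if y == 0:
                out_row.append(0.0)  # log(0) is undefined; PPMI clips it to 0
                continue
            pmi = math.log((y / total) / ((row[i] / total) * (col[j] / total)))
            out_row.append(max(0.0, pmi))
        out.append(out_row)
    return out


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def nbcf_scores(Y: Matrix) -> Matrix:
    """Drug-based neighbour collaborative filtering with PPMI-cosine similarity.

    score[d][t] = sum_{o != d} sim(d, o) * Y[o][t] / sum_{o != d} sim(d, o)
    """
    P = ppmi(Y)
    n_drugs, n_targets = len(Y), len(Y[0])
    sim = [[cosine(P[a], P[b]) for b in range(n_drugs)] for a in range(n_drugs)]
    scores = []
    for d in range(n_drugs):
        den = sum(sim[d][o] for o in range(n_drugs) if o != d)
        scores.append([
            sum(sim[d][o] * Y[o][t] for o in range(n_drugs) if o != d) / den
            if den else 0.0
            for t in range(n_targets)
        ])
    return scores


if __name__ == "__main__":
    Y = [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 1, 1],
         [0, 0, 0, 1]]
    for row in nbcf_scores(Y):
        print([round(s, 2) for s in row])
```

In the example matrix, drug 0 shares two targets with drug 1, which also binds target 2, so the unobserved pair (drug 0, target 2) receives the maximum score of 1.0, while (drug 0, target 3) scores 0.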
Affiliation(s)
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
12.
Wang C, Kurgan L. Review and comparative assessment of similarity-based methods for prediction of drug–protein interactions in the druggable human proteome. Brief Bioinform 2018; 20:2066-2087. [DOI: 10.1093/bib/bby069] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 07/10/2018] [Indexed: 12/18/2022] [Open access]
Abstract
Drug–protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them by the three types of similarity they use and their combinations. We discuss and compare key aspects of these methods, including source databases, internal databases and predictive models. Using our novel benchmark database, we perform a comparative empirical analysis of the predictive performance of seven types of representative predictors that use each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level, and we are the first to also include an evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer, and that the model combining all three types of similarity achieves an area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of the sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarity between the sequences of drug targets and to several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.
Affiliation(s)
- Chen Wang
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
- Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284, USA
13
Hu H, Zhu C, Ai H, Zhang L, Zhao J, Zhao Q, Liu H. LPI-ETSLP: lncRNA-protein interaction prediction using eigenvalue transformation-based semi-supervised link prediction. MOLECULAR BIOSYSTEMS 2018; 13:1781-1787. [PMID: 28702594 DOI: 10.1039/c7mb00290d] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 01/12/2023]
Abstract
RNA-protein interactions are essential for understanding many important cellular processes. In particular, lncRNA-protein interactions play important roles in post-transcriptional gene regulation, such as splicing, translation and signaling, and even in the progression of complex diseases. However, the experimental validation of lncRNA-protein interactions remains time-consuming and expensive, and only a few theoretical approaches are available for predicting potential lncRNA-protein associations. Here, we present eigenvalue transformation-based semi-supervised link prediction (LPI-ETSLP) to uncover the relationships between lncRNAs and proteins. Moreover, the method is semi-supervised and does not require negative samples. In 5-fold cross validation, it achieved an AUC of 0.8876 and an AUPR of 0.6438, demonstrating reliable performance compared with three other computational models. Furthermore, a case study showed that many lncRNA-protein interactions predicted by our method can be confirmed by experiments. These results indicate that LPI-ETSLP would be a useful bioinformatics resource for biomedical research.
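The general idea of eigenvalue-transformation link prediction, building a kernel by applying a function to the eigenvalues of a similarity matrix and then propagating the known interactions through it, can be illustrated as below. This is a hedged sketch, not the LPI-ETSLP formulation: the symmetric normalization, the exponential transform exp(beta * lambda), the averaging of the lncRNA-side and protein-side propagations, and all function names are assumptions made for illustration.

```python
import numpy as np


def sym_normalize(s):
    """Symmetric normalization D^-1/2 S D^-1/2 (assumes positive row sums)."""
    d_inv_sqrt = 1.0 / np.sqrt(s.sum(axis=1))
    return s * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spectral_kernel(s, beta=1.0):
    """Kernel Q f(Lambda) Q^T with the assumed transform f(x) = exp(beta * x)."""
    sym = sym_normalize((s + s.T) / 2.0)
    vals, vecs = np.linalg.eigh(sym)  # sym = vecs @ diag(vals) @ vecs.T
    return (vecs * np.exp(beta * vals)) @ vecs.T


def propagate(y, s_rna, s_prot, beta=1.0):
    """Score lncRNA-protein pairs by spreading known links y through both kernels."""
    k_rna = spectral_kernel(s_rna, beta)
    k_prot = spectral_kernel(s_prot, beta)
    return 0.5 * (k_rna @ y + y @ k_prot)


# Toy data: lncRNAs 0 and 1 are similar, lncRNA 2 is unrelated; one protein.
s_rna = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]])
s_prot = np.array([[1.0]])
y = np.array([[1.0], [0.0], [0.0]])  # only lncRNA 0 is known to bind the protein
scores = propagate(y, s_rna, s_prot)
print(scores.ravel())
```

Because only the known interaction matrix `y` is propagated, no negative samples are needed, which mirrors the semi-supervised property described in the abstract.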
Affiliation(s)
- Huan Hu
- School of Life Science, Liaoning University, Shenyang, 110036, China.
14
Cheng T, Hao M, Takeda T, Bryant SH, Wang Y. Large-Scale Prediction of Drug-Target Interaction: a Data-Centric Review. AAPS J 2017; 19:1264-1275. [PMID: 28577120 PMCID: PMC11097213 DOI: 10.1208/s12248-017-0092-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 01/10/2017] [Accepted: 04/25/2017] [Indexed: 11/30/2022]
Abstract
The prediction of drug-target interactions (DTIs) is of extraordinary significance to modern drug discovery in terms of suggesting new drug candidates and repositioning old drugs. Despite technological advances, large-scale experimental determination of DTIs is still expensive and laborious. Effective and low-cost computational alternatives remain in strong demand. Meanwhile, open-access resources have been growing rapidly, with massive amounts of bioactivity data becoming available, creating unprecedented opportunities for the development of novel in silico models for large-scale DTI prediction. In this work, we review the state-of-the-art computational approaches for identifying DTIs from a data-centric perspective: what the underlying data are and how they are utilized in each study. We also summarize popular public data resources and online tools for DTI prediction. We find that various types of data have been employed, including properties of chemical structures, drug therapeutic effects and side effects, drug-target binding, drug-drug interactions, bioactivity data of drug molecules across multiple biological targets, and drug-induced gene expression. Heterogeneous data were often integrated to achieve better performance. However, challenges remain, such as handling data imbalance, incorporating negative samples and quantitative bioactivity data, and maintaining cross-links among different data sources, which are essential for large-scale and automated information integration.
Affiliation(s)
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Takako Takeda
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
15
Meng FR, You ZH, Chen X, Zhou Y, An JY. Prediction of Drug-Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures. Molecules 2017; 22:molecules22071119. [PMID: 28678206 PMCID: PMC6152073 DOI: 10.3390/molecules22071119] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 05/27/2017] [Revised: 06/27/2017] [Accepted: 07/03/2017] [Indexed: 11/16/2022]
Abstract
Knowledge of drug–target interactions (DTIs) plays an important role in discovering new drug candidates. Unfortunately, experimental methods for determining DTIs have unavoidable shortcomings, including being time-consuming and expensive. This motivated us to develop an effective computational method to predict DTIs based on protein sequence. In this paper, we propose a novel sequence-based computational approach, namely PDTPS (Predicting Drug Targets with Protein Sequence), to predict DTIs. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with a Relevance Vector Machine (RVM). To evaluate the predictive capacity of PDTPS, experiments were carried out on the enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation. The proposed PDTPS method achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results show that our method has good prediction performance. To further evaluate PDTPS, we compared it with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method. This makes it a useful tool for predicting DTIs, as well as for other bioinformatics tasks.
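One common formulation of the bi-gram probability feature computes, from an L × 20 PSSM P, the matrix B[m, n] = sum over positions i of P[i, m] * P[i+1, n], giving a 20 × 20 = 400-dimensional descriptor that can then be reduced with PCA. The sketch below assumes raw PSSM scores are first squashed with a logistic function and uses SVD-based PCA; the exact normalization and number of retained components used by PDTPS are not given in the abstract, and the RVM classifier is omitted. Function names are hypothetical.

```python
import numpy as np


def bigram_features(pssm):
    """400-dimensional bi-gram probability vector from an (L, 20) PSSM, L >= 2."""
    p = 1.0 / (1.0 + np.exp(-pssm))  # assumed: logistic squashing of raw PSSM scores
    b = p[:-1].T @ p[1:]             # b[m, n] = sum_i p[i, m] * p[i + 1, n]
    return b.reshape(-1)


def pca_project(x, n_components):
    """Project rows of x onto their top principal components (SVD of centered data)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Toy usage: ten random "proteins" of different lengths, reduced to 5 components.
rng = np.random.default_rng(0)
features = np.stack([bigram_features(rng.normal(size=(rng.integers(30, 60), 20)))
                     for _ in range(10)])
reduced = pca_project(features, n_components=5)
print(features.shape, reduced.shape)
```

Because the bi-gram matrix sums over positions, proteins of any length map to the same 400 features, which is what allows a fixed-size classifier downstream.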
Affiliation(s)
- Fan-Rong Meng
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China.
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China.
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 21116, China.
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China.
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 21116, China.
16
Hao M, Bryant SH, Wang Y. Predicting drug-target interactions by dual-network integrated logistic matrix factorization. Sci Rep 2017; 7:40376. [PMID: 28079135 PMCID: PMC5227688 DOI: 10.1038/srep40376] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 06/17/2016] [Accepted: 12/05/2016] [Indexed: 12/20/2022]
Abstract
In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTIs). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrices; (2) diffusing the drug profile kernel matrix with the drug structure kernel matrix; (3) diffusing the target profile kernel matrix with the target sequence kernel matrix; and (4) building the DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms previously reported approaches in terms of AUPR (area under the precision-recall curve) and AUC (area under the receiver operating characteristic curve) over 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends not only on the proposed objective function, but also on the nonlinear diffusion technique used, which is important but understudied in the DTI prediction field. In addition, we compile a new DTI dataset to increase the diversity of currently available benchmark datasets. The top predictions for the new dataset are confirmed by experimental studies or supported by other computational research.
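Step (4) rests on logistic matrix factorization: each drug and each target receives a latent vector, and the interaction probability is the logistic function of their inner product. The sketch below fits plain importance-weighted logistic matrix factorization by gradient descent, with known interactions weighted by a constant `c` and an L2 penalty `lam`. It deliberately omits DNILMF's profile inference, kernel diffusion and neighbor smoothing, and all function names and hyperparameter values are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_loss(y, u, v, c, lam):
    """Negative weighted log-likelihood plus L2 penalty on the latent factors."""
    p = np.clip(sigmoid(u @ v.T), 1e-12, 1 - 1e-12)
    nll = -(c * y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).sum()
    return nll + 0.5 * lam * ((u ** 2).sum() + (v ** 2).sum())


def fit_logistic_mf(y, rank=5, c=3.0, lam=0.1, lr=0.02, n_iter=1000, seed=0):
    """Gradient descent on weighted_loss; returns drug factors u and target factors v."""
    rng = np.random.default_rng(seed)
    u = 0.1 * rng.standard_normal((y.shape[0], rank))
    v = 0.1 * rng.standard_normal((y.shape[1], rank))
    w = c * y + (1.0 - y)  # weight c on known interactions, 1 elsewhere
    for _ in range(n_iter):
        p = sigmoid(u @ v.T)
        g = w * p - c * y  # derivative of the loss with respect to u @ v.T
        grad_u = g @ v + lam * u
        grad_v = g.T @ u + lam * v
        u, v = u - lr * grad_u, v - lr * grad_v  # simultaneous update
    return u, v


# Toy matrix: drugs 0-2 bind targets 0-1, drugs 3-5 bind targets 2-4.
y = np.zeros((6, 5))
y[:3, :2] = 1.0
y[3:, 2:] = 1.0
u, v = fit_logistic_mf(y)
probs = sigmoid(u @ v.T)
print(probs.round(2))
```

Up-weighting known interactions is one simple way to counter the sparsity of observed DTIs; whether DNILMF uses this exact weighting is not stated in the abstract.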
Affiliation(s)
- Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
17
Peng L, Peng M, Liao B, Xiao Q, Liu W, Huang G, Li K. A novel information fusion strategy based on a regularized framework for identifying disease-related microRNAs. RSC Adv 2017. [DOI: 10.1039/c7ra08894a] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023]
Abstract
RLSSLP is a novel information fusion strategy based on a regularized framework for revealing potential miRNA-disease associations.
Affiliation(s)
- Li Peng
- College of Information Science and Engineering, Hunan University, Changsha, China
- College of Computer Science and Engineering
- Manman Peng
- College of Information Science and Engineering, Hunan University, Changsha, China
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
- Qiu Xiao
- College of Information Science and Engineering, Hunan University, Changsha, China
- Wei Liu
- College of Information Engineering, XiangTan University, Xiangtan, China
- Guohua Huang
- College of Information Engineering, Shaoyang University, Shaoyang, China
- Keqin Li
- Department of Computer Science, State University of New York, New York 12561, USA