1
Li J, Sun L, Liu L, Li Z. MIFAM-DTI: a drug-target interactions predicting model based on multi-source information fusion and attention mechanism. Front Genet 2024; 15:1381997. [PMID: 38770418 PMCID: PMC11102998 DOI: 10.3389/fgene.2024.1381997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 04/15/2024] [Indexed: 05/22/2024]
Abstract
Accurate identification of potential drug-target pairs is a crucial step in drug development and drug repositioning. A drug-target interaction is characterized by the ability of the drug to bind to and modulate the activity of the target molecule, resulting in the desired therapeutic effect. As machine learning and deep learning technologies advance, an increasing number of models are being applied to the prediction of drug-target interactions. However, improving the accuracy and efficiency of prediction remains a great challenge. In this study, we proposed a deep learning method called Multi-source Information Fusion and Attention Mechanism for Drug-Target Interaction (MIFAM-DTI) to predict drug-target interactions. Firstly, the physicochemical property feature vector and the Molecular ACCess System (MACCS) molecular fingerprint feature vector of a drug were extracted based on its SMILES sequence. The dipeptide composition feature vector and the Evolutionary Scale Modeling-1b (ESM-1b) feature vector of a target were constructed based on its amino acid sequence information. Secondly, principal component analysis (PCA) was employed to reduce the dimensionality of the four feature vectors, and adjacency matrices were constructed by calculating the cosine similarity. Thirdly, the two feature vectors of each drug were concatenated and the two adjacency matrices were combined with a logical OR operation. These were then fed into a model composed of a graph attention network and multi-head self-attention to obtain the final drug feature vectors. The final target feature vectors were obtained with the same method. Finally, the final drug and target feature vectors were concatenated and served as the input to a fully connected layer, which produced the prediction output.
MIFAM-DTI not only integrated multi-source information to capture drug and target features more comprehensively, but also used the graph attention network and multi-head self-attention to learn attention weights autonomously and better capture information in sequence data. Experimental results demonstrated that MIFAM-DTI outperformed state-of-the-art methods in terms of AUC and AUPR. A case study on coenzymes involved in cellular energy metabolism further demonstrated the effectiveness and practicality of MIFAM-DTI. The source code and experimental data for MIFAM-DTI are available at https://github.com/Search-AB/MIFAM-DTI.
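The graph-construction step of MIFAM-DTI (PCA reduction, cosine-similarity adjacency, concatenated node features and a logical OR of the two adjacency matrices) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code (which is in their GitHub repository): the similarity threshold of 0.5, the number of principal components and the random stand-in feature matrices are all hypothetical.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto their top principal components (computed via SVD)."""
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def cosine_adjacency(x: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean adjacency matrix: True where the cosine similarity of two rows >= threshold."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return (unit @ unit.T) >= threshold


rng = np.random.default_rng(0)
physchem = rng.normal(size=(6, 20))                       # stand-in physicochemical features, 6 drugs
maccs = rng.integers(0, 2, size=(6, 167)).astype(float)   # stand-in MACCS fingerprint bits

p_red = pca_reduce(physchem, n_components=4)
m_red = pca_reduce(maccs, n_components=4)

# Node features: the two reduced vectors of each drug, concatenated.
drug_features = np.concatenate([p_red, m_red], axis=1)    # shape (6, 8)
# Graph: an edge exists if EITHER feature view finds the two drugs similar (logical OR).
adjacency = cosine_adjacency(p_red, 0.5) | cosine_adjacency(m_red, 0.5)
```

The node features and the OR-combined adjacency matrix are the inputs a graph attention network would consume; the same two functions would apply unchanged to the dipeptide-composition and ESM-1b target features.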
Affiliation(s)
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
2
Adalia R, Patel S, Paiva A, Kaufman T, Zamora I, Cai X, Sanjuan G, Shou WZ. Development of a Predictive Multiple Reaction Monitoring (MRM) Model for High-Throughput ADME Analyses Using Learning-to-Rank (LTR) Techniques. J Am Soc Mass Spectrom 2024; 35:131-139. [PMID: 38014625 DOI: 10.1021/jasms.3c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Multiple Reaction Monitoring (MRM) is an important MS/MS technique commonly used in drug discovery and development, allowing for the selective and sensitive quantification of compounds in complex matrices. However, compound optimization can be resource-intensive and requires experimental determination of product ions for each compound. In this study, we developed a Learning-to-Rank (LTR) model to predict the product ions directly from compound structures, eliminating the need for MRM optimization experiments. Experimentally determined MRM conditions for 5757 compounds were used to develop the model. Using the MassChemSite software, theoretical fragments and their mass-to-charge ratios were generated and then matched to the experimental product ions to create a data set. Each possible fragment was ranked based on its intensity in the experimental data. Different LTR models were built on a training split, and hyperparameters were selected using 5-fold cross-validation. The models were evaluated using the Normalized Discounted Cumulative Gain at top k (NDCG@k) and Coverage at top k (Coverage@k) metrics. Finally, the model was applied to predict MRM conditions for a prospective set of 235 compounds in high-throughput Caco-2 permeability and metabolic stability assays, and the quantification results were compared with those obtained using experimentally acquired MRM conditions. The LTR model achieved an NDCG@5 of 0.732 and a Coverage@5 of 0.841 on the validation split, and its predictions led to 97% biologically equivalent results in the Caco-2 permeability and metabolic stability assays.
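The NDCG@k metric used to evaluate the LTR models can be illustrated with a short sketch of its standard definition. The exponential gain (2^rel - 1) and the integer relevance grades are assumptions, since the abstract does not state which gain function was used; Coverage@k is not reproduced because its exact definition is not given here.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, using exponential gain 2**rel - 1."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_relevances, k):
    """DCG of the predicted order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(predicted_relevances, reverse=True), k)
    return dcg_at_k(predicted_relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of fragments, listed in the order the model ranked them
# (e.g. higher grade = more intense experimental product ion).
example = ndcg_at_k([3, 2, 3, 0, 1, 2], k=5)
```

A perfectly ordered list scores 1.0; swapping a highly relevant fragment below a less relevant one lowers the score, with larger penalties near the top of the list.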
Affiliation(s)
- Ramon Adalia
- Lead Molecular Design S.L., 08172 Sant Cugat de Valles, Spain
- Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
- Shivani Patel
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Anthony Paiva
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Tierni Kaufman
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Ismael Zamora
- Lead Molecular Design S.L., 08172 Sant Cugat de Valles, Spain
- Xianmei Cai
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
- Gemma Sanjuan
- Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
- Wilson Z Shou
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
3
Wang L, Zhou Y, Chen Q. AMMVF-DTI: A Novel Model Predicting Drug-Target Interactions Based on Attention Mechanism and Multi-View Fusion. Int J Mol Sci 2023; 24:14142. [PMID: 37762445 PMCID: PMC10531525 DOI: 10.3390/ijms241814142] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/09/2023] [Revised: 09/09/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Accurate identification of potential drug-target interactions (DTIs) is a crucial task in drug development and repositioning. Despite the remarkable progress achieved in recent years, improving the performance of DTI prediction still presents significant challenges. In this study, we propose a novel end-to-end deep learning model called AMMVF-DTI (attention mechanism and multi-view fusion), which leverages a multi-head self-attention mechanism to explore varying degrees of interaction between drugs and target proteins. More importantly, AMMVF-DTI extracts interactive features between drugs and proteins from both node-level and graph-level embeddings, enabling more effective modeling of DTIs; this capability is generally lacking in existing DTI prediction models. Consequently, compared with many state-of-the-art methods, AMMVF-DTI demonstrated excellent performance on the human, C. elegans, and DrugBank benchmark datasets, which can be attributed to its ability to incorporate interactive information and mine features from both local and global structures. Additional ablation experiments also confirmed the importance of each module in the AMMVF-DTI model. Finally, a case study is presented that uses the model for COVID-19-related DTI prediction. We believe the AMMVF-DTI model can not only achieve reasonable accuracy in DTI prediction but also provide insights into potential interactions between drugs and targets.
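The idea of letting drug substructures attend to protein residues through several attention heads can be illustrated with generic multi-head scaled dot-product attention. This NumPy sketch is an assumption-laden illustration, not AMMVF-DTI's architecture: the embedding size, the number of heads, the choice of drug atoms as queries and residues as keys/values, and the random untrained projection weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(queries, keys_values, n_heads, rng):
    """Multi-head scaled dot-product attention with random (untrained) projection weights.

    queries: (n_q, d_model), keys_values: (n_kv, d_model)
    Returns the (n_q, d_model) output and the (n_heads, n_q, n_kv) attention weights.
    """
    d_model = queries.shape[-1]
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads
    w_q, w_k, w_v, w_o = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
                          for _ in range(4))

    def split(x):  # (n, d_model) -> (n_heads, n, d_head)
        return x.reshape(x.shape[0], n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(queries @ w_q), split(keys_values @ w_k), split(keys_values @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, n_q, n_kv)
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    heads = weights @ v                                     # (heads, n_q, d_head)
    concat = heads.transpose(1, 0, 2).reshape(queries.shape[0], d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(42)
drug_atoms = rng.normal(size=(12, 32))   # hypothetical embeddings of 12 drug atoms
residues = rng.normal(size=(50, 32))     # hypothetical embeddings of 50 protein residues
fused, attn = multi_head_attention(drug_atoms, residues, n_heads=4, rng=rng)
```

Each head produces its own atom-by-residue weight matrix, which is one way "varying degrees of interaction" between a drug and a protein can be represented; in a trained model the projections would be learned rather than random.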
4
Zhang W, Liu B. iSnoDi-MDRF: Identifying snoRNA-Disease Associations Based on Multiple Biological Data by Ranking Framework. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3013-3019. [PMID: 37030816 DOI: 10.1109/tcbb.2023.3258448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Accumulating evidence indicates that the dysregulation of small nucleolar RNAs (snoRNAs) is associated with diseases. Identifying snoRNA-disease associations by computational methods is desirable for biologists, as it can save considerable cost and time compared with biological experiments. However, the task still faces the following challenges: (i) many snoRNAs have been detected in recent years, but only a few have been shown to be associated with diseases; (ii) computational predictors trained on only a few known snoRNA-disease associations fail to identify snoRNA-disease associations accurately. In this study, we propose a ranking framework, called iSnoDi-MDRF, to identify potential snoRNA-disease associations based on multiple biological data. It has the following highlights: (i) iSnoDi-MDRF integrates a ranking framework, so it can identify not only potential associations between known snoRNAs and diseases but also diseases associated with new snoRNAs; (ii) known gene-disease associations are employed to help train a mature model for predicting snoRNA-disease associations. Experimental results illustrate that iSnoDi-MDRF is well suited to identifying potential snoRNA-disease associations. The iSnoDi-MDRF web server is freely available at http://bliulab.net/iSnoDi-MDRF/.
5
Li H, Liu B. BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo. PLoS Comput Biol 2023; 19:e1011214. [PMID: 37339155 DOI: 10.1371/journal.pcbi.1011214] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/03/2022] [Accepted: 05/24/2023] [Indexed: 06/22/2023]
Abstract
As the key to biological sequence structure and function prediction, and to disease diagnosis and treatment, biological sequence similarity analysis has attracted increasing attention. However, existing computational methods fail to analyse biological sequence similarities accurately because of the variety of data types (DNA, RNA, protein, disease, etc.) and their low sequence similarities (remote homology). Therefore, new concepts and techniques are needed to solve this challenging problem. Biological sequences (DNA, RNA and protein sequences) can be considered as the sentences of "the book of life", and their similarities can be considered as biological language semantics (BLS). In this study, we seek semantics analysis techniques derived from natural language processing (NLP) to analyse biological sequence similarities comprehensively and accurately. Twenty-seven semantics analysis methods derived from NLP were introduced to analyse biological sequence similarities, bringing new concepts and techniques to the field. Experimental results show that these semantics analysis methods facilitate protein remote homology detection, circRNA-disease association identification and protein function annotation, achieving better performance than other state-of-the-art predictors in the related fields. Based on these methods, a platform called BioSeq-Diabolo has been constructed, named after a popular traditional sport in China. Users only need to input the embeddings of the biological sequence data. BioSeq-Diabolo will intelligently identify the task and then accurately analyse the biological sequence similarities based on biological language semantics.
BioSeq-Diabolo will integrate different biological sequence similarities in a supervised manner by using Learning to Rank (LTR), and the performance of the constructed methods will be evaluated and analysed so as to recommend the best methods for the users. The web server and stand-alone package of BioSeq-Diabolo can be accessed at http://bliulab.net/BioSeq-Diabolo/server/.
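Integrating several similarity measures "in a supervised manner by using Learning to Rank" can be illustrated with a generic pairwise linear ranker. This is not BioSeq-Diabolo's implementation: the RankNet-style logistic pairwise loss, the plain gradient descent, the learning rate and the toy similarity scores below are all assumptions chosen for illustration.

```python
import numpy as np


def train_pairwise_ranker(features, relevance, query_ids, lr=0.1, epochs=200):
    """Learn weights w so that the score w @ x ranks more relevant items higher within each query.

    Minimizes a RankNet-style logistic loss, log(1 + exp(-(s_i - s_j))), over every
    within-query pair (i, j) where item i is more relevant than item j.
    """
    n, d = features.shape
    w = np.zeros(d)
    pairs = [(i, j) for i in range(n) for j in range(n)
             if query_ids[i] == query_ids[j] and relevance[i] > relevance[j]]
    for _ in range(epochs):
        grad = np.zeros(d)
        for i, j in pairs:
            diff = features[i] - features[j]
            margin = w @ diff
            # d/dw log(1 + exp(-margin)) = -diff / (1 + exp(margin))
            grad -= diff / (1.0 + np.exp(margin))
        w -= lr * grad / max(len(pairs), 1)
    return w


# Columns: three hypothetical similarity scores between a query sequence and a candidate.
features = np.array([[0.9, 0.1, 0.5],
                     [0.2, 0.8, 0.5],
                     [0.8, 0.3, 0.4],
                     [0.1, 0.9, 0.6]])
relevance = [1, 0, 1, 0]     # 1 = relevant candidate (e.g. true homolog), 0 = not
query_ids = [0, 0, 1, 1]     # two queries with two candidates each
w = train_pairwise_ranker(features, relevance, query_ids)
scores = features @ w        # integrated similarity used to rank candidates per query
```

The learned weights act as a supervised combination of the individual similarity measures; pairs are only formed within a query, so the model learns relative orderings rather than absolute labels.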
Affiliation(s)
- Hongliang Li
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
6
Sun J, Xu M, Ru J, James-Bott A, Xiong D, Wang X, Cribbs AP. Small molecule-mediated targeting of microRNAs for drug discovery: Experiments, computational techniques, and disease implications. Eur J Med Chem 2023; 257:115500. [PMID: 37262996 DOI: 10.1016/j.ejmech.2023.115500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/15/2023] [Indexed: 06/03/2023]
Abstract
Small molecules have provided medical breakthroughs for human diseases for more than a century. Recently, identifying small-molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant effort required for medicinal chemistry optimization. Numerous experimentally verified cases have demonstrated the potential of miRNA-targeted small-molecule inhibitors for disease treatment. This new approach is grounded in the posttranscriptional regulation that miRNAs exert on the expression of disease-associated genes. Reversing dysregulated gene expression through this mechanism may help control dysfunctional pathways. Furthermore, ongoing improvements in algorithms have allowed computational strategies to be built on top of laboratory-based data, facilitating more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based only on molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have accumulated in established studies. However, there are few comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we provide a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, using drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.
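The similarity-based inference techniques mentioned above share a simple core idea: a drug is predicted to hit the targets of drugs that resemble it. The NumPy sketch below shows one generic, similarity-weighted neighbour score in that spirit; it is an illustrative assumption, not a specific published tool, and the toy similarity and interaction matrices are hypothetical.

```python
import numpy as np


def similarity_inference(drug_sim, interactions):
    """Score every drug-target pair from the known interactions of similar drugs.

    drug_sim:     (n_drugs, n_drugs) symmetric, non-negative similarity matrix
    interactions: (n_drugs, n_targets) binary matrix of known DTIs
    Returns an (n_drugs, n_targets) matrix of scores in [0, 1].
    """
    sim = drug_sim.astype(float).copy()
    np.fill_diagonal(sim, 0.0)                 # a drug must not vote for its own interactions
    weights = sim.sum(axis=1, keepdims=True)
    weights[weights == 0] = 1.0                # drugs with no similar neighbours score 0
    return (sim @ interactions) / weights      # similarity-weighted average of neighbours' DTIs


drug_sim = np.array([[1.0, 0.9, 0.1],
                     [0.9, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
interactions = np.array([[1, 0],
                         [0, 0],
                         [0, 1]])
scores = similarity_inference(drug_sim, interactions)
```

Here drug 1 has no known targets, but because it is very similar to drug 0 it receives a high score for target 0 and a low score for target 1.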
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Miaoer Xu
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 85354, Germany
- Anna James-Bott
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY, 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, 14853, USA
- Xia Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, China
- Adam P Cribbs
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
7
Peng Y, Zhao S, Zeng Z, Hu X, Yin Z. LGBMDF: A cascade forest framework with LightGBM for predicting drug-target interactions. Front Microbiol 2023; 13:1092467. [PMID: 36687573 PMCID: PMC9849804 DOI: 10.3389/fmicb.2022.1092467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 12/07/2022] [Indexed: 01/07/2023]
Abstract
Prediction of drug-target interactions (DTIs) plays an important role in drug development. However, traditional laboratory methods for determining DTIs are time-consuming and costly. In recent years, many studies have shown that using machine learning methods to predict DTIs can speed up drug development and reduce costs. An excellent DTI prediction method should have both high prediction accuracy and low computational cost. In this study, we noted that previous research based on deep forests used XGBoost as the estimator in the cascade. We instead applied LightGBM as the estimator in the cascade forest and determined experimentally that the estimator group should consist of three LightGBMs and three ExtraTrees; this new model is called LGBMDF. We conducted 5-fold cross-validation on LGBMDF and other state-of-the-art methods using the same dataset and compared their sensitivity (Sn), specificity (Sp), Matthews correlation coefficient (MCC), AUC and AUPR. We found that our method achieves better performance and faster computation.
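Several of the metrics on which LGBMDF is compared can be computed directly from labels, hard predictions and scores. The plain-Python sketch below uses the standard definitions of Sn, Sp and MCC, and computes AUC as the probability that a random positive outscores a random negative; the 0/1 label encoding is an assumption, AUPR is omitted, and this is not code from the LGBMDF study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels encoded as 1 (interaction) / 0 (none)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sn_sp_mcc(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity: recall on positives
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity: recall on negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


def roc_auc(y_true, scores):
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
sn, sp, mcc = sn_sp_mcc(y_true, [1, 0, 0, 0])
auc = roc_auc(y_true, [0.9, 0.4, 0.5, 0.1])
```

The pairwise AUC is quadratic in the number of samples; for large cross-validation folds a sort-based implementation would be preferable, but the result is the same.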
8
Tian Z, Peng X, Fang H, Zhang W, Dai Q, Ye Y. MHADTI: predicting drug-target interactions via multiview heterogeneous information network embedding with hierarchical attention mechanisms. Brief Bioinform 2022; 23:6761042. [PMID: 36242566 DOI: 10.1093/bib/bbac434] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/05/2022] [Revised: 08/19/2022] [Accepted: 09/08/2022] [Indexed: 12/14/2022]
Abstract
MOTIVATION Discovering drug-target interactions (DTIs) is a crucial step in drug development, for example in identifying drug side effects and in drug repositioning. Since identifying DTIs by wet-lab biological experiments is time-consuming and costly, many computational approaches have been proposed and have become an efficient way to infer potential interactions. Although extensive effort has been invested in this task, prediction accuracy still needs to be improved. In particular, heterogeneous network-based approaches do not fully consider the complex structure and rich semantic information in these heterogeneous networks. Therefore, predicting DTIs efficiently remains a challenge. RESULTS In this study, we develop a novel method via Multiview heterogeneous information network embedding with Hierarchical Attention mechanisms to discover potential Drug-Target Interactions (MHADTI). Firstly, MHADTI constructs different similarity networks for drugs and targets by utilizing their multisource information. Combined with the known DTI network, three drug-target heterogeneous information networks (HINs) with different views are established. Secondly, MHADTI learns embeddings of drugs and targets from the multiview HINs with hierarchical attention mechanisms, which include node-level, semantic-level and graph-level attention. Lastly, MHADTI employs a multilayer perceptron to predict DTIs with the learned deep feature representations. The hierarchical attention mechanisms fully consider the importance of nodes, meta-paths and graphs in learning the feature representations of drugs and targets, which makes their embeddings more comprehensive. Extensive experimental results demonstrate that MHADTI performs better than other state-of-the-art (SOTA) prediction models. Moreover, analysis of the prediction results for selected drugs and targets of interest further indicates that MHADTI has advantages in discovering DTIs.
AVAILABILITY AND IMPLEMENTATION https://github.com/pxystudy/MHADTI.
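One layer of the hierarchical attention described above, weighting several views (or meta-paths) of the same nodes, can be sketched in NumPy. The formulation below follows the semantic-level attention popularized by the heterogeneous graph attention network (HAN); that MHADTI uses exactly this form is an assumption, and the parameters are random stand-ins rather than learned values.

```python
import numpy as np


def semantic_attention(view_embeddings, w, b, q):
    """Fuse per-view node embeddings with semantic-level attention.

    view_embeddings: (n_views, n_nodes, d) embeddings of the same nodes from each view
    w: (d, h), b: (h,), q: (h,) attention parameters (learned in practice; random here)
    Returns fused (n_nodes, d) embeddings and the (n_views,) attention weights.
    """
    # Importance of each view = mean over nodes of q . tanh(z W + b)
    importance = (np.tanh(view_embeddings @ w + b) @ q).mean(axis=1)   # (n_views,)
    beta = np.exp(importance - importance.max())                       # stable softmax
    beta /= beta.sum()
    fused = np.tensordot(beta, view_embeddings, axes=1)                 # weighted sum of views
    return fused, beta


rng = np.random.default_rng(1)
views = rng.normal(size=(3, 10, 16))     # 3 hypothetical views, 10 drugs, 16-dim embeddings
w = rng.normal(scale=0.1, size=(16, 8))
b = np.zeros(8)
q = rng.normal(size=8)
fused, beta = semantic_attention(views, w, b, q)
```

The attention weights `beta` are shared by all nodes, so they summarize how informative each view is overall; node-level attention (inside each view) and graph-level attention would sit below and above this step respectively.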
Affiliation(s)
- Zhen Tian
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Xiangyu Peng
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Haichuan Fang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Wenjie Zhang
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Qiguo Dai
- School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China
- Yangdong Ye
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
9
Cheng Z, Zhao Q, Li Y, Wang J. IIFDTI: predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 2022; 38:4153-4161. [PMID: 35801934 DOI: 10.1093/bioinformatics/btac485] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/07/2021] [Revised: 05/02/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Identifying drug-target interactions is a crucial step in drug discovery and design. Traditional biochemical experiments can validate drug-target interactions reliably and accurately. However, they are also extremely laborious, time-consuming and expensive. With the accumulation of validated biomedical data and the advancement of computing technology, computational methods based on chemogenomics have gradually attracted more attention as a guide for experimental verification. RESULTS In this study, we propose an end-to-end deep learning-based method named IIFDTI to predict drug-target interactions (DTIs) based on independent features of drug-target pairs and interactive features of their substructures. First, the interactive features of substructures between drugs and targets are extracted by a bidirectional encoder-decoder architecture. The independent features of drugs and targets are extracted by graph neural networks and convolutional neural networks, respectively. Then, all extracted features are fused and fed into fully connected dense layers in the downstream task of predicting DTIs. IIFDTI takes into account the independent features of drugs/targets and simulates the interactive features of the substructures from a biological perspective. Multiple experiments show that IIFDTI outperforms state-of-the-art methods in terms of the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), precision, and recall on benchmark datasets. In addition, visualizations of the mapped attention weights indicate that IIFDTI has captured biologically meaningful insights, and two case studies illustrate the capabilities of IIFDTI in practical applications. AVAILABILITY AND IMPLEMENTATION The data and code underlying this article are available on GitHub at https://github.com/czjczj/IIFDTI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
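Extracting independent target features with a convolutional network turns a variable-length amino acid sequence into a fixed-size vector. The NumPy sketch below shows that generic mechanism (one-hot residues, a 1-D convolution, ReLU and global max pooling); the encoding, kernel width, number of filters and random weights are assumptions, not IIFDTI's architecture, which is available in the authors' repository.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """(length, 20) one-hot encoding; residues outside the 20 standard ones become zero rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        if aa in index:
            x[pos, index[aa]] = 1.0
    return x


def conv1d_maxpool(x, kernels, bias):
    """Valid 1-D convolution over positions, ReLU, then global max pooling.

    x: (length, channels); kernels: (n_filters, width, channels); bias: (n_filters,)
    Returns one value per filter, i.e. a fixed-size vector regardless of sequence length.
    """
    _, width, _ = kernels.shape
    if x.shape[0] < width:
        raise ValueError("sequence is shorter than the kernel width")
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])  # (L-w+1, w, c)
    activations = np.maximum(np.einsum("lwc,fwc->lf", windows, kernels) + bias, 0.0)
    return activations.max(axis=0)


rng = np.random.default_rng(7)
kernels = rng.normal(size=(8, 3, 20))   # 8 hypothetical filters spanning 3 residues
bias = np.zeros(8)
feat_long = conv1d_maxpool(one_hot("MKTAYIAKQRQISFVKSHFSRQ"), kernels, bias)
feat_short = conv1d_maxpool(one_hot("MKTAYIAK"), kernels, bias)
```

Global max pooling is what makes proteins of different lengths comparable: each filter reports only its strongest response anywhere along the sequence.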
Affiliation(s)
- Zhongjian Cheng
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
10
Zhang W, Hou J, Liu B. iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank. PLoS Comput Biol 2022; 18:e1010404. [PMID: 35969645 PMCID: PMC9410559 DOI: 10.1371/journal.pcbi.1010404] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/27/2022] [Revised: 08/25/2022] [Accepted: 07/18/2022] [Indexed: 12/01/2022]
Abstract
Piwi-interacting RNAs (piRNAs) are regarded as drug targets and biomarkers for the diagnosis and therapy of diseases. However, biological experiments cost substantial time and resources, and existing computational methods focus only on identifying missing associations between known piRNAs and diseases. With the rapid development of biological experiments, more and more piRNAs are being detected. Therefore, identifying disease associations for newly detected piRNAs has significant theoretical value and practical significance for understanding the pathogenesis of diseases. In this study, the iPiDA-LTR predictor is proposed to identify associations between piRNAs and diseases based on Learning to Rank. The iPiDA-LTR predictor not only identifies missing associations between known piRNAs and diseases, but also detects diseases associated with newly detected piRNAs. Experimental results demonstrate that iPiDA-LTR effectively predicts piRNA-disease associations, outperforming other related methods.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Jialu Hou
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
11
Zhang W, Wei H, Liu B. idenMD-NRF: a ranking framework for miRNA-disease association identification. Brief Bioinform 2022; 23:6604995. [PMID: 35679537 DOI: 10.1093/bib/bbac224] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2022] [Revised: 04/18/2022] [Accepted: 05/11/2022] [Indexed: 11/12/2022]
Abstract
Identifying miRNA-disease associations is an important task for revealing the pathogenic mechanisms of complex diseases. Various computational methods have been proposed. Although these methods obtained encouraging performance in detecting missing associations between known miRNAs and diseases, accurately predicting associated diseases for new miRNAs remains difficult. To address this, a ranking framework named idenMD-NRF is proposed for miRNA-disease association identification. idenMD-NRF treats miRNA-disease association identification as an information retrieval task. Given a novel query miRNA, idenMD-NRF employs a Learning to Rank algorithm to rank associated diseases based on high-level association features and various predictors. The experimental results on two independent test datasets indicate that idenMD-NRF is superior to the other predictors compared. A user-friendly web server for the idenMD-NRF predictor is freely available at http://bliulab.net/idenMD-NRF/.
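Framing association identification as information retrieval means that each query miRNA yields a ranked list of diseases, which can be scored with standard retrieval metrics. The sketch below ranks hypothetical disease scores for one query and computes average precision over that ranking; the disease names, scores and the choice of average precision as the metric are illustrative assumptions, not idenMD-NRF's evaluation protocol.

```python
def rank_diseases(scores):
    """Return disease names sorted by predicted association score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


def average_precision(ranked_labels):
    """Average precision of a ranked list with labels 1 (associated) / 0 (not associated)."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank   # precision at each rank where a true association appears
    return precision_sum / hits if hits else 0.0


# Hypothetical predicted scores for one novel query miRNA, and its known associations.
scores = {"disease_a": 0.91, "disease_b": 0.15, "disease_c": 0.64}
known = {"disease_c"}
ranking = rank_diseases(scores)
ap = average_precision([1 if d in known else 0 for d in ranking])
```

Averaging this value over all query miRNAs gives mean average precision, a common summary for query-based ranking tasks.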
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Hang Wei
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
12
Liu S, Cheng F, Ren B, Xu W, Chen C, Ma C, Zhang X, Tang F, Wang Q, Wang X. Qinzhi Zhudan formula improves memory and alleviates neuroinflammation in vascular dementia rats partly by inhibiting the TNFR1-mediated TNF pathway. Journal of Traditional Chinese Medical Sciences 2022. [DOI: 10.1016/j.jtcms.2022.06.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
13
Cheng Z, Yan C, Wu FX, Wang J. Drug-Target Interaction Prediction Using Multi-Head Self-Attention and Graph Attention Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2208-2218. [PMID: 33956632 DOI: 10.1109/tcbb.2021.3077905] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Indexed: 06/12/2023]
Abstract
Identifying drug-target interactions (DTIs) is an important step in new drug discovery and drug repositioning. Accurate DTI predictions can improve the efficiency of drug discovery and development. Although rapid advances in deep learning have produced various computational methods, how to design efficient networks for predicting DTIs still merits further investigation. In this study, we propose an end-to-end deep learning method (called MHSADTI) to predict DTIs based on the graph attention network and the multi-head self-attention mechanism. First, the characteristics of drugs and proteins are extracted by the graph attention network and the multi-head self-attention mechanism, respectively. Then, attention scores are used to determine which amino acid subsequences in a protein are more important for the drug when predicting its interactions. Finally, we predict DTIs with a fully connected layer after obtaining the feature vectors of drugs and proteins. MHSADTI takes advantage of the self-attention mechanism to capture long-range contextual dependencies in amino acid sequences and to make DTI predictions interpretable. More effective molecular characteristics are also obtained through the attention mechanism in graph attention networks. Multiple cross-validation experiments are used to assess the performance of MHSADTI. Experiments on four datasets (human, C. elegans, DUD-E and DrugBank) show that our method outperforms state-of-the-art methods in terms of AUC, precision, recall, AUPR and F1-score. In addition, the case studies further demonstrate that our method can provide effective visualizations to interpret the prediction results from a biological perspective.
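On the drug side, a graph attention network computes, for each atom, attention weights over its bonded neighbours and itself. The NumPy sketch below implements a single head of a standard graph attention (GAT) layer on a toy four-atom chain; the random weights, feature sizes and toy molecule are assumptions, and MHSADTI's actual layer sizes, number of heads and output nonlinearity are not reproduced.

```python
import numpy as np


def gat_layer(h, adjacency, w, a, negative_slope=0.2):
    """One single-head graph attention layer.

    h: (n, f_in) node features; adjacency: (n, n) 0/1 matrix; w: (f_in, f_out); a: (2 * f_out,)
    Returns the (n, f_out) aggregated features and the (n, n) attention matrix.
    A nonlinearity such as ELU is usually applied to the output afterwards.
    """
    z = h @ w                                               # linear projection, (n, f_out)
    f_out = z.shape[1]
    # e_ij = LeakyReLU(a_left . z_i + a_right . z_j), computed for all pairs by broadcasting
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    mask = (adjacency + np.eye(len(h))) > 0                 # attend to neighbours and to self
    e = np.where(mask, e, -np.inf)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))        # masked softmax over each row
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


rng = np.random.default_rng(3)
adjacency = np.array([[0, 1, 0, 0],      # toy chain molecule: atom0-atom1-atom2-atom3
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]])
h = rng.normal(size=(4, 5))              # hypothetical atom features
out, alpha = gat_layer(h, adjacency, rng.normal(size=(5, 3)), rng.normal(size=6))
```

Non-bonded atom pairs receive exactly zero attention because their scores are masked to negative infinity before the softmax.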
14
Zong N, Li N, Wen A, Ngo V, Yu Y, Huang M, Chowdhury S, Jiang C, Fu S, Weinshilboum R, Jiang G, Hunter L, Liu H. BETA: a comprehensive benchmark for computational drug-target prediction. Brief Bioinform 2022; 23:6596989. [PMID: 35649342 PMCID: PMC9294420 DOI: 10.1093/bib/bbac199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 04/10/2022] [Accepted: 04/29/2022] [Indexed: 11/14/2022]
Abstract
Internal validation is the most popular evaluation strategy used for drug-target predictive models. However, simple random shuffling in cross-validation is not always suitable for large, diverse and copious datasets, as it can introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance across a variety of use cases (e.g. permutations of different levels of connectedness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities, and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets, and drug repurposing for specific diseases), a total of seven Tests (consisting of 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based, and network-based) were tested across all the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate the ability of the benchmark to identify limitations of the tested methods. The results highlight BETA as a benchmark for selecting computational strategies for drug repurposing and target discovery.
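The bias that random shuffling can introduce is easy to see in a small example: in a pair-level random split, most test drugs also appear in training, whereas a drug-disjoint ("cold-drug") split measures generalization to drugs never seen during training. The sketch below contrasts the two; it is a generic illustration, not one of BETA's 344 Tasks, and the drug and target identifiers are hypothetical.

```python
import random


def random_split(pairs, test_fraction, seed=0):
    """Shuffle drug-target pairs and hold out a fraction; test drugs usually also occur in training."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def cold_drug_split(pairs, test_fraction, seed=0):
    """Hold out whole drugs, so every test pair involves a drug absent from training."""
    drugs = sorted({drug for drug, _ in pairs})
    random.Random(seed).shuffle(drugs)
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


pairs = [(f"D{i}", f"T{j}") for i in range(10) for j in range(5)]   # 10 drugs x 5 targets
rand_train, rand_test = random_split(pairs, 0.2)
cold_train, cold_test = cold_drug_split(pairs, 0.2)
seen_in_random = {d for d, _ in rand_test} & {d for d, _ in rand_train}   # usually non-empty
seen_in_cold = {d for d, _ in cold_test} & {d for d, _ in cold_train}     # always empty
```

A model that merely memorizes per-drug interaction rates can look strong under the random split yet fail under the cold-drug split, which is one reason benchmarks vary the sampling and validation strategy.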
Affiliation(s)
- Nansu Zong
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Ning Li
- Center for Structure Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD
- Andrew Wen
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Victoria Ngo
- Betty Irene Moore School of Nursing, University of California Davis Health, Sacramento, CA; Stanford Health Policy, Stanford School of Medicine and Freeman Spogli Institute for International Studies, Palo Alto, CA
- Yue Yu
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Ming Huang
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Shaika Chowdhury
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Chao Jiang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL
- Sunyang Fu
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Richard Weinshilboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN
- Guoqian Jiang
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Lawrence Hunter
- Department of Pharmacology, University of Colorado Denver, Aurora, CO
- Hongfang Liu
- Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
15
Ru X, Ye X, Sakurai T, Zou Q. NerLTR-DTA: drug-target binding affinity prediction based on neighbor relationship and learning to rank. Bioinformatics 2022; 38:1964-1971. [PMID: 35134828 DOI: 10.1093/bioinformatics/btac048] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/29/2021] [Revised: 12/20/2021] [Accepted: 01/28/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Drug-target interaction prediction plays an important role in new drug discovery and drug repurposing. Binding affinity indicates the strength of drug-target interactions. Predicting drug-target binding affinity is expected to provide promising candidates for biologists, which can effectively reduce the workload of wet laboratory experiments and speed up the entire process of drug research. Given that numerous new proteins are being sequenced and new compounds synthesized, several improved computational methods have been proposed for such predictions, but some challenges remain. (i) Many methods only discuss and implement one application scenario: they focus on drug repurposing and ignore the discovery of new drugs and targets. (ii) Many methods do not consider the priority order of proteins (or drugs) related to each target drug (or protein). Therefore, it is necessary to develop a comprehensive method that can be used in multiple scenarios and focuses on candidate order. RESULTS In this study, we propose a method called NerLTR-DTA that uses the neighbor relationship of similarity and sharing to extract features, and applies a ranking framework with regression attributes to predict affinity values and the priority order of a query drug (or query target) and its related proteins (or compounds). Notably, using the learning-to-rank property of setting different queries allows the method to be applied in multiple scenarios, including the discovery of new drugs and new targets. Experimental results on two commonly used datasets show that NerLTR-DTA outperforms some state-of-the-art competing methods. NerLTR-DTA achieves excellent performance in all application scenarios mentioned in this study, and the r_m^2(test) values guarantee that such performance is not obtained by chance.
Moreover, it can be concluded that NerLTR-DTA can provide accurate ranking lists for the relevant results of most queries through the statistics of the association relationship of each query drug (or query protein). In general, NerLTR-DTA is a powerful tool for predicting drug-target associations and can contribute to new drug discovery and drug repurposing. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented in Python and Java. Source codes and datasets are available at https://github.com/RUXIAOQING964914140/NerLTR-DTA.
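The r_m^2(test) statistic cited above can be computed in a few lines. The sketch below uses one commonly cited form, r_m^2 = r^2 (1 - sqrt(|r^2 - r_0^2|)), where r^2 is the squared Pearson correlation and r_0^2 is the coefficient of determination of a regression through the origin. Implementations differ in which variable is regressed on which, so treat this as an assumption rather than the exact variant NerLTR-DTA used.

```python
import numpy as np


def rm2(y_true, y_pred):
    """r_m^2 = r^2 * (1 - sqrt(|r^2 - r0^2|)), one common form of the metric."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, p)[0, 1] ** 2            # squared Pearson correlation
    k = np.sum(y * p) / np.sum(p * p)            # slope of the fit y ~ k * p through the origin
    r02 = 1.0 - np.sum((y - k * p) ** 2) / np.sum((y - y.mean()) ** 2)
    return float(r2 * (1.0 - np.sqrt(abs(r2 - r02))))


print(rm2([1, 2, 3, 4], [1, 2, 3, 4]))   # perfect agreement
print(rm2([1, 2, 3, 4], [2, 3, 4, 5]))   # same Pearson r, but offset predictions
```

Because r_0^2 penalizes predictions that are systematically offset from the observed values, a constant shift that leaves Pearson r at 1 still pulls r_m^2 below 1.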
Affiliation(s)
- Xiaoqing Ru
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324000, China
16
Wu X, Zeng W, Lin F, Zhou X. NeuRank: learning to rank with neural networks for drug-target interaction prediction. BMC Bioinformatics 2021; 22:567. [PMID: 34836495 PMCID: PMC8620576 DOI: 10.1186/s12859-021-04476-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/18/2021] [Accepted: 11/08/2021] [Indexed: 01/09/2023]
Abstract
BACKGROUND Experimental verification of a drug discovery process is expensive and time-consuming. Therefore, the demand to identify drug-target interactions (DTIs) more efficiently and effectively has recently intensified. RESULTS We treat the prediction of DTIs as a ranking problem and propose a neural network architecture, NeuRank, to address it. We also assume that similar drug compounds are likely to interact with similar target proteins. Thus, in our model, we add drug and target similarities, which are very effective at improving the prediction of DTIs. Then, we extend NeuRank from a point-wise to a pair-wise and further to a list-wise model. CONCLUSION Finally, results from extensive experiments on five public data sets (DrugBank, Enzymes, Ion Channels, G-Protein-Coupled Receptors, and Nuclear Receptors) show that, in identifying DTIs, our models achieve better performance than other state-of-the-art methods.
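The point-wise, pair-wise and list-wise formulations differ mainly in the loss applied to one drug's candidate targets. The numpy sketch below shows one textbook loss of each kind: binary cross-entropy, a RankNet-style logistic pair loss, and a ListNet-style top-one cross-entropy that uses the normalized 0/1 labels as the target distribution, which is a common simplification. The abstract does not state which losses NeuRank uses, so these are illustrative stand-ins. `scores` and `labels` are assumed to be one drug's predicted scores and 0/1 interaction labels over its candidate targets.

```python
import numpy as np


def pointwise_loss(scores, labels):
    """Binary cross-entropy on each drug-target score independently."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    p = 1.0 / (1.0 + np.exp(-s))                       # sigmoid
    eps = 1e-12                                        # guards log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def pairwise_loss(scores, labels):
    """RankNet-style logistic loss over every (interacting, non-interacting) pair."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels)
    diff = s[y == 1][:, None] - s[y == 0][None, :]     # positive minus negative score
    return float(np.mean(np.logaddexp(0.0, -diff)))    # log(1 + exp(-diff)), overflow-safe


def listwise_loss(scores, labels):
    """ListNet-style top-one cross-entropy between label and score distributions."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    target = y / y.sum()                               # normalized labels as target distribution
    log_softmax = s - s.max() - np.log(np.exp(s - s.max()).sum())
    return float(-np.sum(target * log_softmax))


labels = [1, 0, 0, 1]
good, bad = [3.0, -1.0, -2.0, 2.0], [-3.0, 1.0, 2.0, -2.0]
for loss in (pointwise_loss, pairwise_loss, listwise_loss):
    print(loss.__name__, loss(good, labels), loss(bad, labels))
```

All three losses are lower when interacting targets receive higher scores, but only the pair-wise and list-wise forms look at scores relative to one another, which is what makes them ranking objectives.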
Affiliation(s)
- Xiujin Wu
- School of Informatics, Xiamen University, Xiamen, China
- Wenhua Zeng
- School of Informatics, Xiamen University, Xiamen, China
- Fan Lin
- School of Informatics, Xiamen University, Xiamen, China
- Xiuze Zhou
- Shuye Technology Co., Ltd., Hangzhou, China
17
Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B. DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier. Interdiscip Sci 2021; 14:311-330. [PMID: 34731411 DOI: 10.1007/s12539-021-00488-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/26/2021] [Revised: 10/19/2021] [Accepted: 10/21/2021] [Indexed: 12/12/2022]
Abstract
Accurate prediction of drug-target interactions (DTIs), which is often used in the fields of drug discovery and drug repositioning, is regarded as a key challenge in the study of drug science. In this paper, a new method called DeepStack-DTIs is proposed to predict DTIs. First, for the target protein, the pseudo-position-specific scoring matrix (PsePSSM), pseudo amino acid composition (PseAAC) and SPIDER3 are used to extract different types of feature information. Meanwhile, the path-based fingerprint features of each drug are extracted. Then, the synthetic minority oversampling technique (SMOTE) and the light gradient boosting machine (LightGBM) are used for data balancing and feature selection, respectively. Finally, the processed features are input to a deep-stacked ensemble classifier composed of a gated recurrent unit (GRU), a deep neural network (DNN), a support vector machine (SVM), eXtreme gradient boosting (XGBoost) and logistic regression (LR) to predict DTIs. Under five-fold cross-validation, the proposed method achieves higher prediction accuracy on the gold standard dataset than existing methods. To further evaluate the predictive power of DeepStack-DTIs, we validate the method on another dataset and predict the drug-target interaction network. The results indicate that DeepStack-DTIs has better predictive ability than the other methods and provides novel insights for the prediction of DTIs. DeepStack-DTIs is a novel method for drug-target interaction prediction. PsePSSM, PseAAC, SPIDER3 and FP2 are fused to convert protein sequence and drug molecule information, respectively, into numerical features. The SMOTE algorithm is used to balance the dataset, and the LightGBM feature selection algorithm is employed to remove redundant and irrelevant features and select the optimal feature subset. This optimal feature subset is input into the deep-stacked ensemble classifier to predict drug-target interactions.
The experimental results show that the DeepStack-DTIs method can significantly improve the prediction accuracy of drug-target interactions.
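SMOTE balances the data by synthesizing minority-class samples rather than duplicating existing ones. The numpy sketch below shows only the core interpolation step: each new sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It omits details handled by production implementations such as `imblearn.over_sampling.SMOTE` in imbalanced-learn, and `X_min` (the minority-class feature matrix) is a hypothetical input.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating toward nearest neighbours."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_min, dtype=float)
    # Euclidean distances between every pair of minority samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    k = min(k, len(X) - 1)
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours of each sample
    out = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X))                 # random minority sample
        j = nn[i, rng.integers(k)]               # one of its neighbours
        out[t] = X[i] + rng.random() * (X[j] - X[i])   # point on the segment between them
    return out


X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(X_min, n_new=3, k=2))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies; oversampling should also be applied only to training folds, never before the cross-validation split.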
Affiliation(s)
- Yan Zhang
- College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, 266061, China; College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Cheng Chen
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Qinqin Wei
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Haiming Gu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China
18
Prediction of Drug-Target Interactions by Combining Dual-Tree Complex Wavelet Transform with Ensemble Learning Method. Molecules 2021; 26:molecules26175359. [PMID: 34500792 PMCID: PMC8433937 DOI: 10.3390/molecules26175359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Revised: 08/27/2021] [Accepted: 08/30/2021] [Indexed: 11/17/2022]
Abstract
Identification of drug–target interactions (DTIs) is vital for drug discovery. However, traditional biological approaches have some unavoidable shortcomings, such as being time-consuming and expensive. Therefore, there is an urgent need to develop novel and effective computational methods to predict DTIs in order to shorten the development cycles of new drugs. In this study, we present a novel computational approach to identify DTIs, which uses protein sequence information and the dual-tree complex wavelet transform (DTCWT). More specifically, a position-specific scoring matrix (PSSM) was computed for each target protein sequence to obtain its evolutionary information. Then, DTCWT was used to extract representative features from the PSSM, which were then combined with the drug fingerprint features to form the feature descriptors. Finally, these descriptors were fed to a Rotation Forest (RoF) model for classification. A 5-fold cross-validation (CV) was performed on four datasets (Enzyme, Ion Channel, GPCRs (G-protein-coupled receptors), and NRs (nuclear receptors)) to validate the proposed model; our method yielded high average accuracies of 89.21%, 85.49%, 81.02%, and 74.44%, respectively. To further verify the performance of our model, we compared the RoF classifier with two state-of-the-art algorithms: the support vector machine (SVM) and the k-nearest neighbor (KNN) classifier. We also compared it with some other published methods. Moreover, the prediction results for the independent dataset further indicated that our method is effective for predicting potential DTIs. Thus, we believe that our method is suitable for facilitating drug discovery and development.
19
El-Behery H, Attia AF, El-Feshawy N, Torkey H. Efficient machine learning model for predicting drug-target interactions with case study for Covid-19. Comput Biol Chem 2021; 93:107536. [PMID: 34271420 PMCID: PMC8256690 DOI: 10.1016/j.compbiolchem.2021.107536] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/13/2020] [Revised: 06/23/2021] [Accepted: 06/24/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND Discovering possible drug-target interactions (DTIs) is a decisive step in detecting the effects of drugs as well as in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, the accuracy of their predictions is still low, and protein sequences and structured data are rarely used together in the prediction process. METHODS This paper presents a DTI prediction model that takes advantage of the structured form of proteins and drugs. Our model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Comparing the proposed model with different existing methods under K-fold cross-validation, empirical results show that our model, based on ensemble learning algorithms for DTI prediction, provides more accurate results from both structure and feature data. RESULTS The proposed model is applied to two datasets: Benchmark (feature-only) datasets and DrugBank (structure data) datasets. Light-Boost and ExtraTree using structure and feature data achieve 98% accuracy and an F-score of 0.97, compared with 94% and 0.92 achieved by the existing methods. Moreover, our model can successfully predict more as-yet-undiscovered interactions and hence can be used as a practical tool for drug repositioning. A case study is performed in which our prediction model is applied to the proteins known to be affected by coronaviruses, in order to predict possible interactions between these proteins and existing drugs. Our model is also applied to Covid-19-related drugs announced on DrugBank.
The results show that some drugs, such as DB00691 and DB05203, are predicted with 100% accuracy to interact with the ACE2 protein. This protein is a membrane protein that enables Covid-19 infection. Hence, our model can be used as an effective tool in drug repositioning to predict possible drug treatments for Covid-19.
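The abstract says only that SMILES strings are converted with "encoding techniques". As one possible reading, and purely as an assumption, the sketch below shows a common character-level integer encoding: multi-character tokens such as `Cl`, `Br` and bracket atoms are kept whole, id 0 is reserved for padding and id 1 for unseen tokens, and each sequence is padded or truncated to a fixed `max_len`. The names `tokenize`, `build_vocab` and `encode` are hypothetical helpers, not the authors' code.

```python
import re

# Bracket atoms and two-letter halogens are matched before single characters,
# so "Cl" is read as chlorine rather than carbon followed by "l".
TOKEN_RE = re.compile(r"\[[^\]]+\]|Cl|Br|.")
PAD, UNK = 0, 1


def tokenize(smiles):
    return TOKEN_RE.findall(smiles)


def build_vocab(smiles_list):
    tokens = sorted({t for s in smiles_list for t in tokenize(s)})
    return {t: i + 2 for i, t in enumerate(tokens)}    # ids 0 and 1 are reserved


def encode(smiles, vocab, max_len):
    ids = [vocab.get(t, UNK) for t in tokenize(smiles)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


vocab = build_vocab(["CCO", "CCl", "c1ccccc1"])
print(tokenize("c1cc[nH]c1"))
print(encode("CCO", vocab, 6))
```

Fixed-length integer vectors like these can be fed directly to tree ensembles such as the LightGBM and ExtraTrees models mentioned in the abstract, or one-hot expanded for neural models.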
Affiliation(s)
- Heba El-Behery
- Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr El-Sheikh, Egypt
- Abdel-Fattah Attia
- Department of Computer Science and Engineering, Faculty of Engineering, Kafrelsheikh University, Kafr El-Sheikh, Egypt
- Nawal El-Feshawy
- Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
- Hanaa Torkey
- Computer Science & Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
20
Accurate acid dissociation constant (pKa) calculation for the sulfachloropyridazine and similar molecules. J Mol Model 2021; 27:233. [PMID: 34324066 DOI: 10.1007/s00894-021-04851-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2021] [Accepted: 07/05/2021] [Indexed: 10/20/2022]
Abstract
Accurate calculation of the acid dissociation constant (pKa) is of fundamental importance for describing molecular systems with pharmacological activity. The search for a more appropriate procedure for its determination is always welcome and has attracted increasing interest from the scientific community. In this sense, this work presents a computational study combining ten DFT functionals (M062X, M06L, B3LYP, BLYP, PBEPBE, BP86, LC-BLYP, SPBE, CAM-B3LYP, LC-PBEPBE) and the HF method, eight basis sets (6-311G, 6-311+G, 6-311G(d,p), 6-311+G(d,p), 6-311++G(d,p), 6-311G(2d,2p), 6-311++G(2d,2p), and aug-cc-pVDZ), and three solvation models (SMD, PCM, and CPCM) for an accurate determination of the sulfachloropyridazine (SCR) pKa. The smallest deviation from the experimental result (0.02 pKa units) was achieved with the BLYP/6-311+G(d,p)/PCM combination. Therefore, this combination was extended to calculate the pKa of six molecules similar to SCR, selected through the ElectroShape similarity method. For all these molecules, the difference between the calculated and experimental values ranged from 0.14 to 0.69 pKa units. This suggests that the combination can determine pKa with experimental precision for molecules bearing the sulfonamide functional group (SO2NHR). Graphical Abstract: A computational study combining different levels of theory, basis sets and solvation models for an accurate sulfonamide pKa determination.
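Whatever the level of theory, workflows of this kind end by converting a computed aqueous deprotonation free energy into a pKa through pKa = ΔG/(RT ln 10). The abstract does not say which thermodynamic cycle or standard-state corrections were used, so the helpers below cover only this final conversion, with ΔG assumed to be in kJ/mol for HA(aq) -> A-(aq) + H+(aq).

```python
import math

R_KJ = 8.314462618e-3          # gas constant in kJ mol^-1 K^-1


def pka_from_dg(delta_g_kj_per_mol, temperature_k=298.15):
    """pKa = dG_deprot(aq) / (RT ln 10) for HA(aq) -> A-(aq) + H+(aq)."""
    return delta_g_kj_per_mol / (R_KJ * temperature_k * math.log(10))


def dg_from_pka(pka, temperature_k=298.15):
    """Inverse conversion: the free-energy difference that corresponds to a pKa difference."""
    return pka * R_KJ * temperature_k * math.log(10)


print(dg_from_pka(1.0))    # kJ/mol per pKa unit at 298.15 K
print(dg_from_pka(0.02))   # free-energy equivalent of a 0.02-unit deviation
```

At 298.15 K, RT ln 10 is about 5.71 kJ/mol (about 1.36 kcal/mol), so the 0.02-unit deviation reported for SCR corresponds to roughly 0.11 kJ/mol in ΔG. This shows how sensitive computed pKa values are to small free-energy errors, and why the choice of functional, basis set and solvation model matters.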
21
Pliakos K, Vens C, Tsoumakas G. Predicting Drug-Target Interactions With Multi-Label Classification and Label Partitioning. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1596-1607. [PMID: 31689203 DOI: 10.1109/tcbb.2019.2951378] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 06/10/2023]
Abstract
Identifying drug-target interactions is crucial for drug discovery. Despite modern technologies used in drug screening, experimental identification of drug-target interactions is an extremely demanding task. Predicting drug-target interactions in silico can thereby facilitate drug discovery as well as drug repositioning. Various machine learning models have been developed over the years to predict such interactions. Multi-output learning models in particular have drawn the attention of the scientific community due to their high predictive performance and computational efficiency. These models are based on the assumption that all the labels are correlated with each other. However, this assumption is too optimistic. Here, we address drug-target interaction prediction as a multi-label classification task that is combined with label partitioning. We show that building multi-output learning models over groups (clusters) of labels often leads to superior results. The performed experiments confirm the efficiency of the proposed framework.
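The idea of training multi-output models on groups of labels, rather than on all labels at once, can be sketched in a few lines. In the numpy example below, which is an illustration under assumptions rather than the authors' pipeline, label columns (targets) are clustered by their interaction profiles with k-means using farthest-point initialization, and one depth-1 multi-output regression tree (a single split shared by all labels in a group) is fitted per cluster. The abstract does not name the base learner; the stump is used only because it makes the effect of grouping easy to see, and all function names are hypothetical.

```python
import numpy as np


def _nearest(L, centers):
    return ((L[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)


def partition_labels(Y, n_groups, n_iter=20):
    """k-means over label columns: targets with similar interaction profiles share a group."""
    L = np.asarray(Y, dtype=float).T                    # one row per label (target)
    centers = [L[0]]
    for _ in range(1, n_groups):                        # farthest-point initialization
        d = np.min([((L - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(L[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        assign = _nearest(L, centers)
        for g in range(n_groups):
            if np.any(assign == g):
                centers[g] = L[assign == g].mean(0)
    return _nearest(L, centers)


def fit_stump(X, Y):
    """Depth-1 multi-output regression tree: one split shared by every label in Y."""
    best = (np.inf, 0, np.inf)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= thr
            sse = (((Y[left] - Y[left].mean(0)) ** 2).sum()
                   + ((Y[~left] - Y[~left].mean(0)) ** 2).sum())
            if sse < best[0]:
                best = (sse, f, thr)
    _, f, thr = best
    left = X[:, f] <= thr                  # thr = inf (no split found) sends every row left
    right_mean = Y[~left].mean(0) if (~left).any() else Y.mean(0)
    return f, thr, Y[left].mean(0), right_mean


def fit_grouped(X, Y, assign):
    """Fit one multi-output stump per label group."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    return [(np.where(assign == g)[0], fit_stump(X, Y[:, assign == g]))
            for g in np.unique(assign)]


def predict_grouped(models, X, n_labels):
    X = np.asarray(X, dtype=float)
    out = np.zeros((len(X), n_labels))
    for cols, (f, thr, m_left, m_right) in models:
        out[:, cols] = np.where((X[:, f] <= thr)[:, None], m_left, m_right)
    return out


# Hypothetical toy data: labels 0-1 follow feature 0, labels 2-3 follow feature 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1]], dtype=float)
print(partition_labels(Y, 2))
```

When some targets depend on one feature and others on a different one, a single shared split must favour one group, whereas per-group models can each choose their own split; this is the kind of gain that label partitioning aims for over a single model that assumes all labels are correlated.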
22
Ru X, Ye X, Sakurai T, Zou Q, Xu L, Lin C. Current status and future prospects of drug-target interaction prediction. Brief Funct Genomics 2021; 20:312-322. [PMID: 34189559 DOI: 10.1093/bfgp/elab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/08/2021] [Revised: 06/01/2021] [Accepted: 06/04/2021] [Indexed: 01/09/2023]
Abstract
Drug-target interaction prediction is important for drug development and drug repurposing. Many computational methods have been proposed for drug-target interaction prediction because of their potential to reduce time and cost. In this review, we introduce molecular docking and machine learning-based methods, which have been widely applied to drug-target interaction prediction. In particular, machine learning-based methods are divided into different types according to the data processing form and task type. For each type of method, we provide a specific description and propose some solutions to improve its capability. Knowledge of heterogeneous networks and learning to rank is also summarized in this review. As far as we know, this is the first comprehensive review that summarizes the knowledge of heterogeneous networks and learning to rank in drug-target interaction prediction. Moreover, we propose three aspects that can be explored in depth in future research.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Tetsuya Sakurai
- Department of Computer Science (director of C-AIR), University of Tsukuba
- Quan Zou
- University of Electronic Science and Technology of China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic
23
Xu L, Ru X, Song R. Application of Machine Learning for Drug-Target Interaction Prediction. Front Genet 2021; 12:680117. [PMID: 34234813 PMCID: PMC8255962 DOI: 10.3389/fgene.2021.680117] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/13/2021] [Accepted: 05/28/2021] [Indexed: 11/13/2022]
Abstract
Exploring drug–target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug–target interactions. The large amount of available drug and target data in existing databases, the evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning have made machine learning techniques the mainstream method for drug–target interaction prediction research. In this review, details of the specific applications of machine learning in drug–target interaction prediction are summarized, the characteristics of each algorithm are analyzed, and the issues that need to be further addressed and explored for future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance models.
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Xiaoqing Ru
- Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Rong Song
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
24
Computational drug repositioning for ischemic stroke: neuroprotective drug discovery. Future Med Chem 2021; 13:1271-1283. [PMID: 34137272 DOI: 10.4155/fmc-2021-0022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
Abstract
Background: A comprehensive approach to drug repositioning will be required to overcome translational hurdles and identify more neuroprotective drugs. Results & methods: Gene Set Enrichment Analysis was applied to identify related pathways and enriched genes. Candidate genes were optimized using ToppGene, ToppGenet and pBRIT. From the perspective of the local structures, gene-domain-substructure-drug relationships were constructed. Using the MCODE algorithm and K-means clustering, 31 functional subnetworks were obtained, and 252 drugs with proposed neuroprotective function were identified. Using computational analysis, 72 substructures with different scores were found to correspond to neuroprotective functions. The protective effects of benidipine and barnidipine were confirmed in vitro. Conclusion: The authors' research has great potential to discover more neuroprotective drugs and obtain more information regarding mechanisms of action and functional substructures.
25
Wei H, Xu Y, Liu B. iCircDA-LTR: identification of circRNA-disease associations based on Learning to Rank. Bioinformatics 2021; 37:3302-3310. [PMID: 33963827 DOI: 10.1093/bioinformatics/btab334] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/18/2021] [Revised: 03/23/2021] [Accepted: 05/04/2021] [Indexed: 12/18/2022]
Abstract
MOTIVATION Due to their inherent stability and close relationship with disease progression, circRNAs serve as important biomarkers and drug targets. Efficient predictors for identifying circRNA-disease associations are therefore highly needed. Existing predictors treat circRNA-disease association prediction as a classification task or a recommendation problem, failing to capture the ranking information among the associations and to detect the diseases associated with new circRNAs. However, more and more circRNAs are being discovered, and identifying the diseases associated with these new circRNAs remains a challenging task. RESULTS In this study, we proposed a new predictor called iCircDA-LTR for circRNA-disease association prediction. Unlike existing predictors, iCircDA-LTR employed a ranking framework to model the global ranking associations among the query circRNAs and the diseases. The Learning to Rank (LTR) algorithm was employed to rank the associations based on various predictors and features in a supervised manner. The experimental results on two independent test datasets showed that iCircDA-LTR outperformed the other competing methods, especially for predicting the diseases associated with new circRNAs. As a result, iCircDA-LTR is more suitable for real-world applications. AVAILABILITY For the convenience of researchers detecting new circRNA-disease associations, the web server of iCircDA-LTR was established and is freely available at http://bliulab.net/iCircDA-LTR/.
Affiliation(s)
- Hang Wei
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Yong Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
26
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied by long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools for drawing meaningful insights and improving decision making in drug discovery. Thanks to advances in AI and ML algorithms, AI/ML-driven solutions now have unprecedented potential to accelerate CNS drug discovery with a better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Collapse
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
27
Zhou M, Zheng C, Xu R. Combining phenome-driven drug-target interaction prediction with patients' electronic health records-based clinical corroboration toward drug discovery. Bioinformatics 2021; 36:i436-i444. [PMID: 32657406 PMCID: PMC7355254 DOI: 10.1093/bioinformatics/btaa451] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/20/2022]
Abstract
Motivation
Predicting drug–target interactions (DTIs) using human phenotypic data has the potential to eliminate the translational gap between animal experiments and clinical outcomes in humans. One challenge in human phenome-driven DTI prediction is integrating and modeling diverse drug and disease phenotypic relationships. Leveraging large amounts of clinically observed drug and disease phenotypes and the electronic health records (EHRs) of 72 million patients, we developed a novel integrated computational drug discovery approach that seamlessly combines DTI prediction with clinical corroboration.
Results
We developed a network-based DTI prediction system (TargetPredict) by modeling 855 904 phenotypic and genetic relationships among 1430 drugs, 4251 side effects, 1059 diseases and 17 860 genes. We systematically evaluated TargetPredict in de novo cross-validation and compared it with a state-of-the-art phenome-driven DTI prediction approach. We applied TargetPredict to identify novel repositioned candidate drugs for Alzheimer’s disease (AD), a disease affecting over 5.8 million people in the United States, and evaluated the clinical efficiency of the top repositioned drug candidates using the EHRs of over 72 million patients. The area under the receiver operating characteristic (ROC) curve was 0.97 in de novo cross-validation when evaluated using 910 drugs. TargetPredict outperformed the state-of-the-art phenome-driven DTI prediction system as measured by precision–recall curves [mean average precision (MAP): 0.28 versus 0.23, P-value < 0.0001]. The EHR-based case–control studies showed that prescriptions of top-ranked repositioned drugs are significantly associated with lower odds of AD diagnosis. For example, prescription of liraglutide, a type 2 diabetes drug, was significantly associated with a decreased risk of AD diagnosis [adjusted odds ratio (AOR): 0.76; 95% confidence interval (CI) (0.70, 0.82), P-value < 0.0001]. In summary, our integrated approach, which seamlessly combines computational DTI prediction with clinical corroboration based on large-scale patient EHRs, has high potential for rapidly identifying novel drug targets and drug candidates for complex diseases.
Availability and implementation
nlp.case.edu/public/data/TargetPredict.
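The liraglutide finding is reported as an adjusted odds ratio with a 95% confidence interval. As a minimal sketch of the underlying arithmetic, the Python function below computes an unadjusted odds ratio and a Woolf (log-scale Wald) 95% interval from a 2×2 case–control table. The counts and the helper name `odds_ratio_ci` are hypothetical, and the covariate adjustment behind an adjusted odds ratio (usually done with logistic regression) is not reproduced.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    2x2 case-control table:
        a = exposed cases,   b = exposed controls
        c = unexposed cases, d = unexposed controls
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(a=60, b=100, c=100, d=125)
print(f"OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1, as the reported (0.70, 0.82) does, indicates a significant association at the 5% level; with these toy counts the interval still contains 1.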
Affiliation(s)
- Mengshi Zhou
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Chunlei Zheng
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Rong Xu
- Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
28
Drug-Target Interaction Prediction Based on Adversarial Bayesian Personalized Ranking. BioMed Research International 2021; 2021:6690154. [PMID: 33628808 PMCID: PMC7889346 DOI: 10.1155/2021/6690154] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/23/2020] [Revised: 01/17/2021] [Accepted: 01/23/2021] [Indexed: 12/13/2022]
Abstract
The prediction of drug-target interactions (DTIs) is a key step in drug repositioning. In recent years, many studies have used matrix factorization to predict DTIs, but they rely only on known DTIs and ignore the features of drug and target expression profiles, which limits their prediction performance. In this study, we propose a new DTI prediction model named AdvB-DTI, in which the features of drug and target expression profiles are combined with Adversarial Bayesian Personalized Ranking through matrix factorization. First, a set of ternary partial order relationships is generated from the known drug-target relationships. Next, these partial order relationships are used to train the latent factor matrices of drugs and targets with the Adversarial Bayesian Personalized Ranking method, and the matrix factorization is enhanced with the features of drug and target expression profiles. Finally, the score of each drug-target pair is obtained as the inner product of the corresponding latent factors, and DTIs are predicted by ranking these scores. The proposed model effectively takes advantage of learning to rank to overcome data sparsity, and perturbation factors are introduced to make the model more robust. Experimental results show that our model achieves better DTI prediction performance.
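The ranking idea behind AdvB-DTI can be illustrated with plain Bayesian Personalized Ranking (BPR) matrix factorization. The sketch below is a simplification under stated assumptions, not the authors' model: it omits the adversarial perturbations and the expression-profile features, and the helper `bpr_mf`, its hyperparameters, and the toy interaction matrix are all hypothetical. Each stochastic step samples a (drug, known target, non-interacting target) triple, i.e. one partial order relationship, and nudges the latent factors so that the known target's inner-product score rises above the other target's.

```python
import numpy as np


def bpr_mf(interactions, n_factors=8, lr=0.1, reg=0.01, epochs=200, seed=0):
    """Plain BPR matrix factorization (hypothetical helper, not AdvB-DTI itself).

    interactions: binary array of shape (n_drugs, n_targets); 1 = known DTI.
    Returns drug factors D and target factors T; score(d, t) = D[d] @ T[t].
    """
    rng = np.random.default_rng(seed)
    n_drugs, n_targets = interactions.shape
    D = 0.1 * rng.standard_normal((n_drugs, n_factors))
    T = 0.1 * rng.standard_normal((n_targets, n_factors))
    pos = [np.flatnonzero(row == 1) for row in interactions]
    neg = [np.flatnonzero(row == 0) for row in interactions]
    # Only drugs with at least one known and one unknown target yield triples.
    drugs = [d for d in range(n_drugs) if pos[d].size and neg[d].size]
    for _ in range(epochs):
        for d in rng.permutation(drugs):
            i, j = rng.choice(pos[d]), rng.choice(neg[d])  # triple: d prefers i over j
            x = D[d] @ (T[i] - T[j])          # score margin s(d,i) - s(d,j)
            g = 1.0 / (1.0 + np.exp(x))       # d/dx log(sigmoid(x)) = sigmoid(-x)
            d_old = D[d].copy()
            # Gradient ascent on log(sigmoid(x)) with L2 regularization.
            D[d] += lr * (g * (T[i] - T[j]) - reg * D[d])
            T[i] += lr * (g * d_old - reg * T[i])
            T[j] += lr * (-g * d_old - reg * T[j])
    return D, T


# Toy drug x target matrix (hypothetical).
Y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])
D, T = bpr_mf(Y)
scores = D @ T.T                     # inner-product scores used for ranking
print(np.argsort(-scores, axis=1))   # targets ranked per drug, best first
```

Optimizing pairwise orderings rather than absolute 0/1 labels is what lets a BPR-style model learn from sparse positives without treating every unobserved pair as a confirmed negative.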
29
Ru X, Ye X, Sakurai T, Zou Q. Application of learning to rank in bioinformatics tasks. Brief Bioinform 2021; 22:6102666. [PMID: 33454758 DOI: 10.1093/bib/bbaa394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/03/2020] [Revised: 11/09/2020] [Accepted: 11/24/2020] [Indexed: 12/17/2022]
Abstract
Over the past decades, learning to rank (LTR) algorithms have gradually been applied in bioinformatics, where they have shown significant advantages in multiple research tasks. It is therefore necessary to summarize and discuss the application of these algorithms so that they can be used more conveniently and contribute further to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed from multiple application perspectives in bioinformatics. Finally, the paper discusses the shortcomings of LTR algorithms, methods and means to use them better, and some open problems that currently exist.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Quan Zou
- University of Electronic Science and Technology of China
30
Wang C, Kurgan L. Survey of Similarity-Based Prediction of Drug-Protein Interactions. Curr Med Chem 2021; 27:5856-5886. [PMID: 31393241 DOI: 10.2174/0929867326666190808154841] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 11/07/2017] [Revised: 04/16/2018] [Accepted: 10/23/2018] [Indexed: 12/20/2022]
Abstract
Therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on therapeutic protein targets, while knowledge of off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods, published in high-impact venues, that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods use to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss the contents, impact and relationships of these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit, and often combine, three types of similarity that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods and their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
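The simplest form of the drug-similarity idea surveyed here is a nearest-neighbour vote: a query drug inherits the known targets of its most structurally similar drugs, weighted by similarity. The sketch below assumes binary fingerprints represented as sets of on-bit indices and uses Tanimoto similarity; the fingerprints, target names, and helpers `tanimoto` and `predict_targets` are hypothetical, and the code does not reproduce any specific one of the 35 surveyed methods (many of which also use target-sequence or drug-profile similarity).

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query_fp: set, known_drugs: list, k: int = 2) -> dict:
    """Similarity-weighted target vote from the k drugs most similar to the query.

    known_drugs: list of (fingerprint, set_of_known_targets) tuples.
    Returns {target: summed similarity of neighbours that hit that target}.
    """
    ranked = sorted(known_drugs, key=lambda kd: tanimoto(query_fp, kd[0]), reverse=True)
    scores: dict = {}
    for fp, targets in ranked[:k]:
        sim = tanimoto(query_fp, fp)
        for t in targets:
            scores[t] = scores.get(t, 0.0) + sim
    return scores


# Hypothetical fingerprints (on-bit indices) and target annotations.
known = [
    ({1, 2, 3, 4}, {"T1"}),
    ({1, 2, 3, 5}, {"T1", "T2"}),
    ({7, 8, 9}, {"T3"}),
]
print(predict_targets({1, 2, 3}, known, k=2))
```

Here T1 scores highest because both near neighbours share it, while T3 is never proposed: its only drug has zero similarity to the query and falls outside the top k.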
Affiliation(s)
- Chen Wang
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
31
He S, Wen Y, Yang X, Liu Z, Song X, Huang X, Bo X. PIMD: An Integrative Approach for Drug Repositioning Using Multiple Characterization Fusion. Genomics, Proteomics & Bioinformatics 2020; 18:565-581. [PMID: 33075523 PMCID: PMC8377380 DOI: 10.1016/j.gpb.2018.10.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/09/2018] [Revised: 09/21/2018] [Accepted: 10/10/2018] [Indexed: 11/28/2022]
Abstract
The accumulation of various types of drug informatics data and computational approaches for drug repositioning can accelerate pharmaceutical research and development. However, the integration of multi-dimensional drug data for precision repositioning remains a pressing challenge. Here, we propose a systematic framework named PIMD to predict drug therapeutic properties by integrating multi-dimensional data for drug repositioning. In PIMD, drug similarity networks (DSNs) based on chemical, pharmacological, and clinical data are fused into an integrated DSN (iDSN) composed of many clusters. Rather than performing simple fusion, PIMD offers a systematic way to annotate clusters. Unexpected drugs within clusters, and drug pairs with high iDSN similarity scores, are then identified to predict novel therapeutic uses. PIMD provides new insights into the universality, individuality, and complementarity of different drug properties by evaluating the contribution of each property's data. To test the performance of PIMD, we use chemical, pharmacological, and clinical properties to generate an iDSN. Analyses of the contribution of each drug property indicate that this iDSN is driven by all data types and performs better than other DSNs. Among the top 20 recommended drug pairs, 7 drugs have been reported to be repurposed. The source code for PIMD is available at https://github.com/Sepstar/PIMD/.
Affiliation(s)
- Song He
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Yuqi Wen
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xiaoxi Yang
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Zhen Liu
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xinyu Song
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xin Huang
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
32
Yang S, Ye Q, Ding J, Yin, Lu A, Chen X, Hou T, Cao D. Current advances in ligand‐based target prediction. WIREs Computational Molecular Science 2020. [DOI: 10.1002/wcms.1504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Affiliation(s)
- Su‐Qing Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, China
- Qing Ye
- College of Pharmaceutical Sciences, Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jun‐Jie Ding
- Beijing Institute of Pharmaceutical Chemistry, Beijing, China
- Yin
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Ai‐Ping Lu
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Xiang Chen
- Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Ting‐Jun Hou
- College of Pharmaceutical Sciences, Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Dong‐Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan, China
- Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
33
Chu Y, Shan X, Chen T, Jiang M, Wang Y, Wang Q, Salahub DR, Xiong Y, Wei DQ. DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method. Brief Bioinform 2020; 22:5910189. [PMID: 32964234 DOI: 10.1093/bib/bbaa205] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/23/2020] [Revised: 08/06/2020] [Accepted: 08/10/2020] [Indexed: 12/20/2022]
Abstract
Identifying drug-target interactions (DTIs) is an important step in drug discovery and drug repositioning. To reduce experimental cost, a large number of computational approaches have been proposed for this task. Machine learning-based models, especially binary classification models, have been developed to predict whether or not a drug-target pair interacts. However, there is still much room for improvement in the performance of current methods. Multi-label learning can overcome some difficulties of single-label learning and thereby improve predictive performance. The key challenge in multi-label learning is the exponential-sized output space, and considering label correlations can help overcome this challenge. In this paper, we facilitate multi-label classification for DTI prediction by introducing community detection methods, an approach we name DTI-MLCD. Moreover, we updated the gold standard data set that has been widely used by most previously published DTI prediction methods since 2008, adding 15,000 more positive DTI samples. DTI-MLCD is applied to both data sets, demonstrating its superiority over other machine learning methods and several existing methods. The data sets and source code of this study are freely available at https://github.com/a96123155/DTI-MLCD.
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Xiaoqi Shan
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Tianhang Chen
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Mingming Jiang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
34
Huang J, Chen J, Zhang B, Zhu L, Cai H. Evaluation of gene-drug common module identification methods using pharmacogenomics data. Brief Bioinform 2020; 22:5860683. [PMID: 32591780 DOI: 10.1093/bib/bbaa087] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/30/2020] [Revised: 04/06/2020] [Accepted: 04/23/2020] [Indexed: 01/21/2023]
Abstract
Accurately identifying the interactions between genomic factors and the response to cancer drugs plays an important role in drug discovery, drug repositioning and cancer treatment. A number of studies have revealed that interactions between genes and drugs are 'many-genes-to-many-drugs' interactions, i.e. common modules, as opposed to 'one-gene-to-one-drug' interactions. Such modules fully explain the interactions between complex biological regulatory mechanisms and cancer drugs. However, strategies for effectively and robustly identifying the underlying common modules in pharmacogenomics data remain to be improved. In this paper, we provide a detailed evaluation of three categories of state-of-the-art common module identification techniques from a machine learning perspective: non-negative matrix factorization (NMF), partial least squares (PLS) and network analyses. We first evaluate the performance of six methods, namely SNMNMF, NetNMF, SNPLS, O2PLS, NSBM and HOGMMNC, using two series of simulated data sets with different noise levels and outlier ratios. Then, we conduct experiments using a real-world data set of 2091 genes and 101 drugs in 392 cancer cell lines and compare the experimental results in terms of biological process term enrichment, gene-drug interactions and drug-drug interactions. Finally, we present interesting findings from our evaluation study and discuss the advantages and drawbacks of each method. Supplementary information: Supplementary file is available at Briefings in Bioinformatics online.
Affiliation(s)
- Jie Huang
- South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Jiazhou Chen
- South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Bin Zhang
- South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Lei Zhu
- South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
- Hongmin Cai
- South China University of Technology, School of Computer Science and Engineering, Guangzhou, 510006, China
35
Chen J, Wong KC. RNCE: network integration with reciprocal neighbors contextual encoding for multi-modal drug community study on cancer targets. Brief Bioinform 2020; 22:5861765. [PMID: 32577712 DOI: 10.1093/bib/bbaa118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2020] [Revised: 04/29/2020] [Indexed: 11/14/2022]
Abstract
Mining drug targets and mechanisms of action (MoA) for novel anticancer drugs from pharmacogenomic data is one path to enhancing drug discovery efficiency. Recent approaches have successfully discovered targets/MoA by characterizing drug similarities and communities with integrative methods applied to multi-modal or multi-omics drug information. However, these approaches seldom consider the sparse and imbalanced community size structure of the drug network. We therefore developed a novel network integration approach that accounts for network structure through reciprocal nearest neighbor and contextual information encoding (RNCE). In addition, we proposed a tailor-made clustering algorithm to perform community detection on drug networks. RNCE and spectral clustering were shown to outperform state-of-the-art approaches in a series of tests, including network similarity tests and community detection tests on two drug databases. The improvement observed with RNCE can contribute to drug discovery and to related multi-modal/multi-omics integrative studies. Availability: https://github.com/WINGHARE/RNCE.
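The reciprocal-nearest-neighbour idea can be sketched as a graph-sparsification step: two drugs stay linked only if each is among the other's k most similar drugs, which suppresses one-sided links to hub-like drugs. The sketch below assumes a precomputed symmetric drug similarity matrix; the helper `reciprocal_knn_graph`, the value of k, and the toy matrix are hypothetical, RNCE's exact neighbour definition may differ, and its contextual encoding and network integration steps are not reproduced.

```python
import numpy as np


def reciprocal_knn_graph(sim: np.ndarray, k: int) -> np.ndarray:
    """Boolean adjacency in which i and j are linked only if each is among
    the other's k most similar nodes (self-similarity excluded)."""
    n = sim.shape[0]
    s = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(s, -np.inf)                  # a node is never its own neighbour
    knn = np.argsort(-s, axis=1)[:, :k]           # k most similar nodes per row
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), knn.ravel()] = True   # is_nn[i, j]: j in kNN(i)
    return is_nn & is_nn.T                        # keep mutual links only


# Hypothetical similarity matrix for four drugs.
sim = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.3, 0.1],
                [0.1, 0.3, 1.0, 0.8],
                [0.2, 0.1, 0.8, 1.0]])
A = reciprocal_knn_graph(sim, k=1)
print([(int(i), int(j)) for i, j in zip(*np.nonzero(A)) if i < j])  # [(0, 1), (2, 3)]
```

With k = 1 the graph splits into two natural drug communities; raising k admits weaker mutual links, so k trades sparsity against connectivity.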
Affiliation(s)
- Junyi Chen
- Department of Computer Science, City University of Hong Kong
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong
36
Ma C, Wang X, Xu T, Zhang S, Liu S, Zhai C, Wang Z, Mu J, Li C, Cheng F, Wang Q. An Integrative Pharmacology-Based Analysis of Refined Qingkailing Injection Against Cerebral Ischemic Stroke: A Novel Combination of Baicalin, Geniposide, Cholic Acid, and Hyodeoxycholic Acid. Front Pharmacol 2020; 11:519. [PMID: 32457601 PMCID: PMC7227481 DOI: 10.3389/fphar.2020.00519] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 02/07/2020] [Accepted: 04/02/2020] [Indexed: 12/13/2022]
Abstract
Stroke is the second leading cause of death globally after heart disease, and cerebral ischemic stroke accounts for approximately 70% of all incident stroke cases. We selected four main compounds from a patent Chinese medicine, Qingkailing (QKL) injection, namely baicalin from Scutellaria baicalensis Georgi (Huang Qin), geniposide from Gardenia jasminoides J. Ellis (Zhizi), and cholic acid and hyodeoxycholic acid from Bovis Calculus (Niuhuang), combined in a ratio of 4.4:0.4:3:2.6 m/m, to develop a more efficacious and safer modern Chinese medicine injection against ischemic stroke, refined QKL (RQKL). In this study, we investigated the multiple targets, levels, and pathways of RQKL using an integrative pharmacology approach combined with experimental validation. In silico analysis showed that RQKL may regulate the PI3K-Akt, estrogen, neurotrophin, HIF-1, MAPK, Hippo, FoxO, TGF-beta, NOD-like receptor, apoptosis, NF-kappa B, Wnt, chemokine, TNF, and Toll-like receptor signaling pathways against ischemic stroke. The experimental results showed that RQKL improved neurological function and reduced infarct volume and blood-brain barrier damage. RQKL inhibited microgliosis and astrogliosis, and protected neurons from ischemia/reperfusion injury. RQKL also inhibited cell apoptosis and affected the ratio of the anti-apoptotic protein B-cell lymphoma-2 (Bcl2) to the pro-apoptotic protein Bcl2-associated X protein (Bax). Western blot analysis showed that RQKL activated the AKT/PI3K signaling pathway, and an antibody array showed that RQKL inhibited the inflammatory response, decreasing the proinflammatory factors Tnf, Il6, and Il1b and the chemokines Ccl2, Cxcl2, and Cxcl3, and increasing the anti-inflammatory cytokine Il10. In conclusion, RQKL protected brain tissue against ischemic stroke by acting on multiple targets and signaling pathways and by modulating multiple cell types in the brain.
This study not only improves our understanding of the role of RQKL against ischemic stroke, but also provides a model for studying Chinese medicine by combining pharmaceutical informatics and systems biology methods.
Affiliation(s)
- Chongyang Ma
- School of Traditional Chinese Medicine, Capital Medical University, Beijing, China
- Xueqian Wang
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Tian Xu
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Shuang Zhang
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Shuling Liu
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Changming Zhai
- Department of Liver Disease, Guangdong Province Hospital of Traditional Chinese Medicine Zhuhai Branch, Zhuhai, China
- Zisong Wang
- Department of Traditional Chinese Medicine, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Jie Mu
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Changxiang Li
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Fafeng Cheng
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
- Qingguo Wang
- School of Traditional Chinese Medicine Department, Beijing University of Chinese Medicine, Beijing, China
37
Liu L, Huang X, Mamitsuka H, Zhu S. HPOLabeler: improving prediction of human protein–phenotype associations by learning to rank. Bioinformatics 2020; 36:4180-4188. [DOI: 10.1093/bioinformatics/btaa284] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 04/05/2020] [Accepted: 04/30/2020] [Indexed: 12/23/2022]
Abstract
Motivation
Annotating human proteins with abnormal phenotypes has become an important topic. The Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities encountered in human diseases. As of November 2019, fewer than 4000 proteins had been annotated with HPO terms. A computational approach for accurately predicting protein–HPO associations is therefore important, yet no method outperformed a simple Naive approach in the second Critical Assessment of Functional Annotation, 2013–2014 (CAFA2).
Results
We present HPOLabeler, which can use a wide variety of evidence, such as protein–protein interaction (PPI) networks, Gene Ontology, InterPro, trigram frequency and HPO term frequency, in the framework of learning to rank (LTR). LTR has proven powerful for solving large-scale, multi-label ranking problems in bioinformatics. Given an input protein, LTR outputs a ranked list of HPO terms from a series of input scores assigned to the candidate HPO terms by component learning models (logistic regression, nearest neighbor and a Naive method), which are trained on the given multiple evidence. We evaluated HPOLabeler extensively, mainly through two experiments, cross-validation and temporal validation, in which HPOLabeler significantly outperformed all component models and competing methods, including the current state-of-the-art method. We further found that (i) PPI is the most informative of the diverse data sources for prediction and (ii) the low prediction performance in temporal validation might be caused by incomplete annotation of new proteins.
Availability and implementation
http://issubmission.sjtu.edu.cn/hpolabeler/.
Contact
zhusf@fudan.edu.cn
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lizhi Liu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing
- Shanghai Institute of Artificial Intelligence Algorithms and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai 200031, China
- Xiaodi Huang
- School of Computing and Mathematics, Charles Sturt University, Albury, NSW 2640, Australia
- Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
- Department of Computer Science, Aalto University, Espoo, Finland
- Shanfeng Zhu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing
- Shanghai Institute of Artificial Intelligence Algorithms and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Science, Chinese Academy of Sciences, Shanghai 200031, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
38
Zhao T, Hu Y, Valsdottir LR, Zang T, Peng J. Identifying drug-target interactions based on graph convolutional network and deep neural network. Brief Bioinform 2020; 22:2141-2150. [PMID: 32367110 DOI: 10.1093/bib/bbaa044] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Received: 10/31/2019] [Revised: 03/05/2020] [Accepted: 03/06/2020] [Indexed: 12/21/2022]
Abstract
Identification of new drug-target interactions (DTIs) is an important but time-consuming and costly step in drug discovery. In recent years, to mitigate these drawbacks, researchers have sought to identify DTIs using computational approaches. However, most existing methods construct drug networks and target networks separately and then predict novel DTIs from known drug-target associations, without accounting for associations between drug-protein pairs (DPPs). To incorporate the associations between DPPs into DTI modeling, we built a DPP network from multiple drugs and proteins, in which DPPs are the nodes and the associations between DPPs are the edges. We then propose a novel learning-based framework, 'graph convolutional network (GCN)-DTI', for DTI identification. The model first uses a graph convolutional network to learn features for each DPP. Second, using this feature representation as input, it uses a deep neural network to predict the final label. Our results show that the proposed framework outperforms some state-of-the-art approaches by a large margin.
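The GCN step can be illustrated with the widely used symmetrically normalised propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a DPP network whose nodes are drug-protein pairs. This is a minimal sketch under stated assumptions: the adjacency matrix, node features, and weights are hypothetical and untrained, a single logistic output stands in for the paper's deep neural network, and the authors' exact layer definition may differ.

```python
import numpy as np


def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))     # degree^-1/2 (degrees >= 1)
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


rng = np.random.default_rng(0)
# Hypothetical DPP network: 4 drug-protein pairs, edges = DPP associations.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = rng.standard_normal((4, 6))                      # hypothetical DPP features
H = gcn_layer(A, X, rng.standard_normal((6, 8)))     # DPP embeddings (untrained weights)
w_out = rng.standard_normal(8)
probs = 1.0 / (1.0 + np.exp(-(H @ w_out)))           # stand-in for the DNN classifier
print(probs.round(3))                                # one interaction probability per DPP
```

In a trained model both the GCN weights and the classifier would be fitted to labelled DPPs; here they are random, so the printed probabilities carry no meaning beyond showing the data flow.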
Affiliation(s)
- Tianyi Zhao
- Department of Computer Science at Harbin Institute of Technology. He currently works as a bioinformatician in Beth Israel Deaconess Medical Center
- Yang Hu
- Department of Life Science at Harbin Institute of Technology. His expertise is bioinformatics
- Linda R Valsdottir
- MS in Biology and works as a scientific writer at the Smith Center for Outcomes Research in Cardiology at Beth Israel Deaconess Medical Center in Boston, MA. Her work is focused on helping researchers communicate their findings in an effort to translate novel analytical approaches and clinical expertise into improved outcomes for patients
- Tianyi Zang
- School of Computer Science and Technology at Harbin Institute of Technology (HIT), China. Before joining HIT in 2009, he was a research fellow at the Department of Computer Science at University of Oxford, UK. His current research is concerned with biomedical big data computing and algorithms, deep-learning algorithms for network data, intelligent recommendation algorithms, and modeling and analysis methods for complex systems
- Jiajie Peng
- School of Computer Science at Northwestern Polytechnical University. His expertise is computational biology and machine learning. Availability and implementation: https://github.com/zty2009/GCN-DNN/
39
Rayhan F, Ahmed S, Mousavian Z, Farid DM, Shatabda S. FRnet-DTI: Deep convolutional neural network for drug-target interaction prediction. Heliyon 2020; 6:e03444. [PMID: 32154410 PMCID: PMC7052404 DOI: 10.1016/j.heliyon.2020.e03444] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/10/2018] [Revised: 06/16/2019] [Accepted: 02/14/2020] [Indexed: 01/09/2023]
Abstract
The task of drug-target interaction prediction holds significant importance in pharmacology and therapeutic drug design. In this paper, we present FRnet-DTI, which combines autoencoder-based feature manipulation with a convolutional neural network-based classifier for drug-target interaction prediction. Two convolutional neural networks are proposed: FRnet-Encode, used for feature manipulation, and FRnet-Predict, used for classification. Using FRnet-Encode, we generate 4096 features for each instance in each dataset, and we then use FRnet-Predict to estimate the interaction probability from those features. We have tested our method on four gold standard datasets extensively used by other researchers. Experimental results show that our method significantly improves over the state-of-the-art method on three of the four gold standard datasets, in terms of both the area under the Receiver Operating Characteristic curve (auROC) and the area under the Precision-Recall curve (auPR). We also report twenty new potential interacting drug-target pairs based on high prediction scores. The source code and implementation details are available from https://github.com/farshidrayhanuiu/FRnet-DTI/, and the method is also available as a web application at http://farshidrayhan.pythonanywhere.com/FRnet-DTI/.
Affiliation(s)
- Farshid Rayhan
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
- Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
- Zaynab Mousavian
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Dewan Md Farid
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
- Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka-1212, Bangladesh
40
Ru X, Wang L, Li L, Ding H, Ye X, Zou Q. Exploration of the correlation between GPCRs and drugs based on a learning to rank algorithm. Comput Biol Med 2020; 119:103660. [PMID: 32090901 DOI: 10.1016/j.compbiomed.2020.103660] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/18/2019] [Revised: 02/04/2020] [Accepted: 02/12/2020] [Indexed: 02/01/2023]
Abstract
Exploring protein-drug correlations can help not only with selecting candidate compounds but also with related problems such as drug repositioning and finding potential drug targets. Therefore, many researchers have proposed machine learning methods for predicting protein-drug correlations. However, many existing models simply divide protein-drug relationships into related and unrelated categories and do not explore which target (or drug) is most relevant to a given drug (or target). To address this problem, this paper applies a ranking approach to predicting GPCR (G protein-coupled receptor)-drug correlations. This study uses two different types of datasets to explore the candidate compound problem and the potential target problem, and good results were achieved on both. In addition, this study found that the family to which a protein belongs is not an inherent factor affecting the ranking of GPCR-drug correlations; however, if a drug affects other members of a protein's family, that protein is likely to be a potential target of the drug. This study shows that the learning to rank algorithm is a good tool for exploring protein-drug correlations.
Affiliation(s)
- Xiaoqing Ru
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Lida Wang
- Scientific Research Department, Heilongjiang Agricultural Reclamation General Hospital, Harbin, China
- Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba Science City, Japan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
41
Playe B, Stoven V. Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity. J Cheminform 2020; 12:11. [PMID: 33431042 PMCID: PMC7011501 DOI: 10.1186/s13321-020-0413-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/07/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023]
Abstract
Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein–ligand interactions at large scale in both the protein and chemical spaces. These methods differ from more classical ligand-based methods (also called QSAR), which predict ligands for a given protein receptor. In the drug discovery process, chemogenomics makes it possible to predict off-target proteins for drug candidates; off-target interactions are one of the main causes of undesirable side effects and failure in drug development. The present study compares shallow and deep machine-learning approaches for chemogenomics and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, while recent developments in deep learning make it possible to learn abstract numerical representations of molecular graphs and protein sequences that optimise performance on the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN): a feed-forward neural network that takes as input the combination of molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods and is competitive with deep methods that use expert-based descriptors. However, on small datasets, shallow methods achieve better prediction performance than deep learning methods. We then evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data, such as auxiliary tasks for which large datasets are available, or, independently, multiple molecule and protein attribute views.
Affiliation(s)
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
- Veronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
42
Pliakos K, Vens C. Drug-target interaction prediction with tree-ensemble learning and output space reconstruction. BMC Bioinformatics 2020; 21:49. [PMID: 32033537 PMCID: PMC7006075 DOI: 10.1186/s12859-020-3379-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 06/21/2019] [Accepted: 01/21/2020] [Indexed: 12/21/2022]
Abstract
Background Computational prediction of drug-target interactions (DTI) is vital for drug discovery. The experimental identification of interactions between drugs and target proteins is very onerous. Modern technologies have mitigated the problem and facilitated the development of new drugs. However, drug development remains extremely expensive and time consuming. Therefore, in silico DTI predictions based on machine learning can alleviate the burdensome task of drug development. Many machine learning approaches have been proposed over the years for DTI prediction. Nevertheless, prediction accuracy and efficiency are persistent problems that still need to be tackled. Here, we propose a new learning method that addresses DTI prediction as a multi-output prediction task by learning ensembles of multi-output bi-clustering trees (eBICT) on reconstructed networks. In our setting, the nodes of a DTI network (drugs and proteins) are represented by features (background information). The interactions between the nodes of a DTI network are modeled as an interaction matrix, which composes the output space in our problem. The proposed approach integrates background information from both the drug and target protein spaces into the same global network framework. Results We performed an empirical evaluation comparing the proposed approach to state-of-the-art DTI prediction methods and demonstrated its effectiveness in different prediction settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein networks. We show that output space reconstruction can boost the predictive performance of tree-ensemble learning methods, yielding more accurate DTI predictions. Conclusions We proposed a new DTI prediction method in which bi-clustering trees are built on reconstructed networks. Building tree-ensemble learning models with output space reconstruction leads to superior prediction results while preserving the advantages of tree ensembles, such as scalability, interpretability and the inductive setting.
Affiliation(s)
- Konstantinos Pliakos
- KU Leuven, Campus KULAK, Faculty of Medicine, Kortrijk, Belgium; ITEC, imec research group at KU Leuven, Kortrijk, Belgium
- Celine Vens
- KU Leuven, Campus KULAK, Faculty of Medicine, Kortrijk, Belgium; ITEC, imec research group at KU Leuven, Kortrijk, Belgium
43
Profiling the Protein Targets of Unmodified Bio‐Active Molecules with Drug Affinity Responsive Target Stability and Liquid Chromatography/Tandem Mass Spectrometry. Proteomics 2020; 20:e1900325. [DOI: 10.1002/pmic.201900325] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/04/2019] [Revised: 11/28/2019] [Indexed: 12/17/2022]
44
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 172] [Impact Index Per Article: 43.0] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022]
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches so that drug–target interactions (DTIs) need not be determined solely by costly, laborious and not always deterministic experiments. These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of machine learning methods and databases that have been proposed and used to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, we highlight the challenges one may face in predicting DTIs using machine learning approaches and conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor
- Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian
- Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
45
Zong N, Wong RSN, Yu Y, Wen A, Huang M, Li N. Drug-target prediction utilizing heterogeneous bio-linked network embeddings. Brief Bioinform 2019; 22:568-580. [PMID: 31885036 DOI: 10.1093/bib/bbz147] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/27/2019] [Revised: 10/11/2019] [Accepted: 10/29/2019] [Indexed: 11/12/2022]
Abstract
To enable modularization for network-based prediction, we reviewed known methods for the various subtasks involved in creating a drug-target prediction framework and benchmarked them to determine the highest-performing approaches. Accordingly, our contributions are as follows: (i) from a network perspective, we benchmarked the association-mining performance of 32 distinct subnetwork permutations drawn from a comprehensive heterogeneous biomedical network derived from 12 repositories; (ii) from a methodological perspective, we identified the best prediction strategy by reviewing combinations of the components with off-the-shelf classification, inference and graph embedding methods. Our benchmarking strategy consisted of two series of experiments, totaling six distinct tasks across the two perspectives, to determine the best prediction approach. We demonstrated that the proposed method outperformed existing network-based methods, and showed how combinations of networks and methodologies can influence the prediction. In addition, we conducted disease-specific prediction tasks for 20 distinct diseases and showed the reliability of the strategy by predicting 75 novel drug-target associations, validated using DrugBank 5.1.0. In particular, we revealed a connection between network topology and the biological explanations for predictions for the diseases 'Asthma', 'Hypertension', and 'Dementia'. Our benchmarking results provide knowledge on a network-based prediction framework with modularized feature selection and association prediction, which can be easily adapted and extended to other feature sources or machine learning algorithms, as well as a baseline for comprehensively evaluating the utility of incorporating varying data sources.
Affiliation(s)
- Nansu Zong
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
- Rachael Sze Nga Wong
- Department of Bioengineering, UC San Diego, 9500 Gilman Drive, San Diego, CA 92093-0412, USA
- Yue Yu
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
- Ming Huang
- Department of Health Sciences Research, Mayo Clinic, 200 First St. SW, Rochester, MN 55905, USA
- Ning Li
- Scripps Research Institute, 10550 North Torrey Pines Road, San Diego, CA, 92037, USA
46
Yan C, Duan G, Wu FX, Wang J. IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning. BMC Bioinformatics 2019; 20:651. [PMID: 31881820 PMCID: PMC6933616 DOI: 10.1186/s12859-019-3278-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/29/2022]
Abstract
Background Viral infectious diseases are a serious threat to human health. Receptor binding is the first step in viral infection of a host. To treat human viral infectious diseases more effectively, hidden virus-receptor interactions must be discovered. However, current computational methods for predicting virus-receptor interactions are limited. Results In this study, we propose a new computational method (IILLS) to predict virus-receptor interactions based on an Initial Interaction scores method via the neighbors and the Laplacian regularized Least Squares algorithm. IILLS integrates the known virus-receptor interactions and the amino acid sequences of receptors. The similarity of viruses is calculated by the Gaussian Interaction Profile (GIP) kernel. We also compute the receptor GIP similarity and the receptor sequence similarity; based on the prediction results, the sequence similarity is then used as the final receptor similarity. 10-fold cross validation (10CV) and leave-one-out cross validation (LOOCV) are used to assess the prediction performance of our method. We also compare our method with three other competing methods (BRWH, LapRLS, CMF). Conclusion The experimental results show that IILLS achieves AUC values of 0.8675 and 0.9061 with 10CV and LOOCV, respectively, indicating that IILLS is superior to the competing methods. In addition, case studies further indicate that the IILLS method is effective for virus-receptor interaction prediction.
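The abstract states that virus similarity (and one receptor similarity) is computed with the Gaussian Interaction Profile (GIP) kernel. A minimal sketch of the commonly used GIP formulation follows: each entity's interaction profile is its row of the known-interaction matrix, and K(i, j) = exp(-gamma ||p_i - p_j||^2), with gamma normalized by the mean squared profile norm. The choice gamma_prime = 1 and the toy interaction matrix are assumptions; the abstract does not give IILLS's exact settings.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile (GIP) kernel between the rows of a 0/1 matrix.

    K(i, j) = exp(-gamma * ||p_i - p_j||^2), with the bandwidth gamma
    normalized by the mean squared norm of all interaction profiles.
    """
    n = len(profiles)
    mean_sq_norm = sum(v * v for row in profiles for v in row) / n
    # Guard against an all-zero matrix (no known interactions).
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in profiles] for p in profiles]


# Hypothetical known interactions: rows = viruses, columns = receptors.
interactions = [[1, 0, 1],
                [1, 0, 0],
                [0, 1, 0]]
virus_sim = gip_kernel(interactions)
# Receptor GIP similarity uses the columns (the transposed matrix).
receptor_sim = gip_kernel([list(col) for col in zip(*interactions)])
```

Viruses sharing more known receptors get similarity closer to 1; here virus 0 is more similar to virus 1 (one differing receptor) than to virus 2 (three differing receptors).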
Affiliation(s)
- Cheng Yan
- School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, Changsha, 410083, China; School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, Duyun, 558000, China
- Guihua Duan
- School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, Changsha, 410083, China
- Fang-Xiang Wu
- Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, Changsha, 410083, China
47
Chu Y, Kaushik AC, Wang X, Wang W, Zhang Y, Shan X, Salahub DR, Xiong Y, Wei DQ. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief Bioinform 2019; 22:451-462. [PMID: 31885041 DOI: 10.1093/bib/bbz152] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 06/16/2019] [Revised: 11/01/2019] [Accepted: 11/04/2019] [Indexed: 12/18/2022]
Abstract
Drug-target interactions (DTIs) play a crucial role in target-based drug discovery and development. Computational prediction of DTIs can effectively complement experimental wet-lab techniques for identifying DTIs, which are typically time- and resource-consuming. However, current DTI prediction approaches suffer from low precision and high false-positive rates. In this study, we develop a novel DTI prediction method, named DTI-CDF, based on a cascade deep forest (CDF) model; it uses multiple similarity-based features between drugs and between target proteins, extracted from a heterogeneous graph that contains known DTIs. In the experiments, we ran five replicates of 10-fold cross-validation under three different experimental settings, in which the DTI values of certain drugs (SD), targets (ST), or drug-target pairs (SP) are missing from the training sets but present in the test sets. The experimental results demonstrate that DTI-CDF achieves significantly higher performance than traditional ensemble learning-based methods such as random forest and XGBoost, a deep neural network, and state-of-the-art methods such as DDR. Furthermore, 1352 newly predicted DTIs are confirmed by the KEGG and DrugBank databases. The data sets and source code are freely available at https://github.com/a96123155/DTI-CDF.
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Xiangeng Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Wei Wang
- Mathematical Sciences, Shanghai Jiao Tong University
- Yufang Zhang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
48
Zhang W, Lin W, Zhang D, Wang S, Shi J, Niu Y. Recent Advances in the Machine Learning-Based Drug-Target Interaction Prediction. Curr Drug Metab 2019; 20:194-202. [PMID: 30129407 DOI: 10.2174/1389200219666180821094047] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/16/2017] [Revised: 01/18/2018] [Accepted: 03/19/2018] [Indexed: 12/28/2022]
Abstract
BACKGROUND The identification of drug-target interactions is a crucial issue in drug discovery. In recent years, researchers have made great efforts in drug-target interaction prediction and have developed databases, software and computational methods. RESULTS In this paper, we review recent advances in machine learning-based drug-target interaction prediction. First, we briefly introduce the datasets and summarize the features for drugs and targets that can be extracted from different data. Since drug-drug similarity and target-target similarity are important for many machine learning prediction models, we describe how to calculate similarities based on data or features. Different machine learning-based drug-target interaction prediction methods can be built using different features or information; we therefore summarize, analyze and compare these methods. CONCLUSION This study provides a guide to the development of computational methods for drug-target interaction prediction.
Affiliation(s)
- Wen Zhang
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Weiran Lin
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Ding Zhang
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Siman Wang
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Jingwen Shi
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
- Yanqing Niu
- School of Mathematics and Statistics, South-Central University for Nationalities, Wuhan 430074, China
49
Wang X, Zhu X, Ye M, Wang Y, Li CD, Xiong Y, Wei DQ. STS-NLSP: A Network-Based Label Space Partition Method for Predicting the Specificity of Membrane Transporter Substrates Using a Hybrid Feature of Structural and Semantic Similarity. Front Bioeng Biotechnol 2019; 7:306. [PMID: 31781551 PMCID: PMC6851049 DOI: 10.3389/fbioe.2019.00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Accepted: 10/17/2019] [Indexed: 12/11/2022]
Abstract
Membrane transport proteins play crucial roles in the pharmacokinetics of substrate drugs and in drug resistance in cancer, and they are vital to drug discovery, development and anti-cancer therapeutics. However, experimental methods that profile a substrate drug against a panel of transporters to determine its specificity are labor intensive and time consuming. In this article, we develop an in silico multi-label classification approach to predict whether a substrate can specifically recognize one of 13 categories of drug transporters, ranging from the ATP-binding cassette to the solute carrier families, using both structural fingerprints and chemical ontology information of substrates. The data-driven network-based label space partition (NLSP) method was used to construct the model from a hybrid similarity-based feature that integrates 2D fingerprint and semantic similarity. This method builds a predictor for each (possibly overlapping) label cluster detected by community detection algorithms and takes the union of the predicted label sets for a compound as the final prediction. NLSP belongs to the ensembles-of-multi-label-classifiers category in the multi-label learning field. We used Cramér's V statistic to quantify label correlations and depicted them in a heatmap. Jackknife tests and an iterative stratification-based cross-validation method were applied to a benchmark dataset to evaluate the prediction performance of the proposed models in both multi-label and label-wise manners. Compared with other powerful multi-label methods (ML-kNN, MTSVM, and RAkELd), our multi-label classification model NLSP-RF (random forest-based NLSP) proved to be feasible and effective, and performed satisfactorily in predicting transporter-substrate specificity. The idea behind the NLSP method is intriguing, and its power remains to be explored for multi-label learning problems in bioinformatics.
The benchmark dataset, intermediate results and Python code that fully reproduce our experiments and results are available at https://github.com/dqwei-lab/STS.
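The abstract uses Cramér's V to quantify correlations between transporter labels before plotting them as a heatmap. A minimal sketch of the standard, uncorrected statistic, V = sqrt(chi^2 / (n (min(r, c) - 1))), plus a pairwise label-correlation matrix that could feed such a heatmap, follows. The toy label matrix is hypothetical, and whether the authors applied a bias correction is not stated in the abstract.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V (no bias correction) between two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    x_tot, y_tot = Counter(x), Counter(y)
    # Pearson chi-squared statistic over the observed contingency table.
    chi2 = 0.0
    for a in x_tot:
        for b in y_tot:
            expected = x_tot[a] * y_tot[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(x_tot), len(y_tot)) - 1
    if k == 0:
        return 0.0  # a constant label carries no association information
    return math.sqrt(chi2 / (n * k))


def label_correlation_matrix(labels):
    """Pairwise Cramér's V between the columns of a compounds-by-labels 0/1 matrix."""
    cols = list(zip(*labels))
    return [[cramers_v(c1, c2) for c2 in cols] for c1 in cols]


# Hypothetical label matrix: 4 compounds x 3 transporter categories.
labels = [[1, 1, 0],
          [1, 1, 1],
          [0, 0, 0],
          [0, 0, 1]]
corr = label_correlation_matrix(labels)
```

In this toy matrix, the first two labels always co-occur (V = 1), while the third is independent of both (V = 0).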
Affiliation(s)
- Xiangeng Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
- Mingzhi Ye
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Cheng-Dong Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, China
50
Pliakos K, Vens C. Network inference with ensembles of bi-clustering trees. BMC Bioinformatics 2019; 20:525. [PMID: 31660848 PMCID: PMC6819564 DOI: 10.1186/s12859-019-3104-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/28/2019] [Accepted: 09/20/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks; examples include drug-protein interaction networks and gene regulatory networks. Studying and elucidating such networks can lead to an understanding of complex biological processes. However, we usually have only partial knowledge of these networks, and experimentally identifying all the existing associations between biological entities is very time consuming and particularly expensive. Many computational approaches have been proposed over the years for network inference, but efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending traditional tree-ensemble models to the global network setting. The proposed approach addresses the network inference problem as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modelled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels in our setting represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network). RESULTS We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation comparing the proposed approach to currently used tree-ensemble-based approaches as well as other approaches from the literature, and demonstrated its effectiveness in different interaction prediction (network inference) settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied our proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model for predicting non-reported interactions. CONCLUSIONS Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Since our approach is based on tree ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability and interpretability.
Affiliation(s)
- Konstantinos Pliakos
- KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Faculty of Medicine, Kortrijk, Belgium; ITEC, imec research group at KU Leuven, Kortrijk, Belgium
- Celine Vens
- KU Leuven, Campus KULAK, Department of Public Health and Primary Care, Faculty of Medicine, Kortrijk, Belgium; ITEC, imec research group at KU Leuven, Kortrijk, Belgium