1
İhtiyar MN, Özgür A. Generative language models on nucleotide sequences of human genes. Sci Rep 2024; 14:22204. [PMID: 39333252 PMCID: PMC11437190 DOI: 10.1038/s41598-024-72512-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 09/09/2024] [Indexed: 09/29/2024]
Abstract
Language models, especially transformer-based ones, have achieved colossal success in natural language processing. To be precise, studies like BERT for natural language understanding and works like GPT-3 for natural language generation are very important. If we consider DNA sequences as a text written with an alphabet of four letters representing the nucleotides, they are similar in structure to natural languages. This similarity has led to the development of discriminative language models such as DNABERT in the field of DNA-related bioinformatics. To our knowledge, however, the generative side of the coin is still largely unexplored. Therefore, we have focused on the development of an autoregressive generative language model such as GPT-3 for DNA sequences. Since working with whole DNA sequences is challenging without extensive computational resources, we decided to conduct our study on a smaller scale and focus on nucleotide sequences of human genes, i.e. unique parts of DNA with specific functions, rather than the whole DNA. This decision has not significantly changed the structure of the problem, as both DNA and genes can be considered as 1D sequences consisting of four different nucleotides without losing much information and without oversimplification. First of all, we systematically studied an almost entirely unexplored problem and observed that recurrent neural networks (RNNs) perform best, while simple techniques such as N-grams are also promising. Another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural languages. The importance of using real-world tasks beyond classical metrics such as perplexity was noted. In addition, we examined whether the data-hungry nature of these models can be altered by selecting a language with minimal vocabulary size, four due to four different types of nucleotides. The reason for reviewing this was that choosing such a language might make the problem easier. 
However, in this study, we found that this did not change the amount of data required very much.
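The n-gram baseline and the perplexity metric mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `train_ngram` and `perplexity`, the trigram order, and the add-alpha smoothing are assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # four-letter nucleotide vocabulary


def train_ngram(sequences, n=3, alpha=1.0):
    """Count n-grams and return a function P(next | context) with add-alpha smoothing.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            context, nxt = seq[i:i + n - 1], seq[i + n - 1]
            counts[context][nxt] += 1

    def prob(context, nxt):
        c = counts.get(context, Counter())
        total = sum(c.values())
        return (c[nxt] + alpha) / (total + alpha * len(ALPHABET))

    return prob


def perplexity(prob, sequences, n=3):
    """Exponential of the average negative log-probability per predicted nucleotide."""
    log_sum, tokens = 0.0, 0
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            log_sum += math.log(prob(seq[i:i + n - 1], seq[i + n - 1]))
            tokens += 1
    return math.exp(-log_sum / tokens)


# Toy data only: a perfectly periodic "gene" is easy to predict.
model = train_ngram(["ACGT" * 50])
toy_perplexity = perplexity(model, ["ACGTACGT"])
```

A model trained on no data assigns probability 1/4 to every nucleotide, so its perplexity is exactly 4; that is the ceiling any useful model over this vocabulary has to beat.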
Affiliation(s)
- Musa Nuri İhtiyar
  - Department of Computer Engineering, Boğaziçi University, 34342, Istanbul, Turkey.
- Arzucan Özgür
  - Department of Computer Engineering, Boğaziçi University, 34342, Istanbul, Turkey.
2
Guichaoua G, Pinel P, Hoffmann B, Azencott CA, Stoven V. Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset. J Chem Inf Model 2024; 64:6938-6956. [PMID: 39237105 PMCID: PMC11423346 DOI: 10.1021/acs.jcim.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Drug-target interactions (DTIs) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target or target identification of a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction heavily relies on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing for the prediction of new interactions based on learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges to fulfill these goals: building large, high-quality training datasets and designing prediction methods that can scale, in order to be trained on such large datasets. First, we introduce LCIdb, a curated, large-sized dataset of DTIs, offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains a much higher number of molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet leverages a three-step framework, incorporating efficient computation choices tailored for large datasets and involving the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants in DTIs, and whose structure allows for reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets, without compromising on performance. Our method is implemented in open-source software, leveraging GPU parallel computation for efficiency. 
We demonstrate the interest of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset, and on the publicly available L H benchmark designed for scaffold hopping problems. Komet is available open source at https://komet.readthedocs.io and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
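The computational saving that a Kronecker structure allows can be sketched with the standard identity (A ⊗ B) vec(X) = vec(A X Bᵀ), using row-major vec. The block below is a pure-Python illustration of that identity only, not Komet's implementation: `kron_matvec` and the toy kernel matrices are hypothetical, and the Nyström approximation and quasi-Newton optimization named in the abstract are not shown.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Explicit Kronecker product; rows are indexed by (i, j), columns by (k, l)."""
    return [[a_ik * b_jl for a_ik in row_a for b_jl in row_b]
            for row_a in a for row_b in b]


def kron_matvec(k_mol, k_prot, x):
    """Compute (k_mol ⊗ k_prot) · vec(x) without forming the Kronecker product.

    x has one row per molecule and one column per protein (assumed layout).
    """
    y = matmul(matmul(k_mol, x), transpose(k_prot))
    return [v for row in y for v in row]


# Toy similarity matrices (assumed values) for 2 molecules and 3 proteins.
k_mol = [[1.0, 0.5], [0.5, 1.0]]
k_prot = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
pair_scores = kron_matvec(k_mol, k_prot, weights)
```

For n molecules and m proteins the explicit pairwise matrix has (nm)² entries, whereas the factored product only ever touches the n × n and m × m matrices, which is why this kind of structure matters at the dataset sizes the abstract describes.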
Affiliation(s)
- Gwenn Guichaoua
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
- Philippe Pinel
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
  - Iktos SAS, 75017 Paris, France
- Chloé-Agathe Azencott
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
- Véronique Stoven
  - Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France
  - Institut Curie, Université PSL, 75005 Paris, France
  - INSERM U900, 75005 Paris, France
3
Ohnuki Y, Akiyama M, Sakakibara Y. Deep learning of multimodal networks with topological regularization for drug repositioning. J Cheminform 2024; 16:103. [PMID: 39180095 PMCID: PMC11342530 DOI: 10.1186/s13321-024-00897-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Accepted: 08/12/2024] [Indexed: 08/26/2024]
Abstract
MOTIVATION: Computational techniques for drug-disease prediction are essential in enhancing drug discovery and repositioning. While many methods utilize multimodal networks from various biological databases, few integrate comprehensive multi-omics data, including transcriptomes, proteomes, and metabolomes. We introduce STRGNN, a novel graph deep learning approach that predicts drug-disease relationships using extensive multimodal networks comprising proteins, RNAs, metabolites, and compounds. We have constructed a detailed dataset incorporating multi-omics data and developed a learning algorithm with topological regularization. This algorithm selectively leverages informative modalities while filtering out redundancies. RESULTS: STRGNN demonstrates superior accuracy compared to existing methods and has identified several novel drug effects, corroborating existing literature. STRGNN emerges as a powerful tool for drug prediction and discovery. The source code for STRGNN, along with the dataset for performance evaluation, is available at https://github.com/yuto-ohnuki/STRGNN.git.
Affiliation(s)
- Yuto Ohnuki
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
- Manato Akiyama
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
- Yasubumi Sakakibara
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan.
4
Chen Y, Liang X, Du W, Liang Y, Wong G, Chen L. Drug-Target Interaction Prediction Based on an Interactive Inference Network. Int J Mol Sci 2024; 25:7753. [PMID: 39062996 PMCID: PMC11277210 DOI: 10.3390/ijms25147753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 06/25/2024] [Accepted: 06/27/2024] [Indexed: 07/28/2024]
Abstract
Drug-target interactions underlie the actions of chemical substances in medicine. Moreover, drug repurposing can expand use profiles while reducing costs and development time by exploiting potential multi-functional pharmacological properties based upon additional target interactions. Nonetheless, drug repurposing relies on the accurate identification and validation of drug-target interactions (DTIs). In this study, a novel drug-target interaction prediction model was developed. The model, based on an interactive inference network, contains embedding, encoding, interaction, feature extraction, and output layers. In addition, this study used Morgan and PubChem molecular fingerprints as additional information for drug encoding. The interaction layer in our model simulates the drug-target interaction process, which assists in understanding the interaction by representing the interaction space. Our method achieves high levels of predictive performance, as well as interpretability of drug-target interactions. Additionally, we predicted and validated 22 Alzheimer's disease-related targets, suggesting our model is robust and effective and thus may be beneficial for drug repurposing.
Affiliation(s)
- Yuqi Chen
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
- Xiaomin Liang
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
- Wei Du
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yanchun Liang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Garry Wong
  - Faculty of Health Sciences, University of Macau, Taipa, Macau SAR 999078, China
- Liang Chen
  - College of Mathematics and Computer, Shantou University, Shantou 515063, China
5
Li Y, Liang W, Peng L, Zhang D, Yang C, Li KC. Predicting Drug-Target Interactions Via Dual-Stream Graph Neural Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:948-958. [PMID: 36074878 DOI: 10.1109/tcbb.2022.3204188] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Drug-target interaction prediction is a crucial stage in drug discovery. However, brute-force search over a compound database is financially infeasible. The number of measured drug-target interaction records has grown in recent years, and the rich drug- and protein-related information allows the use of graph machine learning. Despite advances in deep learning-enabled drug-target interaction prediction, open challenges remain: (1) the rich and complex relationships between drugs and proteins can be explored further; (2) the intermediate node is not calibrated in the heterogeneous graph. To tackle these issues, this paper proposes a framework named DSG-DTI. Specifically, DSG-DTI combines a heterogeneous graph autoencoder with heterogeneous attention network-based matrix completion. The framework ensures that the known types of nodes (e.g., drug, target, side effects, diseases) are precisely embedded into a high-dimensional space through pretraining. In addition, the attention-based heterogeneous graph matrix completion achieves highly competitive results via effective extraction of long-range dependencies. We verify our model on two public benchmarks. The results show that the proposed scheme effectively predicts drug-target interactions, generalizes to newly registered drugs and targets with only slight performance degradation, and achieves higher accuracy than the other baselines.
6
Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. MEDICAL REVIEW (2021) 2023; 3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]
Abstract
Proteins function as integral actors in essential life processes, rendering the realm of protein research a fundamental domain that possesses the potential to propel advancements in pharmaceuticals and disease investigation. Within the context of protein research, an imperious demand arises to uncover protein functionalities and untangle intricate mechanistic underpinnings. Due to the exorbitant costs and limited throughput inherent in experimental investigations, computational models offer a promising alternative to accelerate protein function annotation. In recent years, protein pre-training models have exhibited noteworthy advancement across multiple prediction tasks. This advancement highlights a notable prospect for effectively tackling the intricate downstream task associated with protein function prediction. In this review, we elucidate the historical evolution and research paradigms of computational methods for predicting protein function. Subsequently, we summarize the progress in protein and molecule representation as well as feature extraction techniques. Furthermore, we assess the performance of machine learning-based algorithms across various objectives in protein function prediction, thereby offering a comprehensive perspective on the progress within this field.
Affiliation(s)
- Jiaxiao Chen
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Zhonghui Gu
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Luhua Lai
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
- Jianfeng Pei
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
7
Hu L, Fu C, Ren Z, Cai Y, Yang J, Xu S, Xu W, Tang D. SSELM-neg: spherical search-based extreme learning machine for drug-target interaction prediction. BMC Bioinformatics 2023; 24:38. [PMID: 36737694 PMCID: PMC9896467 DOI: 10.1186/s12859-023-05153-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/03/2022] [Accepted: 01/18/2023] [Indexed: 02/05/2023]
Abstract
BACKGROUND: The experimental verification of a drug discovery process is expensive and time-consuming. Therefore, efficiently and effectively identifying drug-target interactions (DTIs) has been the focus of research. At present, many machine learning algorithms are used for predicting DTIs. The key idea is to train the classifier using an existing DTI to predict a new or unknown DTI. However, there are various challenges, such as class imbalance and the parameter optimization of many classifiers, that need to be solved before an optimal DTI model is developed. METHODS: In this study, we propose a framework called SSELM-neg for DTI prediction, in which we use a screening approach to choose high-quality negative samples and a spherical search approach to optimize the parameters of the extreme learning machine. RESULTS: The results demonstrated that the proposed technique outperformed other state-of-the-art methods in 10-fold cross-validation experiments in terms of the area under the receiver operating characteristic curve (0.986, 0.993, 0.988, and 0.969) and AUPR (0.982, 0.991, 0.982, and 0.946) for the enzyme dataset, G-protein coupled receptor dataset, ion channel dataset, and nuclear receptor dataset, respectively. CONCLUSION: The screening approach produced high-quality negative samples with the same number of positive samples, which solved the class imbalance problem. We optimized an extreme learning machine using a spherical search approach to identify DTIs. Therefore, our models performed better than other state-of-the-art methods.
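The two metrics reported in this abstract can be computed from labels and scores alone. The sketch below uses the pairwise-ranking definition of the area under the ROC curve and average precision as one common estimator of AUPR; the function names and toy data are assumptions, and this is not the evaluation code of the cited study.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Toy example (assumed data): two interacting and two non-interacting pairs.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

With heavily imbalanced data AUPR is usually the more sensitive of the two, which is consistent with the abstract's emphasis on selecting negatives in equal number to the positives.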
Affiliation(s)
- Lingzhi Hu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
- Chengzhou Fu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
  - Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
- Zhonglu Ren
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
- Yongming Cai
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
  - Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
- Jin Yang
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
  - Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
- Siwen Xu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
- Wenhua Xu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
- Deyu Tang
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China
  - School of Computer Science and Engineering, South China University of Technology, Guangzhou, People’s Republic of China
  - Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
8
Li T, Zhao XM, Li L. Co-VAE: Drug-Target Binding Affinity Prediction by Co-Regularized Variational Autoencoders. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:8861-8873. [PMID: 34652996 DOI: 10.1109/tpami.2021.3120428] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 06/13/2023]
Abstract
Identifying drug-target interactions has been a key step in drug discovery. Many computational methods have been proposed to directly determine whether drugs and targets can interact or not. Drug-target binding affinity is another type of data, which shows the strength of the binding interaction between a drug and a target. However, predicting drug-target binding affinity is more challenging, and thus very few studies follow this line. In our work, we propose a novel co-regularized variational autoencoder (Co-VAE) model to predict drug-target binding affinity based on drug structures and target sequences. The Co-VAE model consists of two VAEs for generating drug SMILES strings and target sequences, respectively, and a co-regularization part for generating the binding affinities. We theoretically prove that the Co-VAE model maximizes the lower bound of the joint likelihood of drug, protein and their affinity. The Co-VAE can predict drug-target affinity and generate new drugs that share similar targets with the input drugs. The experimental results on two datasets show that the Co-VAE predicts drug-target affinity better than existing affinity prediction methods such as DeepDTA and DeepAffinity, and generates more new valid drugs than existing methods such as GAN and VAE.
9
Li Y, Zhang C, Ma X, Yang L, Ren H. Identification of the potential mechanism of Radix pueraria in colon cancer based on network pharmacology. Sci Rep 2022; 12:3765. [PMID: 35260672 PMCID: PMC8904787 DOI: 10.1038/s41598-022-07815-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 02/24/2022] [Indexed: 11/09/2022]
Abstract
Radix Puerariae (RP), a dry root of Pueraria lobata (Willd.) Ohwi, is used to treat a variety of diseases, including cancer. Several in vitro and in vivo studies have demonstrated the efficacy of RP in the treatment of colon cancer (CC). However, the biological mechanism of RP in the treatment of colon cancer remains unclear. In this study, the active component of RP and its potential molecular mechanism against CC were studied by network pharmacology and enrichment analysis. The methods adopted included screening active ingredients of Chinese medicine, predicting target genes of Chinese medicine and disease, constructing of a protein interaction network, and conducting GO and KEGG enrichment analysis. Finally, the results of network pharmacology were further validated by molecular docking experiments and cell experiments. Eight active constituents and 14 potential protein targets were screened from RP, including EGFR, JAK2 and SRC. The biological mechanism of RP against CC was analysed by studying the relationship between active components, targets, and enrichment pathways. These findings provide a basis for understanding the clinical application of RP in CC.
Affiliation(s)
- Yi Li
  - Department of Clinical Laboratory, The First Affiliated Hospital of Zhengzhou University, No. 1 Jianshe Road, Zhengzhou, 450052, People's Republic of China
- Chunli Zhang
  - Department of General Surgery, The People's Hospital of Zhengzhou, Henan, China
- Xiaohan Ma
  - The Third Affiliated Hospital of Zhengzhou University, Henan, China
- Liuqing Yang
  - Fuwai Central China Cardiovascular Hospital, Henan, China
- Huijun Ren
  - Department of Clinical Laboratory, The First Affiliated Hospital of Zhengzhou University, No. 1 Jianshe Road, Zhengzhou, 450052, People's Republic of China.
10
Maki J, Oshimura A, Tsukano C, Yanagita RC, Saito Y, Sakakibara Y, Irie K. AI and computational chemistry-accelerated development of an alotaketal analogue with conventional PKC selectivity. Chem Commun (Camb) 2022; 58:6693-6696. [DOI: 10.1039/d2cc01759h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The protein kinase C (PKC) family consists of ten isozymes and is a potential target for treating cancer, Alzheimer’s disease, and HIV infection. Since known natural PKC agonists have little...
11
Monteiro NRC, Ribeiro B, Arrais JP. Drug-Target Interaction Prediction: End-to-End Deep Learning Approach. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2364-2374. [PMID: 32142454 DOI: 10.1109/tcbb.2020.2977335] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 06/10/2023]
Abstract
The discovery of potential Drug-Target Interactions (DTIs) is a determining step in the drug discovery and repositioning process, as the effectiveness of the currently available antibiotic treatment is declining. Although putting efforts on the traditional in vivo or in vitro methods, pharmaceutical financial investment has been reduced over the years. Therefore, establishing effective computational methods is decisive to find new leads in a reasonable amount of time. Successful approaches have been presented to solve this problem but seldom protein sequences and structured data are used together. In this paper, we present a deep learning architecture model, which exploits the particular ability of Convolutional Neural Networks (CNNs) to obtain 1D representations from protein sequences (amino acid sequence) and compounds SMILES (Simplified Molecular Input Line Entry System) strings. These representations can be interpreted as features that express local dependencies or patterns that can then be used in a Fully Connected Neural Network (FCNN), acting as a binary classifier. The results achieved demonstrate that using CNNs to obtain representations of the data, instead of the traditional descriptors, lead to improved performance. The proposed end-to-end deep learning method outperformed traditional machine learning approaches in the correct classification of both positive and negative interactions.
12
Androgen receptor antagonists produced by Streptomyces overcome resistance to enzalutamide. J Antibiot (Tokyo) 2021; 74:706-716. [PMID: 34282313 DOI: 10.1038/s41429-021-00453-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2021] [Revised: 06/02/2021] [Accepted: 06/03/2021] [Indexed: 02/06/2023]
Abstract
Prostate cancer (PC) is a leading cause of cancer-related death in men in Western countries. Androgen receptor (AR) signaling is a major driver of PC; therefore, androgen deprivation by medical and surgical castration is the standard treatment for patients with PC. However, over time, most patients will progress to metastatic castration-resistant PC. Enzalutamide is the only AR antagonist approved by the Food and Drug Administration for the treatment of metastatic castration-resistant PC. However, resistance to enzalutamide also develops in most patients with castration-resistant PC. Thus, there is an urgent need to develop new AR antagonists with new structures. For this purpose, we conducted both in silico and natural product screenings. From the in silico screening, we obtained T5853872 and more potent compound, STK765173. From the natural product screening, the novel compound arabilin was isolated from Streptomyces sp. MK756-CF1. Unlike STK765173, arabilin could overcome resistance to enzalutamide. Furthermore, we also extracted a novel compound, antarlide A, and its geometric isomers from Streptomyces sp. BB47. Antarlides A-F have novel 22-membered-ring macrocyclic structures, while antarlides G and H have 20-membered-ring structures. Both antarlides B and G showed potent AR antagonist activity in prostate cancer cells and could overcome resistance to enzalutamide.
13
Watanabe N, Ohnuki Y, Sakakibara Y. Deep learning integration of molecular and interactome data for protein-compound interaction prediction. J Cheminform 2021; 13:36. [PMID: 33933121 PMCID: PMC8088618 DOI: 10.1186/s13321-021-00513-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/12/2021] [Accepted: 04/21/2021] [Indexed: 11/26/2022]
Abstract
Motivation: Virtual screening, which can computationally predict the presence or absence of protein–compound interactions, has attracted attention as a large-scale, low-cost, and short-term search method for seed compounds. Existing machine learning methods for predicting protein–compound interactions are largely divided into those based on molecular structure data and those based on network data. The former utilize information on proteins and compounds, such as amino acid sequences and chemical structures; the latter rely on interaction network data, such as protein–protein interactions and compound–compound interactions. However, there have been few attempts to combine both types of data in molecular information and interaction networks. Results: We developed a deep learning-based method that integrates protein features, compound features, and multiple types of interactome data to predict protein–compound interactions. We designed three benchmark datasets with different difficulties and applied them to evaluate the prediction method. The performance evaluations show that our deep learning framework for integrating molecular structure data and interactome data outperforms state-of-the-art machine learning methods for protein–compound interaction prediction tasks. The performance improvement is statistically significant according to the Wilcoxon signed-rank test. This finding reveals that the multi-interactome data captures perspectives other than amino acid sequence homology and chemical structure similarity and that both types of data synergistically improve the prediction accuracy. Furthermore, experiments on the three benchmark datasets show that our method is more robust than existing methods in accurately predicting interactions between proteins and compounds that are unseen in training samples.
Affiliation(s)
- Narumi Watanabe
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
- Yuuto Ohnuki
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan
- Yasubumi Sakakibara
  - Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa, 223-8522, Japan.
14
Molecular docking and density functional theory studies of potent 1,3-disubstituted-9H-pyrido[3,4-b]indoles antifilarial compounds. Struct Chem 2021. [DOI: 10.1007/s11224-021-01772-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
15
Wang A, Wang M. Drug-Target Interaction Prediction via Dual Laplacian Graph Regularized Logistic Matrix Factorization. BIOMED RESEARCH INTERNATIONAL 2021; 2021:5599263. [PMID: 33855072 PMCID: PMC8019634 DOI: 10.1155/2021/5599263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2021] [Revised: 03/06/2021] [Accepted: 03/13/2021] [Indexed: 11/18/2022]
Abstract
Drug-target interactions provide useful information for biomedical drug discovery and drug development. However, finding drug-target interactions experimentally is costly and time-consuming, so computational approaches to this task are necessary and of practical significance. In this study, we establish a novel dual Laplacian graph regularized logistic matrix factorization model for drug-target interaction prediction, referred to as DLGrLMF. Specifically, DLGrLMF treats drug-target interaction prediction as a weighted logistic matrix factorization problem in which the experimentally validated interactions are assigned larger weights. Because drugs with similar chemical structures should interact with similar targets, and targets with similar genomic sequences should in turn interact with similar drugs, the pairwise chemical structure similarities of drugs and the pairwise genomic sequence similarities of targets are exploited in the matrix factorization through a dual Laplacian graph regularization term. In addition, we design a gradient descent algorithm to solve the resulting optimization problem. Finally, the efficacy of DLGrLMF is validated on various benchmark datasets, and the experimental results demonstrate that DLGrLMF performs better than other state-of-the-art methods. Case studies also confirm that DLGrLMF can successfully predict most of the experimentally validated drug-target interactions.
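The graph regularization idea in this abstract rests on a standard identity: for a symmetric similarity matrix S with Laplacian L = D - S, the penalty tr(U^T L U) equals half the similarity-weighted sum of squared distances between latent vectors. The sketch below illustrates only that identity in plain Python. The function names and the toy matrices `S_toy` and `U_toy` are assumptions made for illustration, and this is not the authors' DLGrLMF implementation.

```python
def graph_laplacian(S):
    """Return L = D - S, where D is the diagonal matrix of row sums of S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def laplacian_penalty(U, S):
    """Compute tr(U^T L U) for latent vectors U (one row per drug or target)."""
    L = graph_laplacian(S)
    n, k = len(U), len(U[0])
    return sum(U[i][f] * L[i][j] * U[j][f]
               for i in range(n) for j in range(n) for f in range(k))


def pairwise_penalty(U, S):
    """Equivalent form: 0.5 * sum_ij S_ij * ||u_i - u_j||^2."""
    n = len(U)
    return 0.5 * sum(
        S[i][j] * sum((a - b) ** 2 for a, b in zip(U[i], U[j]))
        for i in range(n) for j in range(n)
    )


# Toy data (hypothetical): items 0 and 1 are similar, item 2 is not.
S_toy = [[1.0, 0.8, 0.1],
         [0.8, 1.0, 0.2],
         [0.1, 0.2, 1.0]]
U_toy = [[1.0, 0.0],
         [0.9, 0.1],
         [0.0, 1.0]]

if __name__ == "__main__":
    print(laplacian_penalty(U_toy, S_toy), pairwise_penalty(U_toy, S_toy))
```

A "dual" regularizer in this sense would add one such penalty for the drug latent matrix and one for the target latent matrix, each with its own similarity matrix; the weighting between them is a model choice not specified in the abstract.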
Affiliation(s)
- Aizhen Wang
  - Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University and The Second People's Hospital of Huai'an, Huai'an 223002, China
- Minhui Wang
  - Department of Pharmacy, Lianshui People's Hospital Affiliated to Kangda College, Nanjing Medical University, Huai'an 223300, China
|
16
|
Xu H, Xu D, Zhang N, Zhang Y, Gao R. Protein-Protein Interaction Prediction Based on Spectral Radius and General Regression Neural Network. J Proteome Res 2021; 20:1657-1665. [PMID: 33555893 DOI: 10.1021/acs.jproteome.0c00871] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Abstract
Protein-protein interaction (PPI) not only plays a critical role in cell life activities, but also plays an important role in discovering the mechanism of biological activity, protein function, and disease states. Developing computational methods is of great significance for PPIs prediction since experimental methods are time-consuming and laborious. In this paper, we proposed a PPI prediction algorithm called GRNN-PPI only using the amino acid sequence information based on general regression neural network and two feature extraction methods. Specifically, we designed a new feature extraction method named Mutation Spectral Radius (MSR) to extract evolutionary information by the BLOSUM62 matrix. Meanwhile, we integrated another feature extraction method, autocorrelation description, which can completely extract information on physicochemical properties and protein sequences. The principal component analysis was applied to eliminate noise, and the general regression neural network was adopted as a classifier. The prediction accuracy of the yeast, human, and Helicobacter pylori1 (H. pylori1) data sets were 97.47%, 99.63%, and 99.97%, respectively. In addition, we also conducted experiments on two important PPI networks and six independent data sets. All results were significantly higher than some state-of-the-art methods used for comparison, showing that our method is feasible and robust.
Affiliation(s)
- Hanxiao Xu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Da Xu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan 250061, China
|
17
|
Ensemble Learning Prediction of Drug-Target Interactions Using GIST Descriptor Extracted from PSSM-Based Evolutionary Information. BIOMED RESEARCH INTERNATIONAL 2020; 2020:4516250. [PMID: 32908888 PMCID: PMC7463380 DOI: 10.1155/2020/4516250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/24/2020] [Revised: 08/02/2020] [Accepted: 08/10/2020] [Indexed: 12/02/2022]
Abstract
Identifying drug-target interactions (DTIs) plays an essential role in new drug development. However, knowledge of DTIs is still limited, and a significant number of DTI pairs remain unknown. Moreover, traditional experimental methods have inevitable disadvantages such as high cost and long duration. Therefore, developing computational methods for predicting DTIs is attracting more and more attention. In this study, we report a novel computational approach for predicting DTIs using the GIST feature, the position-specific scoring matrix (PSSM), and rotation forest (RF). Specifically, each target protein is first converted into a PSSM to retain evolutionary information. Then, the GIST feature is extracted from the PSSM, and substructure fingerprint information is adopted to extract the feature of the drug. Finally, the protein and drug features are combined to form a drug-target pair, which is used as the input feature for the RF classifier. In the experiment, the proposed method achieves high average accuracies of 89.25%, 85.93%, 82.36%, and 73.89% on the enzyme, ion channel, G protein-coupled receptor (GPCR), and nuclear receptor datasets, respectively. To further evaluate the prediction performance of the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the same gold standard dataset. These promising results illustrate that the proposed method is more effective and stable than other methods. We expect the proposed method to be a useful tool for predicting large-scale DTIs.
|
18
|
Wang W, Lv H, Zhao Y. Predicting DNA binding protein-drug interactions based on network similarity. BMC Bioinformatics 2020; 21:322. [PMID: 32689927 PMCID: PMC7372772 DOI: 10.1186/s12859-020-03664-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/25/2019] [Accepted: 07/15/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND The study of DNA binding protein (DBP)-drug interactions can open a path to breakthroughs in the treatment of genetic diseases and cancers. Currently, network-based methods are widely used for protein-drug interaction prediction, and many hidden relationships can be found through network analysis. We proposed a DCA (drug-cluster association) model for predicting DBP-drug interactions. The clusters group drug-binding site trimers that are similar in their physicochemical properties. First, DBP-drug binding sites are extracted from the scPDB database. Second, each binding site is represented as trimers obtained by sliding a window over the binding site. Third, the trimers are clustered based on their physicochemical properties. Fourth, we build the network by generating the interaction matrix that represents the DCA network. Fifth, three link prediction methods are evaluated on the network. Finally, the common neighbor (CN) method is selected to predict drug-cluster associations in the DBP-drug network model. RESULT This network shows that drugs tend to bind to positively charged sites and that the binding process is more likely to occur inside the DBPs. The results of the link prediction indicate that the CN method has better prediction performance than the PA and JA methods. The DBP-drug network prediction model is generated using the CN method, which predicted drug-trimer interactions and DBP-drug interactions more accurately. For example, we found that erythromycin (ERY) can establish an interaction with the HTH-type transcriptional repressor, which fits well with the in silico DBP-drug prediction. CONCLUSION Drug-protein binding is a local event. The binding at the drug-DBP binding site represents this local binding event, which helps to understand the mechanism of DBP-drug interactions.
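Two steps named in this abstract have simple textbook forms: the sliding-window extraction of trimers and neighborhood-based link prediction scores. The sketch below shows those generic forms in plain Python, reading PA as preferential attachment and JA as the Jaccard index, which is an assumption since the abstract does not expand the abbreviations. The function names, the residue string, and `toy_graph` are hypothetical, and how the paper adapts these scores to its drug-cluster network is not stated in the abstract.

```python
def sliding_trimers(site):
    """Slide a width-3 window over a binding-site residue string."""
    return [site[i:i + 3] for i in range(len(site) - 2)]


def common_neighbors(adj, x, y):
    """CN score: number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA score: product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])


def jaccard(adj, x, y):
    """Jaccard score: shared neighbors divided by all neighbors of x or y."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


# Toy undirected graph (hypothetical), stored as node -> set of neighbors.
toy_graph = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}

if __name__ == "__main__":
    print(sliding_trimers("KRHDE"))
    print(common_neighbors(toy_graph, "a", "b"),
          preferential_attachment(toy_graph, "a", "b"),
          jaccard(toy_graph, "a", "b"))
```

Higher scores suggest a more plausible missing link between the two nodes; comparing such scores against held-out known links is one common way to choose among the methods.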
Affiliation(s)
- Wei Wang
  - Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China; Big Data Engineering Laboratory for Teaching Resources & Assessment of Education Quality, Henan Province, Xinxiang, China
- Hehe Lv
  - Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
- Yuan Zhao
  - Department of Computer Science and Technology, College of Computer and Information Engineering, Henan Normal University, Xinxiang, 453007, China
|
19
|
Redkar S, Mondal S, Joseph A, Hareesha KS. A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing. Mol Inform 2020; 39:e1900062. [PMID: 32003548 DOI: 10.1002/minf.201900062] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/31/2019] [Accepted: 01/28/2020] [Indexed: 01/19/2023]
Abstract
Drug-target interaction (DTI) plays a crucial role in drug discovery, drug repositioning, and understanding drug side effects, which helps to identify new therapeutic profiles for various diseases. However, the exponential growth in genomic and drug data makes it difficult to identify new associations between drugs and targets. Computational methods are therefore used to accelerate the DTI identification process. Usually, available data sources consisting of known DTIs are used to train a classifier to predict new DTIs. Such datasets often suffer from class imbalance. Therefore, in this study we address two challenges faced by such datasets, i.e., class imbalance and high dimensionality, to develop a predictive model for DTI prediction. The study is carried out on four protein classes, namely Enzyme, Ion Channel, G Protein-Coupled Receptor (GPCR), and Nuclear Receptor. We encoded the target protein sequence using the dipeptide composition and the drug with a molecular descriptor. A machine learning approach is employed to predict DTIs using wrapper feature selection and the synthetic minority oversampling technique (SMOTE). At best, the ensemble approach achieved accuracies of 95.9%, 93.4%, 90.8%, and 90.6% and precisions of 96.3%, 92.8%, 90.1%, and 90.2% on the Enzyme, Ion Channel, GPCR, and Nuclear Receptor datasets, respectively, when evaluated excluding SMOTE samples with 10-fold cross-validation. Furthermore, our method could predict new drug-target interactions not contained in the training dataset. The features selected by wrapper feature selection may be important for understanding DTIs for the protein categories in this study. Based on our evaluation, the proposed method can be used for understanding and identifying new drug-target interactions.
We provide readers with a standalone package, available at https://github.com/shwetagithub1/predDTI, which provides DTI predictions for new query DTI pairs.
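The dipeptide composition encoding mentioned in this abstract maps a protein sequence to a 400-dimensional vector of adjacent residue-pair frequencies. The sketch below is a generic version in plain Python, not the code from the predDTI package. The function name is hypothetical, and skipping pairs that contain non-standard residues (and normalizing by the number of valid pairs) is an assumption of this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of seq, in DIPEPTIDES order."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")
    print(len(vec), vec[DIPEPTIDES.index("AC")], vec[DIPEPTIDES.index("CA")])
```

Because the vector length is fixed regardless of sequence length, it can be concatenated with a drug descriptor vector to form one feature row per drug-target pair.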
Affiliation(s)
- Shweta Redkar
  - Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
- Sukanta Mondal
  - Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, K.K.Birla Goa Campus, 403726, Zuarinagar, Goa, India
- Alex Joseph
  - Department of Pharmaceutical Chemistry, Manipal College of Pharmaceutical Sciences, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
- K S Hareesha
  - Department of Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, 576104, Manipal, Karnataka, India
|
20
|
Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, Najarian K. Machine learning approaches and databases for prediction of drug-target interaction: a survey paper. Brief Bioinform 2020; 22:247-269. [PMID: 31950972 PMCID: PMC7820849 DOI: 10.1093/bib/bbz157] [Citation(s) in RCA: 172] [Impact Index Per Article: 43.0] [Received: 09/04/2019] [Revised: 11/01/2019] [Accepted: 11/07/2019] [Indexed: 12/12/2022] Open
Abstract
The task of predicting the interactions between drugs and targets plays a key role in the process of drug discovery. There is a need to develop novel and efficient prediction approaches in order to avoid relying solely on costly, laborious, and not always deterministic experiments to determine drug–target interactions (DTIs). These approaches should be capable of identifying potential DTIs in a timely manner. In this article, we describe the data required for the task of DTI prediction, followed by a comprehensive catalog of machine learning methods and databases that have been proposed and utilized to predict DTIs. The advantages and disadvantages of each set of methods are also briefly discussed. Lastly, the challenges one may face in predicting DTIs using machine learning approaches are highlighted, and we conclude by shedding some light on important future research directions.
Affiliation(s)
- Maryam Bagherian
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Elyas Sabeti
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Kai Wang
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Maureen A Sartor
  - Department of Pathology, University of Michigan, Ann Arbor, MI, 48109, USA
- Kayvan Najarian
  - Department of Electrical Engineering and Computer Science, College of Engineering, University of Michigan, Ann Arbor, MI, 48109, USA
|
21
|
Zhang W, Huai Y, Miao Z, Qian A, Wang Y. Systems Pharmacology for Investigation of the Mechanisms of Action of Traditional Chinese Medicine in Drug Discovery. Front Pharmacol 2019; 10:743. [PMID: 31379563 PMCID: PMC6657703 DOI: 10.3389/fphar.2019.00743] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 02/19/2019] [Accepted: 06/07/2019] [Indexed: 01/01/2023] Open
Abstract
As a traditional medical intervention in Asia and a complementary and alternative medicine in Western countries, traditional Chinese medicine (TCM) has attracted global attention in the life science field. TCM provides extensive natural resources for medicinal compounds, and these resources are generally regarded as effective and safe for use in drug discovery. However, owing to the complexity of TCM compounds and their multiple related targets, it remains difficult to dissect the mechanisms of action of herbal medicines at a holistic level. To address this issue, in this review we propose a novel systems pharmacology approach to identify the bioactive compounds, predict their related targets, and illustrate the molecular mechanisms of action of TCM. With a predominant focus on the mechanisms of action of TCM, we also highlight the application of the systems pharmacology approach to the prediction of drug combinations and dynamic analysis, the synergistic effects of TCMs, formula dissection, and theory analysis. In summary, the systems pharmacology method contributes to understanding the complex interactions among biological systems, drugs, and complex diseases from a network perspective. Consequently, systems pharmacology provides a novel approach to promote drug discovery in a precise manner and at a systems level, thus facilitating the modernization of TCM.
Affiliation(s)
- Wenjuan Zhang
  - Lab for Bone Metabolism, Key Lab for Space Biosciences and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - Research Center for Special Medicine and Health Systems Engineering, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - NPU-UAB Joint Laboratory for Bone Metabolism, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Ying Huai
  - Lab for Bone Metabolism, Key Lab for Space Biosciences and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - Research Center for Special Medicine and Health Systems Engineering, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - NPU-UAB Joint Laboratory for Bone Metabolism, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Zhiping Miao
  - Lab for Bone Metabolism, Key Lab for Space Biosciences and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - Research Center for Special Medicine and Health Systems Engineering, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - NPU-UAB Joint Laboratory for Bone Metabolism, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Airong Qian
  - Lab for Bone Metabolism, Key Lab for Space Biosciences and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - Research Center for Special Medicine and Health Systems Engineering, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
  - NPU-UAB Joint Laboratory for Bone Metabolism, School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
- Yonghua Wang
  - Lab of Systems Pharmacology, College of Life Sciences, Northwest University, Xi’an, China
|
22
|
Kuthuru S, Szafran AT, Stossi F, Mancini MA, Rao A. Leveraging Image-Derived Phenotypic Measurements for Drug-Target Interaction Predictions. Cancer Inform 2019; 18:1176935119856595. [PMID: 31217689 PMCID: PMC6563400 DOI: 10.1177/1176935119856595] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/24/2019] [Accepted: 05/18/2019] [Indexed: 11/25/2022] Open
Abstract
In recent years, protein kinases have become some of the most significant drug targets in cancer patients. Kinases are known to regulate the activity of many human proteins, and consequently their inhibition has been used to control cancer proliferation. A significant challenge in drug discovery is the rapid and efficient identification of new small molecules. In this study, we propose a novel in silico drug discovery approach to identify kinase targets that impinge on nuclear receptor signaling with data generated using high-content analysis (HCA). A high-throughput imaging dataset was generated from an siRNA human kinome screen on engineered cells that allow direct visualization of effects on estrogen receptor-α or a chimeric progesterone receptor B binding to specific DNA. Two types of kinase descriptors are extracted from these imaging data: first, a population-median-based descriptor and second a bag-of-words (BoW) descriptor that can capture heterogeneity information in the imaging data. Using these descriptors, we provide prediction results of drug-kinase-target interactions based on single-task learning, multi-task learning, and collaborative filtering methods. The best performing model in target-based drug discovery gives an area under the receiver operating characteristic curve (AUC) of 0.86, whereas the best model in ligand-based discovery gives an AUC of 0.79. These promising results suggest that imaging-based information can be used as an additional source of information to existing virtual screening methods, thereby making the drug discovery process more time and cost efficient.
Affiliation(s)
- Srikanth Kuthuru
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Adam T Szafran
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA; Gulf Coast Consortium Center for Advanced Microscopy and Image Informatics, Houston, TX, USA
- Fabio Stossi
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA; Gulf Coast Consortium Center for Advanced Microscopy and Image Informatics, Houston, TX, USA; Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
- Michael A Mancini
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA; Gulf Coast Consortium Center for Advanced Microscopy and Image Informatics, Houston, TX, USA; Institute of Biosciences and Technology, Texas A&M University, Houston, TX, USA
- Arvind Rao
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
|
23
|
Zhou M, Chen Y, Xu R. A Drug-Side Effect Context-Sensitive Network approach for drug target prediction. Bioinformatics 2019; 35:2100-2107. [PMID: 30428013 PMCID: PMC6581434 DOI: 10.1093/bioinformatics/bty906] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/29/2018] [Revised: 10/05/2018] [Accepted: 11/13/2018] [Indexed: 01/21/2023] Open
Abstract
SUMMARY Computational drug target prediction has become an important process in drug discovery. Network-based approaches are commonly used in computational drug-target interaction (DTI) prediction. Existing network-based approaches are limited in capturing the contextual information on how diseases, drugs and genes are connected. Here, we proposed a context-sensitive network (CSN) model for DTI prediction by modeling contextual drug phenotypic relationships. We constructed a Drug-Side Effect Context-Sensitive Network (DSE-CSN) of 139 760 drug-side effect pairs, representing 1480 drugs and 5868 side effects. We also built a protein-protein interaction network (PPIN) of 15 267 gene nodes and 178 972 weighted edges. A heterogeneous network was built by connecting the DSE-CSN and the PPIN through 3684 known DTIs. For each drug on the DSE-CSN, its genetic targets were predicted and prioritized using a network-based ranking algorithm. Our approach was evaluated in both de novo and leave-one-out cross-validation analysis using known DTIs as the gold standard. We compared our DSE-CSN-based model to the traditional similarity-based network (SBN)-based prediction model. The results suggested that the DSE-CSN-based model was able to rank known DTIs highly. In a de novo cross-validation, the area under the receiver operating characteristic (ROC) curve was 0.95. In a leave-one-out cross-validation, the average rank was top 3.2% for known DTIs. When it was compared to the SBN-based model using the Precision-Recall curve, our CSN-based model achieved a higher mean average precision (MAP) (0.23 versus 0.19, P-value<1e-4) in a de novo cross-validation analysis. We further improved the CSN-based DTI prediction by differentially weighting the drug-side effect pairs on the network and showed a significant improvement of the MAP (0.29 versus 0.23, P-value<1e-4). 
We also showed that the CSN-based model consistently achieved better performances than the traditional SBN-based model across different drug classes. Moreover, we demonstrated that our novel DTI predictions can be supported by published literature. In summary, the CSN-based model, by modeling the context-specific inter-relationships among drugs and side effects, has a high potential in drug target prediction. AVAILABILITY AND IMPLEMENTATION nlp/case/edu/public/data/DSE/CSN_DTI.
Affiliation(s)
- Yang Chen
  - Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
  - Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
|
24
|
Sachdev K, Gupta MK. A comprehensive review of feature based methods for drug target interaction prediction. J Biomed Inform 2019; 93:103159. [PMID: 30926470 DOI: 10.1016/j.jbi.2019.103159] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 11/13/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/22/2022]
Abstract
Drug target interaction is a prominent research area in the field of drug discovery. It refers to the recognition of interactions between chemical compounds and the protein targets in the human body. Wet lab experiments to identify these interactions are expensive as well as time consuming. The computational methods of interaction prediction help limit the search space for these experiments. These computational methods can be divided into ligand based approaches, docking approaches and chemogenomic approaches. In this review, we aim to describe the various feature based chemogenomic methods for drug target interaction prediction. It provides a comprehensive overview of the various techniques, datasets, tools and metrics. The feature based methods have been categorized, explained and compared. A novel framework for drug target interaction prediction has also been proposed that aims to improve the performance of existing methods. To the best of our knowledge, this is the first comprehensive review focusing only on feature based methods of drug target interaction.
Affiliation(s)
- Kanica Sachdev
  - Computer Science and Engineering Department, SMVDU, J&K, India
|
25
|
Wang Y, You ZH, Yang S, Li X, Jiang TH, Zhou X. A High Efficient Biological Language Model for Predicting Protein–Protein Interactions. Cells 2019; 8:cells8020122. [PMID: 30717470 PMCID: PMC6406841 DOI: 10.3390/cells8020122] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 12/27/2018] [Revised: 01/26/2019] [Accepted: 02/02/2019] [Indexed: 01/06/2023] Open
Abstract
Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors including both methodology and technology. Inspired by the similarity of biological sequences and languages, developing a biological language processing technology may provide a brand new theoretical perspective and feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions only using a protein sequence. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolution neural network (CNN). The Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model used for learning the distributed representation for each “bio-word”. The Bio2Vec supplies a frame that allows researchers to consider the context information and implicit semantic information of a bio sequence. A remarkable improvement in PPIs prediction performance has been observed by using the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.
Affiliation(s)
- Yanbin Wang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Shan Yang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao Li
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Tong-Hai Jiang
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xi Zhou
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
|
26
|
Shi JY, Zhang AQ, Zhang SW, Mao KT, Yiu SM. A unified solution for different scenarios of predicting drug-target interactions via triple matrix factorization. BMC SYSTEMS BIOLOGY 2018; 12:136. [PMID: 30598094 PMCID: PMC6311903 DOI: 10.1186/s12918-018-0663-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Background During the identification of potential candidates, computational prediction of drug-target interactions (DTIs) is an important precursor to the subsequent expensive wet-lab validation. DTI screening considers four scenarios, depending on whether the drug is an existing or a new drug and whether the target is an existing or a new target. However, existing approaches have the following limitations. First, only a few of them can address the most difficult scenario (i.e., predicting interactions between new drugs and new targets). More importantly, none of the existing approaches provides explicit information for understanding the mechanism by which interactions form, such as the drug-target feature pairs contributing to the interactions. Results In this paper, we propose a Triple Matrix Factorization-based model (TMF) to tackle these problems. Compared with former state-of-the-art predictive methods, TMF demonstrates significant superiority when its predictions are assessed on four benchmark datasets over four kinds of screening scenarios. Its advantage is further supported by the validation of predicted novel interactions. More importantly, by using PubChem fingerprints of chemical structures as drug features and occurrence frequencies of amino acid trimers as protein features, TMF shows its ability to identify the features determining interactions, including dominant feature pairs, frequently occurring substructures, and conserved triplets of amino acids. Conclusions TMF provides a unified framework of DTI prediction for all the screening scenarios. It also offers new insight into the underlying mechanism of DTIs by indicating dominant features, which play important roles in the formation of DTIs.
Affiliation(s)
- Jian-Yu Shi
- School of Life Sciences, Northwestern Polytechnical University, Xi'An, China.
| | - An-Qi Zhang
- School of Life Sciences, Northwestern Polytechnical University, Xi'An, China
| | - Shao-Wu Zhang
- School of Automations, Northwestern Polytechnical University, Xi'An, China
| | - Kui-Tao Mao
- School of Computer Science, Northwestern Polytechnical University, Xi'An, China
| | - Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| |
|
27
|
Wang M, Tang C, Chen J. Drug-Target Interaction Prediction via Dual Laplacian Graph Regularized Matrix Completion. BioMed Research International 2018; 2018:1425608. [PMID: 30627536 PMCID: PMC6304580 DOI: 10.1155/2018/1425608] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/25/2018] [Revised: 09/03/2018] [Accepted: 10/24/2018] [Indexed: 01/16/2023]
Abstract
Drug-target interactions play an important role in biomedical drug discovery and development. However, determining them experimentally is expensive and time-consuming. Therefore, developing computational techniques for drug-target interaction prediction is urgent and has practical significance. In this work, we propose an effective computational model of dual Laplacian graph regularized matrix completion, referred to as DLGRMC, to infer the unknown drug-target interactions. Specifically, DLGRMC transforms the task of drug-target interaction prediction into a matrix completion problem, in which the potential interactions between drugs and targets can be obtained based on the prediction scores after the matrix completion procedure. In DLGRMC, the drug pairwise chemical structure similarities and the target pairwise genomic sequence similarities are fully exploited to serve the matrix completion by using a dual Laplacian graph regularization term; i.e., drugs with similar chemical structures are more likely to interact with similar targets, and targets with similar genomic sequences are more likely to interact with similar drugs. In addition, during the matrix completion process, a binary indicator matrix marking the indices of the observed drug-target interactions is used to preserve the experimentally confirmed interactions. Furthermore, we develop an alternating iterative strategy to solve the constrained matrix completion problem based on the Augmented Lagrange Multiplier algorithm. We evaluate DLGRMC on five benchmark datasets, and the results show that DLGRMC outperforms several state-of-the-art approaches in terms of AUPR values and PR curves under 10-fold cross-validation. In addition, case studies also demonstrate that DLGRMC can successfully predict most of the experimentally validated drug-target interactions.
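To illustrate the dual Laplacian graph regularization idea in the abstract above, the sketch below evaluates the smoothness penalty tr(F^T L F) for a score matrix F, once with a drug similarity graph on the rows and once with a target similarity graph on the columns. It is a minimal sketch, assuming the unnormalized Laplacian L = D - S; it does not reproduce the authors' Augmented Lagrange Multiplier solver, and all function names and numbers are hypothetical.

```python
def graph_laplacian(similarity):
    """Unnormalized Laplacian L = D - S of a symmetric similarity matrix."""
    n = len(similarity)
    return [
        [(sum(similarity[i]) if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness_penalty(scores, similarity):
    """tr(F^T L F): small when similar rows of F have similar score profiles."""
    lap = graph_laplacian(similarity)
    n = len(scores)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(scores[i], scores[j]))
            total += lap[i][j] * dot
    return total


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dual_penalty(scores, drug_similarity, target_similarity):
    """Row penalty (drug graph) plus column penalty (target graph)."""
    return (smoothness_penalty(scores, drug_similarity)
            + smoothness_penalty(transpose(scores), target_similarity))


# Hypothetical toy data: two very similar drugs, two unrelated targets.
drug_similarity = [[1.0, 1.0], [1.0, 1.0]]
target_similarity = [[1.0, 0.0], [0.0, 1.0]]
same_profile = [[1, 0], [1, 0]]       # similar drugs, same predicted targets
different_profile = [[1, 0], [0, 1]]  # similar drugs, different predicted targets

print(dual_penalty(same_profile, drug_similarity, target_similarity))
print(dual_penalty(different_profile, drug_similarity, target_similarity))
```

The penalty is zero when the two similar drugs share a score profile and positive when they do not, which is the behaviour the regularization term is meant to encourage during completion.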
Affiliation(s)
- Minhui Wang
- Department of Pharmacy, People's Hospital of Lian'shui County, Huai'an 223300, China
| | - Chang Tang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
| | - Jiajia Chen
- Department of Pharmacy, The Affiliated Huai'an Hospital of Xuzhou Medical University, Huai'an 223002, China
| |
|
28
|
Imoto M. Chemistry and biology for the small molecules targeting characteristics of cancer cells. Biosci Biotechnol Biochem 2018; 83:1-10. [PMID: 30247093 DOI: 10.1080/09168451.2018.1518704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2018] [Accepted: 08/22/2018] [Indexed: 10/28/2022]
Abstract
Despite the marked progress of cancer research, cancer is the predominant cause of death in Japan, and the development of effective therapeutic drugs is therefore awaited. Chemical biology is a research field that uses small molecules to investigate biological phenomena. One of the most important aims of chemical biology is to find such small molecules, and natural products are ideal screening sources due to their structural diversity. Therefore, natural product screening based on the progress of chemical biology prompted us to find small molecules targeting cancer characteristics. Another contribution of chemical biology is to facilitate the target identification of small molecules. Therefore, among a variety of methods to uncover protein function, chemical biology is a remarkable approach in which small molecules are used as probes to elucidate protein functions related to cancer development. Abbreviations: EGF: Epidermal growth factor; PDGF: Platelet-derived growth factor; CRPC: Castration-resistant prostate cancer; AR: Androgen receptor; FTase: Farnesyl transferase; 5-LOX: 5-Lipoxygenase; LT: Leukotriene; CysLT1: Cysteinyl leukotriene receptor 1; GPA: Glucopiericidin A; PA: Piericidin A; XN: Xanthohumol; VCP: Valosin-containing protein; ACACA: Acetyl-CoA carboxylase-α.
Affiliation(s)
- Masaya Imoto
- Department of Biosciences and Informatics, Faculty of Science and Technology, Keio University, Kohoku-ku, Yokohama, Japan
| |
|
29
|
Chen R, Liu X, Jin S, Lin J, Liu J. Machine Learning for Drug-Target Interaction Prediction. Molecules 2018; 23:E2208. [PMID: 30200333 PMCID: PMC6225477 DOI: 10.3390/molecules23092208] [Citation(s) in RCA: 123] [Impact Index Per Article: 20.5] [Received: 08/05/2018] [Revised: 08/27/2018] [Accepted: 08/27/2018] [Indexed: 12/18/2022] Open
Abstract
Identifying drug-target interactions will greatly narrow down the scope of search of candidate medications, and thus can serve as the vital first step in drug discovery. Considering that in vitro experiments are extremely costly and time-consuming, high efficiency computational prediction methods could serve as promising strategies for drug-target interaction (DTI) prediction. In this review, our goal is to focus on machine learning approaches and provide a comprehensive overview. First, we summarize a brief list of databases frequently used in drug discovery. Next, we adopt a hierarchical classification scheme and introduce several representative methods of each category, especially the recent state-of-the-art methods. In addition, we compare the advantages and limitations of methods in each category. Lastly, we discuss the remaining challenges and future outlook of machine learning in DTI prediction. This article may provide a reference and tutorial insights on machine learning-based DTI prediction for future researchers.
Affiliation(s)
- Ruolan Chen
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Xiangrong Liu
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Shuting Jin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Jiawei Lin
- Department of Computer Science, School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
| | - Juan Liu
- Department of Instrumental and Electrical Engineering, School of Aerospace Engineering, Xiamen University, Xiamen 361005, China.
| |
|
30
|
|
31
|
Ezzat A, Wu M, Li XL, Kwoh CK. Computational prediction of drug–target interactions using chemogenomic approaches: an empirical survey. Brief Bioinform 2018; 20:1337-1357. [DOI: 10.1093/bib/bby002] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 10/06/2017] [Revised: 12/21/2017] [Indexed: 01/18/2023] Open
Abstract
Computational prediction of drug–target interactions (DTIs) has become an essential task in the drug discovery process. It narrows down the search space for interactions by suggesting potential interaction candidates for validation via wet-lab experiments that are well known to be expensive and time-consuming. In this article, we aim to provide a comprehensive overview and empirical evaluation on the computational DTI prediction techniques, to act as a guide and reference for our fellow researchers. Specifically, we first describe the data used in such computational DTI prediction efforts. We then categorize and elaborate the state-of-the-art methods for predicting DTIs. Next, an empirical comparison is performed to demonstrate the prediction performance of some representative methods under different scenarios. We also present interesting findings from our evaluation study, discussing the advantages and disadvantages of each method. Finally, we highlight potential avenues for further enhancement of DTI prediction performance as well as related research directions.
|
32
|
Predicting inhibitory and activatory drug targets by chemically and genetically perturbed transcriptome signatures. Sci Rep 2018; 8:156. [PMID: 29317676 PMCID: PMC5760621 DOI: 10.1038/s41598-017-18315-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 07/28/2017] [Accepted: 12/08/2017] [Indexed: 02/06/2023] Open
Abstract
Genome-wide identification of all target proteins of drug candidate compounds is a challenging issue in drug discovery. Moreover, emerging phenotypic effects, including therapeutic and adverse effects, are heavily dependent on the inhibition or activation of target proteins. Here we propose a novel computational method for predicting inhibitory and activatory targets of drug candidate compounds. Specifically, we integrated chemically-induced and genetically-perturbed gene expression profiles in human cell lines, which avoided dependence on chemical structures of compounds or proteins. Predictive models for individual target proteins were simultaneously constructed by the joint learning algorithm based on transcriptomic changes in global patterns of gene expression profiles following chemical treatments, and following knock-down and over-expression of proteins. This method discriminates between inhibitory and activatory targets and enables accurate identification of therapeutic effects. Herein, we comprehensively predicted drug-target-disease association networks for 1,124 drugs, 829 target proteins, and 365 human diseases, and validated some of these predictions in vitro. The proposed method is expected to facilitate identification of new drug indications and potential adverse effects.
|
33
|
Yamanishi Y. Linear and Kernel Model Construction Methods for Predicting Drug-Target Interactions in a Chemogenomic Framework. Methods Mol Biol 2018; 1825:355-368. [PMID: 30334213 DOI: 10.1007/978-1-4939-8639-2_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Identification of drug-target interactions is a crucial process in drug discovery. In this chapter, we present protocols for recent advancements in machine learning methods for predicting drug-target interactions from heterogeneous biological data in a chemogenomic framework, in which prediction is based on the chemical structure data of drug candidate compounds and translated genomic sequence data of target candidate proteins. Most existing methods are based on either linear modeling or kernel modeling. To illustrate linear modeling, we introduce sparsity-induced binary classifiers and sparse canonical correlation analysis. To illustrate kernel modeling, we introduce pairwise kernel-based support vector machines and kernel-based distance learning. Workflows for using these techniques are presented. We also discuss the characteristics of each method and suggest some directions for future research.
Affiliation(s)
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan.
- PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama, Japan.
| |
|
34
|
Abstract
Most drugs produce their phenotypic effects by interacting with target proteins, and understanding the molecular features that underpin drug-target interactions is crucial when designing a novel drug. In this chapter, we introduce the protocols that have driven recent advances in sparse modeling methods for analyzing drug-target interaction networks within a chemogenomic framework. In this approach, the chemical structures of candidate drug compounds are correlated with the genomic sequences of the candidate target proteins. We demonstrate the use of sparse canonical correspondence analysis and sparsity-induced binary classifiers to extract the underlying molecular features that are most strongly involved in drug-target interactions. We focus on drug chemical substructures and protein domains. Workflows for applying these methods are presented, and an application is described in detail. We consider the characteristics of each method and suggest possible directions for future research.
|
35
|
Zhang W, Chen Y, Li D. Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information. Molecules 2017; 22:2056. [PMID: 29186828 PMCID: PMC6149680 DOI: 10.3390/molecules22122056] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 10/12/2017] [Revised: 11/19/2017] [Accepted: 11/20/2017] [Indexed: 11/16/2022] Open
Abstract
Interactions between drugs and target proteins provide important information for drug discovery. Currently, experiments have identified only a small number of drug-target interactions. Therefore, the development of computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. First, we calculate drug-drug linear neighborhood similarity in the feature spaces by considering how to reconstruct data points from their neighbors. Then, we take the similarities as the manifold of drugs and assume that the manifold is unchanged in the interaction space. Finally, we predict unobserved interactions between known drugs and targets by using drug-drug linear neighborhood similarity and known drug-target interactions. The experiments show that LPLNI can make high-accuracy predictions on four benchmark datasets using only known drug-target interactions. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) performs better than other state-of-the-art methods. The known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross validation and the case study.
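The propagation step described in the abstract above can be sketched with the standard label propagation update F <- alpha * W F + (1 - alpha) * Y, where Y holds the known interactions and W is a row-normalized drug-drug weight matrix. This is a sketch under stated assumptions: the weight matrix below is a hand-written stand-in rather than a linear neighborhood similarity computed by reconstruction, and the function name, alpha value, and toy numbers are hypothetical.

```python
def propagate_labels(weights, interactions, alpha=0.5, iterations=200):
    """Iterate F <- alpha * W F + (1 - alpha) * Y, starting from F = Y.

    weights: row-normalized drug-drug matrix W (each row sums to 1).
    interactions: known drug-target matrix Y (1 = observed, 0 = unknown).
    With 0 < alpha < 1 the iteration converges to a fixed point.
    """
    n_drugs = len(interactions)
    n_targets = len(interactions[0])
    scores = [row[:] for row in interactions]
    for _ in range(iterations):
        scores = [
            [
                alpha * sum(weights[i][k] * scores[k][j] for k in range(n_drugs))
                + (1 - alpha) * interactions[i][j]
                for j in range(n_targets)
            ]
            for i in range(n_drugs)
        ]
    return scores


# Hypothetical toy data: drugs 0 and 1 are close neighbours, drug 2 is not.
W = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.5, 0.5, 0.0],
]
Y = [
    [1.0, 0.0],  # drug 0 is known to hit target 0
    [0.0, 0.0],  # drug 1 has no known interactions
    [0.0, 1.0],  # drug 2 is known to hit target 1
]

F = propagate_labels(W, Y)
for drug, row in enumerate(F):
    print(drug, [round(value, 3) for value in row])
```

In this toy setting drug 1 ends up with a higher score for target 0 than for target 1, because most of its neighbourhood weight points at drug 0.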
Affiliation(s)
- Wen Zhang
- School of Computer, Wuhan University, Wuhan 430072, China.
| | - Yanlin Chen
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China.
| | - Dingfang Li
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China.
| |
|
36
|
Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017; 18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance. METHOD: We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. RESULTS: VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time. CONCLUSION: In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.
Affiliation(s)
- Bence Bolgár
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
| | - Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
| |
|
37
|
Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 2017; 32:i18-i27. [PMID: 27307615 PMCID: PMC4908328 DOI: 10.1093/bioinformatics/btw244] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Indexed: 02/06/2023] Open
Abstract
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce the heavy time and financial cost of experiments, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank (LTR) is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in area under the precision-recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingjun Yuan
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
| | - Junning Gao
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
| | - Dongliang Wu
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
| | - Shihua Zhang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan; Department of Computer Science, Aalto University, Finland
| | - Shanfeng Zhu
- School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; Centre for Computational System Biology, Fudan University, Shanghai, China
| |
|
38
|
He T, Heidemeyer M, Ban F, Cherkasov A, Ester M. SimBoost: a read-across approach for predicting drug-target binding affinities using gradient boosting machines. J Cheminform 2017; 9:24. [PMID: 29086119 PMCID: PMC5395521 DOI: 10.1186/s13321-017-0209-z] [Citation(s) in RCA: 171] [Impact Index Per Article: 24.4] [Received: 11/03/2016] [Accepted: 03/30/2017] [Indexed: 02/06/2023] Open
Abstract
Computational prediction of the interaction between drugs and targets is a standing challenge in the field of drug discovery. A number of rather accurate predictions were reported for various binary drug–target benchmark datasets. However, a notable drawback of a binary representation of interaction data is that missing endpoints for non-interacting drug–target pairs are not differentiated from inactive cases, and that predicted levels of activity depend on pre-defined binarization thresholds. In this paper, we present a method called SimBoost that predicts continuous (non-binary) values of binding affinities of compounds and proteins and thus incorporates the whole interaction spectrum from true negative to true positive interactions. Additionally, we propose a version of the method called SimBoostQuant which computes a prediction interval in order to assess the confidence of the predicted affinity, thus defining the Applicability Domain metrics explicitly. We evaluate SimBoost and SimBoostQuant on two established drug–target interaction benchmark datasets and one new dataset that we propose to use as a benchmark for read-across cheminformatics applications. We demonstrate that our methods outperform the previously reported models across the studied datasets.
Affiliation(s)
- Tong He
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
| | - Marten Heidemeyer
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
| | - Fuqiang Ban
- Faculty of Medicine, Vancouver Prostate Center, University of British Columbia, Vancouver, BC, V6H 3Z6, Canada
| | - Artem Cherkasov
- Faculty of Medicine, Vancouver Prostate Center, University of British Columbia, Vancouver, BC, V6H 3Z6, Canada
| | - Martin Ester
- School of Computing Science, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
| |
|
39
|
Wang C, Liu J, Luo F, Hu QN. Multi-fields model for predicting target–ligand interaction. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.03.079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
40
|
Yao ZJ, Dong J, Che YJ, Zhu MF, Wen M, Wang NN, Wang S, Lu AP, Cao DS. TargetNet: a web service for predicting potential drug-target interaction profiling via multi-target SAR models. J Comput Aided Mol Des 2016; 30:413-24. [PMID: 27167132 DOI: 10.1007/s10822-016-9915-2] [Citation(s) in RCA: 214] [Impact Index Per Article: 26.8] [Received: 01/19/2016] [Accepted: 05/06/2016] [Indexed: 02/01/2023]
Abstract
Drug-target interactions (DTIs) are central to current drug discovery processes and public health fields. Analyzing the DTI profiles of drugs helps to infer drug indications, adverse drug reactions, drug-drug interactions, and drug modes of action. Therefore, it is of high importance to predict the DTI profiles of drugs reliably and quickly on a genome-scale level. Here, we develop the TargetNet server, which can make real-time DTI predictions based only on molecular structures, following the spirit of multi-target SAR methodology. Naïve Bayes models together with various molecular fingerprints were employed to construct prediction models. Ensemble learning from these fingerprints was also provided to improve the prediction ability. When the user submits a molecule, the server will predict the activity of the user's molecule across 623 human proteins by the established high quality SAR models, thus generating a DTI profile that can be used as a feature vector of chemicals for wide applications. The 623 SAR models related to 623 human proteins were strictly evaluated and validated by several model validation strategies, resulting in AUC scores of 75-100%. We applied the generated DTI profiles to successfully predict potential targets, toxicity classification, drug-drug interactions, and drug mode of action, which demonstrated the wide application value of DTI profiling. The TargetNet webserver is designed based on the Django framework in Python, and is freely accessible at http://targetnet.scbdd.com.
Affiliation(s)
- Zhi-Jiang Yao
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, People's Republic of China
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, People's Republic of China
| | - Jie Dong
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, People's Republic of China
| | - Yu-Jing Che
- School of Mathematics and Statistics, Central South University, Changsha, 410083, People's Republic of China
| | - Min-Feng Zhu
- School of Mathematics and Statistics, Central South University, Changsha, 410083, People's Republic of China
| | - Ming Wen
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, People's Republic of China
| | - Ning-Ning Wang
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, People's Republic of China
| | - Shan Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, People's Republic of China
| | - Ai-Ping Lu
- Institute of Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, People's Republic of China
| | - Dong-Sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha, 410013, People's Republic of China.
- Institute of Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, People's Republic of China.
| |
|
41
|
Shar PA, Tao W, Gao S, Huang C, Li B, Zhang W, Shahen M, Zheng C, Bai Y, Wang Y. Pred-binding: large-scale protein-ligand binding affinity prediction. J Enzyme Inhib Med Chem 2016; 31:1443-50. [PMID: 26888050 DOI: 10.3109/14756366.2016.1144594] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 01/17/2023] Open
Abstract
Drug-target interactions (DTIs) are crucial in pharmacology and drug discovery. Presently, experimental determination of compound-protein interactions remains challenging because of the funding it requires and the difficulty of purifying proteins. In this study, we proposed two in silico models based on support vector machine (SVM) and random forest (RF), using 1589 molecular descriptors and 1080 protein descriptors in 9948 ligand-protein pairs to predict DTIs that were quantified by Ki values. Cross-validation coefficients of determination of 0.6079 for SVM and 0.6267 for RF were obtained. In addition, the two-dimensional (2D) autocorrelation, topological charge indices and three-dimensional (3D)-MoRSE descriptors of compounds, and the autocorrelation descriptors and amphiphilic pseudo-amino acid composition of proteins, were found to be most important for Ki predictions. These models provide a new opportunity for the prediction of ligand-receptor interactions that will facilitate the target discovery and toxicity evaluation in drug development.
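For readers unfamiliar with the metric quoted in the abstract above, the coefficient of determination can be computed as 1 - SS_res / SS_tot. The sketch below uses that textbook definition; the abstract does not state exactly how the cross-validated value was computed, so treating it this way is an assumption, and the numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical affinity values on a log scale, for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
print(r_squared(observed, predicted))
```

A value of 1 means the predictions match the observations exactly, while a model that always predicts the mean of the observations scores 0.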
Affiliation(s)
- Piar Ali Shar
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Weiyang Tao
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Shuo Gao
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Chao Huang
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Bohui Li
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Wenjuan Zhang
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Mohamed Shahen
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Chunli Zheng
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Yaofei Bai
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| | - Yonghua Wang
- Bioinformatics Center, College of Life Sciences, Northwest A & F University, Yangling, Shaanxi, China
| |
Collapse
|
42
|
Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction. PLoS Comput Biol 2016; 12:e1004760. [PMID: 26872142 PMCID: PMC4752318 DOI: 10.1371/journal.pcbi.1004760] [Citation(s) in RCA: 200] [Impact Index Per Article: 25.0] [Received: 06/16/2015] [Accepted: 01/14/2016] [Indexed: 12/19/2022] Open
Abstract
In pharmaceutical sciences, a crucial step of the drug discovery process is the identification of drug-target interactions. However, only a small portion of the drug-target interactions have been experimentally validated, as the experimental validation is laborious and costly. To improve the drug discovery efficiency, there is a great need for the development of accurate computational approaches that can predict potential drug-target interactions to direct the experimental verification. In this paper, we propose a novel drug-target interaction prediction algorithm, namely neighborhood regularized logistic matrix factorization (NRLMF). Specifically, the proposed NRLMF method focuses on modeling the probability that a drug would interact with a target by logistic matrix factorization, where the properties of drugs and targets are represented by drug-specific and target-specific latent vectors, respectively. Moreover, NRLMF assigns higher importance levels to positive observations (i.e., the observed interacting drug-target pairs) than negative observations (i.e., the unknown pairs). Because the positive observations are already experimentally verified, they are usually more trustworthy. Furthermore, the local structure of the drug-target interaction data has also been exploited via neighborhood regularization to achieve better prediction accuracy. We conducted extensive experiments over four benchmark datasets, and NRLMF demonstrated its effectiveness compared with five state-of-the-art approaches.

This work introduces a computational approach, namely neighborhood regularized logistic matrix factorization (NRLMF), to predicting potential interactions between drugs and targets. The novelty of NRLMF lies in integrating logistic matrix factorization with neighborhood regularization for drug-target interaction prediction. In NRLMF, we model the interaction probability for each drug-target pair using logistic matrix factorization. As the observed interacting drug-target pairs are experimentally verified, they are more trustworthy than the unknown pairs. We propose to assign higher importance levels to interaction pairs and lower importance levels to unknown pairs. In addition, we further improve the prediction accuracy by neighborhood regularization, which considers the neighborhood influences from most similar drugs and most similar targets. To evaluate the performance of NRLMF, we conducted extensive experiments on four benchmark datasets. The experimental results demonstrated that NRLMF usually outperformed five state-of-the-art methods under three different cross-validation settings, in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPR). In addition, we confirmed the practical prediction ability of NRLMF by mapping with the latest version of four online biological databases, including ChEMBL, DrugBank, KEGG, and Matador.
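The core idea described above, logistic matrix factorization with positive observations weighted more heavily than unknown pairs, can be sketched as follows. This is a simplified illustration and not the authors' implementation: it uses plain L2 regularization and omits the neighborhood regularization entirely. The function names, the toy interaction matrix, and the hyperparameters (`rank`, `c`, `lam`, `lr`, `epochs`) are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_weighted_lmf(Y, rank=2, c=5.0, lam=0.1, lr=0.05, epochs=500, seed=0):
    """Gradient ascent on a weighted, L2-regularized logistic log-likelihood.

    Y[i][j] is 1 for an observed drug-target interaction and 0 for an
    unknown pair; c > 1 is the (assumed) weight given to observed pairs.
    """
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(epochs):
        # Start each gradient with the L2 penalty term.
        grad_U = [[-lam * u for u in row] for row in U]
        grad_V = [[-lam * v for v in row] for row in V]
        for i in range(m):
            for j in range(n):
                p = sigmoid(dot(U[i], V[j]))
                y = Y[i][j]
                # Derivative of c*y*log(p) + (1-y)*log(1-p) w.r.t. the score.
                g = c * y - (1.0 + c * y - y) * p
                for k in range(rank):
                    grad_U[i][k] += g * V[j][k]
                    grad_V[j][k] += g * U[i][k]
        for i in range(m):
            for k in range(rank):
                U[i][k] += lr * grad_U[i][k]
        for j in range(n):
            for k in range(rank):
                V[j][k] += lr * grad_V[j][k]
    return U, V


def interaction_probability(U, V, i, j):
    """Modelled probability that drug i interacts with target j."""
    return sigmoid(dot(U[i], V[j]))


# Toy 3-drug x 3-target interaction matrix (hypothetical data).
Y = [[1, 0, 0],
     [1, 0, 0],
     [0, 1, 1]]
U, V = train_weighted_lmf(Y)
```

After training, `interaction_probability(U, V, i, j)` can be used to rank unknown pairs; in the published method the latent vectors are additionally pulled toward those of the most similar drugs and targets.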
|
43
|
Liao Q, Guan N, Wu C, Zhang Q. Predicting Unknown Interactions Between Known Drugs and Targets via Matrix Completion. Advances in Knowledge Discovery and Data Mining 2016. [DOI: 10.1007/978-3-319-31753-3_47] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
|
44
|
Hizukuri Y, Sawada R, Yamanishi Y. Predicting target proteins for drug candidate compounds based on drug-induced gene expression data in a chemical structure-independent manner. BMC Med Genomics 2015; 8:82. [PMID: 26684652 PMCID: PMC4683716 DOI: 10.1186/s12920-015-0158-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 06/17/2015] [Accepted: 12/08/2015] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Phenotype-based high-throughput screening is a useful technique for identifying drug candidate compounds that have a desired phenotype. However, the molecular mechanisms of the hit compounds remain unknown, and substantial effort is required to identify the target proteins associated with the phenotype. METHODS In this study, we propose a new method to predict target proteins of drug candidate compounds based on drug-induced gene expression data in Connectivity Map and a machine learning classification technique, which we call the "transcriptomic approach." RESULTS Unlike existing methods such as the chemogenomic approach, the transcriptomic approach enabled the prediction of target proteins without dependence on prior knowledge of compound chemical structures. The prediction accuracy of the chemogenomic approach was highly dependent on compound structure similarities in the data sets. In contrast, the prediction accuracy of the transcriptomic approach was maintained at a sufficient level, even for benchmark data consisting of structurally diverse compounds. CONCLUSIONS The transcriptomic approach reported here is expected to be a useful tool for structure-independent prediction of target proteins for drug candidate compounds.
Affiliation(s)
- Yoshiyuki Hizukuri
  - Faculty of Exploratory Technology, Asubio Pharma Co. Ltd., 6-4-3 Minatojima-Minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
- Ryusuke Sawada
  - Division of System Cohort, Multi-Scale Research Center for Medical Science, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, Fukuoka, 812-8582, Japan
- Yoshihiro Yamanishi
  - Division of System Cohort, Multi-Scale Research Center for Medical Science, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka, Fukuoka, 812-8582, Japan; Institute for Advanced Study, Kyushu University, 6-10-1, Hakozaki, Higashi-ku, Fukuoka, Fukuoka, 812-8581, Japan
|
45
|
Sawada R, Iwata H, Mizutani S, Yamanishi Y. Target-Based Drug Repositioning Using Large-Scale Chemical-Protein Interactome Data. J Chem Inf Model 2015; 55:2717-30. [PMID: 26580494 DOI: 10.1021/acs.jcim.5b00330] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 12/19/2022]
Abstract
Drug repositioning, or the identification of new indications for known drugs, is a useful strategy for drug discovery. In this study, we developed novel computational methods to predict potential drug targets and new drug indications for systematic drug repositioning using large-scale chemical-protein interactome data. We explored the target space of drugs (including primary targets and off-targets) based on chemical structure similarity and phenotypic effect similarity by making optimal use of millions of compound-protein interactions. On the basis of the target profiles of drugs, we constructed statistical models to predict new drug indications for a wide range of diseases with various molecular features. The proposed method outperformed previous methods in terms of interpretability, applicability, and accuracy. Finally, we conducted a comprehensive prediction of the drug-target-disease association network for 8270 drugs and 1401 diseases and showed biologically meaningful examples of newly predicted drug targets and drug indications. The predictive model is useful to understand the mechanisms of the predicted drug indications.
Affiliation(s)
- Ryusuke Sawada
  - Division of System Cohort, Multi-scale Research Center for Medical Science, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Hiroaki Iwata
  - Division of System Cohort, Multi-scale Research Center for Medical Science, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Sayaka Mizutani
  - Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Yoshihiro Yamanishi
  - Division of System Cohort, Multi-scale Research Center for Medical Science, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan; Institute for Advanced Study, Kyushu University, 6-10-1, Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan
|
46
|
Tashiro E, Imoto M. Chemical biology of compounds obtained from screening using disease models. Arch Pharm Res 2015; 38:1651-60. [PMID: 26177809 DOI: 10.1007/s12272-015-0633-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/02/2015] [Accepted: 07/06/2015] [Indexed: 01/06/2023]
Abstract
Bioactive compounds are extremely powerful tools for studying biological systems because they can rapidly, conditionally, often reversibly, and dose-dependently modulate the biological function of living cells. Moreover, they are expected to be drug seeds for chemotherapy of several diseases. Two approaches are used to find and obtain bioactive compounds, namely, molecular-target-based screening and phenotypic screening. Through phenotypic screening that mimics tumor metastasis, multi-drug resistance, and Parkinson's disease, we identified several compounds that inhibit cancer cell migration, the anti-apoptotic function of Bcl-2/Bcl-xL, and neuronal cell death. Using a MEK inhibitor that was developed by target-based screening, we discovered that the inhibitor selectively induces apoptosis in tumor cells with a β-catenin mutation. Using target-based screening, we identified arabilin, a novel androgen antagonist. In this review, we introduce our recent studies on the identification of bioactive compounds by phenotypic screening and by target-based screening for drug-seed discovery.
Affiliation(s)
- Estu Tashiro
  - Department of Biosciences and Informatics, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
- Masaya Imoto
  - Department of Biosciences and Informatics, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522, Japan
|
47
|
Wang C, Liu J, Luo F, Deng Z, Hu QN. Predicting target-ligand interactions using protein ligand-binding site and ligand substructures. BMC Syst Biol 2015; 9 Suppl 1:S2. [PMID: 25707321 PMCID: PMC4331677 DOI: 10.1186/1752-0509-9-s1-s2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 01/16/2023]
Abstract
Background Cell proliferation, differentiation, gene expression, metabolism, immunization and signal transduction require the participation of ligands and targets. It is a great challenge to identify rules governing molecular recognition between chemical topological substructures of ligands and the binding sites of the targets. Methods We suppose that ligand-target interactions are determined by ligand substructures as well as the physical-chemical properties of the binding sites. Therefore, we propose a fragment interaction model (FIM) to describe the interactions between ligands and targets, with the purpose of facilitating the chemical interpretation of ligand-target binding. First, we extract target-ligand complexes from the sc-PDB database, from which we obtain the target binding sites and the ligands. Then we represent each binding site as a fragment vector based on a target fragment dictionary composed of 199 clusters (denoted as fragments in this work) obtained by clustering 4200 trimers according to their physical-chemical properties. Next, we represent each ligand as a substructure vector based on a dictionary containing 747 substructures. Finally, we build the FIM by generating the interaction matrix M (representing the fragment interaction network); the FIM can later be used for predicting unknown ligand-target interactions as well as providing the binding details of the interactions. Results The five-fold cross-validation results show that the proposed model achieves a higher AUC score (92%) than three prevalent algorithms, CS-PD (80%), BLM-NII (85%) and RF (85%), demonstrating the remarkable predictive ability of FIM. We also show that the ligand binding sites (local information) outweigh the sequence similarities (global information) in ligand-target binding, and that introducing too much global information would be harmful to the predictive ability. Moreover, the derived fragment interaction network can provide chemical insights into the interactions. Conclusions Target-ligand binding is a local event, and local information dominates the binding ability. Although integrating global information can improve the predictive ability, its role is very limited. The fragment interaction network is helpful for understanding the mechanism of the ligand-target interaction.
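One way to picture how a fragment vector, a substructure vector, and an interaction matrix M might combine is a bilinear score. The sketch below is only an assumed illustration: the abstract does not state how M is estimated or how the score is computed, so the bilinear form, the function names, and the toy numbers are all hypothetical.

```python
def fragment_interaction_score(site_vec, M, ligand_vec):
    """Bilinear score site_vec^T * M * ligand_vec (assumed scoring form)."""
    return sum(
        b * M[i][j] * l
        for i, b in enumerate(site_vec)
        for j, l in enumerate(ligand_vec)
    )


def top_contributions(site_vec, M, ligand_vec, k=3):
    """Largest (fragment index, substructure index, contribution) triples."""
    contribs = [
        (i, j, b * M[i][j] * l)
        for i, b in enumerate(site_vec)
        for j, l in enumerate(ligand_vec)
        if b and l  # skip fragments or substructures that are absent
    ]
    return sorted(contribs, key=lambda t: t[2], reverse=True)[:k]


# Toy example: 3 binding-site fragments, 2 ligand substructures.
site = [1, 0, 1]      # fragments 0 and 2 are present in the binding site
ligand = [0, 1]       # substructure 1 is present in the ligand
M = [[0.2, 0.5],
     [0.1, 0.9],
     [-0.3, 0.4]]     # hypothetical fragment-substructure weights
score = fragment_interaction_score(site, M, ligand)
pairs = top_contributions(site, M, ligand)
```

Under this assumed form, the individual terms of the sum indicate which fragment-substructure pairs drive a prediction, which is one reading of the "binding details" mentioned in the abstract.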
|
48
|
Sawada R, Kotera M, Yamanishi Y. Benchmarking a Wide Range of Chemical Descriptors for Drug-Target Interaction Prediction Using a Chemogenomic Approach. Mol Inform 2014; 33:719-31. [DOI: 10.1002/minf.201400066] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 04/30/2014] [Accepted: 07/23/2014] [Indexed: 01/28/2023]
|
49
|
Cao DS, Zhang LX, Tan GS, Xiang Z, Zeng WB, Xu QS, Chen AF. Computational Prediction of Drug-Target Interactions Using Chemical, Biological, and Network Features. Mol Inform 2014; 33:669-81. [PMID: 27485302 DOI: 10.1002/minf.201400009] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 01/23/2014] [Accepted: 04/22/2014] [Indexed: 02/02/2023]
Abstract
Drug-target interactions (DTIs) are central to current drug discovery processes. Efforts have been devoted to the development of methodology for predicting DTIs and drug-target interaction networks. Most existing methods mainly focus on the application of information about drug or protein structure features. In the present work, we proposed a computational method for DTI prediction by combining the information from chemical, biological and network properties. The method was developed based on a learning algorithm, random forest (RF), combined with integrated features for predicting DTIs. Four classes of drug-target interaction networks in humans, involving enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors, are independently used for establishing predictive models. The RF models gave prediction accuracies of 93.52%, 94.84%, 89.68% and 84.72% for the four pharmaceutically useful datasets, respectively. The prediction ability of our approach is comparable to or even better than that of other DTI prediction methods. These comparative results demonstrated the relevance of the network topology as a source of information for predicting DTIs. Further analysis confirmed that among our top-ranked predictions of DTIs, several DTIs are supported by databases, while the others represent novel potential DTIs. We believe that our proposed approach can help to limit the search space of DTIs and provide a new way towards repositioning old drugs and identifying targets.
Affiliation(s)
- Dong-Sheng Cao
  - School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China
- Liu-Xia Zhang
  - The 163rd Hospital of The Chinese People's Liberation Army, Changsha 410003, P.R. China
- Gui-Shan Tan
  - School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China
- Zheng Xiang
  - School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou 325035, P.R. China
- Wen-Bin Zeng
  - School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China
- Qing-Song Xu
  - School of Mathematics and Statistics, Central South University, Changsha 410083, P.R. China
- Alex F Chen
  - School of Pharmaceutical Sciences, Central South University, Changsha, 410013, P.R. China
|
50
|
Mousavian Z, Masoudi-Nejad A. Drug-target interaction prediction via chemogenomic space: learning-based methods. Expert Opin Drug Metab Toxicol 2014; 10:1273-87. [PMID: 25112457 DOI: 10.1517/17425255.2014.950222] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 01/09/2023]
Abstract
INTRODUCTION Identification of the interaction between drugs and target proteins is a crucial task in genomic drug discovery. In silico prediction is an appropriate alternative to the laborious and costly experimental process of identifying drug-target interactions. The development of a variety of computational methods opens a new direction in analyzing and detecting new drug-target pairs. AREAS COVERED In this review, we focus on chemogenomic methods that have established a learning framework for predicting drug-target interactions. Learning-based methods are classified into supervised and semi-supervised; the supervised learning methods are studied in two separate parts, similarity-based methods and feature-based methods. EXPERT OPINION Despite many improvements in pharmacology applications from learning-based methods, many oversimplified settings in the construction of predictive models may lead to over-optimistic results on drug-target interaction prediction.
Affiliation(s)
- Zaynab Mousavian
  - University of Tehran, Institute of Biochemistry and Biophysics, Laboratory of Systems Biology and Bioinformatics (LBB), Tehran, Iran
|