1
Binatlı OC, Gönen M. MOKPE: drug-target interaction prediction via manifold optimization based kernel preserving embedding. BMC Bioinformatics 2023; 24:276. [PMID: 37407927 DOI: 10.1186/s12859-023-05401-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/22/2023] [Accepted: 06/25/2023] [Indexed: 07/07/2023]
Abstract
BACKGROUND In many applications of bioinformatics, data stem from distinct heterogeneous sources. One of the well-known examples is the identification of drug-target interactions (DTIs), which is of significant importance in drug discovery. In this paper, we propose a novel framework, manifold optimization based kernel preserving embedding (MOKPE), to efficiently solve the problem of modeling heterogeneous data. Our model projects heterogeneous drug and target data into a unified embedding space by preserving drug-target interactions and drug-drug, target-target similarities simultaneously. RESULTS We performed ten replications of ten-fold cross validation on four different drug-target interaction network data sets for predicting DTIs for previously unseen drugs. The classification evaluation metrics showed better or comparable performance compared to previous similarity-based state-of-the-art methods. We also evaluated MOKPE on predicting unknown DTIs of a given network. Our implementation of the proposed algorithm in R together with the scripts that replicate the reported experiments is publicly available at https://github.com/ocbinatli/mokpe .
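The evaluation protocol described in this abstract (ten replications of ten-fold cross validation with previously unseen drugs held out) can be sketched as below. This is only an illustration under stated assumptions, not the authors' code (their implementation is in R): the function name `drug_folds`, the seed, and the choice to form folds over drug indices are all hypothetical.

```python
import random

def drug_folds(n_drugs, n_folds=10, n_replications=10, seed=0):
    """Yield (replication, fold, train_drugs, test_drugs) index splits.

    Assumption: folds are formed over drugs, so every interaction of a
    held-out drug is hidden during training (the "unseen drug" setting).
    """
    rng = random.Random(seed)
    for rep in range(n_replications):
        order = list(range(n_drugs))
        rng.shuffle(order)  # a fresh random partition per replication
        for fold in range(n_folds):
            test = sorted(order[fold::n_folds])  # every n_folds-th drug
            held_out = set(test)
            train = [d for d in range(n_drugs) if d not in held_out]
            yield rep, fold, train, test
```

Calling `list(drug_folds(n_drugs=25))` would produce 100 splits, and within each replication every drug is held out exactly once.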
Affiliation(s)
- Oğuz C Binatlı
  - Graduate School of Sciences and Engineering, Koç University, 34450, Istanbul, Turkey
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, Koç University, 34450, Istanbul, Turkey
  - School of Medicine, Koç University, 34450, Istanbul, Turkey
2
Dalkıran A, Atakan A, Rifaioğlu AS, Martin MJ, Atalay RÇ, Acar AC, Doğan T, Atalay V. Transfer learning for drug-target interaction prediction. Bioinformatics 2023; 39:i103-i110. [PMID: 37387156 DOI: 10.1093/bioinformatics/btad234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/19/2023] [Indexed: 07/01/2023]
Abstract
MOTIVATION Utilizing AI-driven approaches for drug-target interaction (DTI) prediction requires large volumes of training data, which are not available for the majority of target proteins. In this study, we investigate the use of deep transfer learning for the prediction of interactions between drug candidate compounds and understudied target proteins with scarce training data. The idea here is to first train a deep neural network classifier with a generalized source training dataset of large size and then to reuse this pre-trained neural network as an initial configuration for re-training/fine-tuning purposes with a small-sized specialized target training dataset. To explore this idea, we selected six protein families that have critical importance in biomedicine: kinases, G-protein-coupled receptors (GPCRs), ion channels, nuclear receptors, proteases, and transporters. In two independent experiments, the protein families of transporters and nuclear receptors were individually set as the target datasets, while the remaining five families were used as the source datasets. Several size-based target family training datasets were formed in a controlled manner to assess the benefit provided by the transfer learning approach. RESULTS Here, we present a systematic evaluation of our approach by pre-training a feed-forward neural network with source training datasets and applying different modes of transfer learning from the pre-trained source network to a target dataset. The performance of deep transfer learning is evaluated and compared with that of training the same deep neural network from scratch. We found that when the training dataset contains fewer than 100 compounds, transfer learning outperforms the conventional strategy of training the system from scratch, suggesting that transfer learning is advantageous for predicting binders to under-studied targets.
AVAILABILITY AND IMPLEMENTATION The source code and datasets are available at https://github.com/cansyl/TransferLearning4DTI. Our web-based service containing the ready-to-use pre-trained models is accessible at https://tl4dti.kansil.org.
Affiliation(s)
- Alperen Dalkıran
  - Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
  - Department of Computer Engineering, Adana Alparslan Türkeş Science and Technology University, Adana 01250, Turkey
- Ahmet Atakan
  - Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
  - Department of Computer Engineering, Erzincan Binali Yıldırım University, Erzincan 24002, Turkey
- Ahmet S Rifaioğlu
  - Department of Computer Engineering, Iskenderun Technical University, Hatay 31200, Turkey
  - Faculty of Medicine, Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Heidelberg 69120, Germany
- Maria J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton CB10 1SD, United Kingdom
- Rengül Çetin Atalay
  - Faculty of Pulmonary and Critical Care Medicine, the University of Chicago, Chicago, IL, 60637, United States
- Aybar C Acar
  - Cancer Systems Biology Laboratory (Kansil), Middle East Technical University, Ankara 06800, Turkey
- Tunca Doğan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, Hinxton CB10 1SD, United Kingdom
  - Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey
- Volkan Atalay
  - Department of Computer Engineering, Middle East Technical University, Ankara 06800, Turkey
3
Wu Y, Gao M, Zeng M, Zhang J, Li M. BridgeDPI: A Novel Graph Neural Network for Predicting Drug-Protein Interactions. Bioinformatics 2022; 38:2571-2578. [PMID: 35274672 DOI: 10.1093/bioinformatics/btac155] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/19/2021] [Revised: 01/20/2022] [Accepted: 03/10/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Exploring drug-protein interactions (DPIs) provides a rapid and precise approach to assist in laboratory experiments for discovering new drugs. Network-based methods usually use a drug-protein association network and predict DPIs from the information of associated proteins or drugs, following the "guilt-by-association" principle. However, this principle does not always hold, because similar proteins sometimes do not interact with similar drugs. Recent learning-based methods learn the molecular properties underlying DPIs from existing databases of characterized interactions but neglect network-level information. RESULTS We propose a novel method, namely BridgeDPI. We devise a class of virtual nodes to bridge the gap between drugs and proteins and construct a learnable drug-protein association network. The network is optimized based on the supervised signals from the downstream task, the DPI prediction. Through information passing on this drug-protein association network, a graph neural network can capture the network-level information among diverse drugs and proteins. By combining the network-level information and the learning-based method, BridgeDPI achieves significant improvement in three real-world DPI datasets. Moreover, the case study further verifies the effectiveness and reliability of BridgeDPI. AVAILABILITY The source code of BridgeDPI can be accessed at https://github.com/SenseTime-Knowledge-Mining/BridgeDPI.
Affiliation(s)
- Yifan Wu
  - SenseTime Research, Shanghai, 200233, China
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Min Gao
  - SenseTime Research, Shanghai, 200233, China
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Jie Zhang
  - SenseTime Research, Shanghai, 200233, China
  - Qing yuan Research Institute, Shanghai Jiao Tong University, Shanghai, China
  - Merck Advisory Committee for AI-enabled Health Solution, Shanghai, 200126, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
4
Turbo prediction: a new approach for bioactivity prediction. J Comput Aided Mol Des 2022; 36:77-85. [DOI: 10.1007/s10822-021-00440-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Accepted: 12/17/2021] [Indexed: 12/29/2022]
5
Antifungal Activity of N-(4-Halobenzyl)amides against Candida spp. and Molecular Modeling Studies. Int J Mol Sci 2021; 23:ijms23010419. [PMID: 35008845 PMCID: PMC8745543 DOI: 10.3390/ijms23010419] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/09/2021] [Revised: 12/08/2021] [Accepted: 12/10/2021] [Indexed: 12/28/2022]
Abstract
Fungal infections remain a high-incidence worldwide health problem that is aggravated by limited therapeutic options and the emergence of drug-resistant strains. Cinnamic and benzoic acid amides have previously shown bioactivity against different species belonging to the Candida genus. Here, 20 cinnamic and benzoic acid amides were synthesized and tested for inhibition of C. krusei ATCC 14243 and C. parapsilosis ATCC 22019. Five compounds inhibited the Candida strains tested, with compound 16 (MIC = 7.8 µg/mL) producing stronger antifungal activity than fluconazole (MIC = 16 µg/mL) against C. krusei ATCC 14243. Compound 16 was also tested against eight Candida strains, including five clinical strains resistant to fluconazole, and showed an inhibitory effect against all strains tested (MIC = 85.3–341.3 µg/mL). The MIC value against C. krusei ATCC 6258 was 85.3 µg/mL, while against C. krusei ATCC 14243 it was 10.9 times lower, indicating that the latter strain is more sensitive to the antifungal action of compound 16. The inhibition of C. krusei ATCC 14243 and C. parapsilosis ATCC 22019 was also achieved by compounds 2, 9, 12, 14 and 15. Computational experiments combining target fishing, molecular docking and molecular dynamics simulations were performed to study the potential mechanism of action of compound 16 against C. krusei. From these, a multi-target mechanism of action is proposed for this compound that involves proteins related to critical cellular processes such as the redox balance, kinase-mediated signaling, protein folding and cell wall synthesis. The modeling results might guide future experiments focusing on the wet-lab investigation of the mechanism of action of this series of compounds, as well as on the optimization of their inhibitory potency.
6
Mathai N, Chen Y, Kirchmair J. Validation strategies for target prediction methods. Brief Bioinform 2021; 21:791-802. [PMID: 31220208 PMCID: PMC7299289 DOI: 10.1093/bib/bbz026] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 11/26/2018] [Revised: 01/14/2019] [Accepted: 02/17/2019] [Indexed: 12/11/2022]
Abstract
Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode of action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics that are employed to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.
Affiliation(s)
- Neann Mathai
  - Department of Chemistry, University of Bergen, Bergen, Norway
  - Computational Biology Unit (CBU), University of Bergen, Bergen, Norway
  - Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
- Ya Chen
  - Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
- Johannes Kirchmair
  - Department of Chemistry, University of Bergen, Bergen, Norway
  - Computational Biology Unit (CBU), University of Bergen, Bergen, Norway
  - Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
7
Hinnerichs T, Hoehndorf R. DTI-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug-target interactions. Bioinformatics 2021; 37:4835-4843. [PMID: 34320178 PMCID: PMC8665763 DOI: 10.1093/bioinformatics/btab548] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/28/2021] [Revised: 07/14/2021] [Accepted: 07/26/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION In silico drug-target interaction (DTI) prediction is important for drug discovery and drug repurposing. Approaches to predict DTIs can proceed indirectly, top-down, using phenotypic effects of drugs to identify potential drug targets, or they can be direct, bottom-up and use molecular information to directly predict binding affinities. Both approaches can be combined with information about interaction networks. RESULTS We developed DTI-Voodoo as a computational method that combines molecular features and ontology-encoded phenotypic effects of drugs with protein-protein interaction networks, and uses a graph convolutional neural network to predict DTIs. We demonstrate that drug effect features can exploit information in the interaction network whereas molecular features do not. DTI-Voodoo is designed to predict candidate drugs for a given protein; we use this formulation to show that common DTI datasets contain intrinsic biases with major effects on performance evaluation and comparison of DTI prediction methods. Using a modified evaluation scheme, we demonstrate that DTI-Voodoo improves significantly over state of the art DTI prediction methods. AVAILABILITY DTI-Voodoo source code and data necessary to reproduce results are freely available at https://github.com/THinnerichs/DTI-VOODOO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tilman Hinnerichs
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
8
Wang J, Wang W, Yan C, Luo J, Zhang G. Predicting Drug-Disease Association Based on Ensemble Strategy. Front Genet 2021; 12:666575. [PMID: 34012464 PMCID: PMC8128144 DOI: 10.3389/fgene.2021.666575] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/10/2021] [Accepted: 03/23/2021] [Indexed: 12/29/2022]
Abstract
Drug repositioning is used to find new uses for existing drugs, effectively shortening the drug research and development cycle and reducing costs and risks. A new model of drug repositioning based on ensemble learning is proposed. This work develops a novel computational drug repositioning approach called CMAF to discover potential drug-disease associations. First, for new drugs and diseases or unknown drug-disease pairs, based on their known neighbor information, an association probability can be obtained by implementing the weighted K nearest known neighbors (WKNKN) method and improving the drug-disease association information. Then, a new drug similarity network and new disease similarity network can be constructed. Three prediction models are applied and ensembled to enable the final association of drug-disease pairs based on improved drug-disease association information and the constructed similarity network. The experimental results demonstrate that the developed approach outperforms recent state-of-the-art prediction models. Case studies further confirm the predictive ability of the proposed method. Our proposed method can effectively improve the prediction results.
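The abstract names the weighted K nearest known neighbors (WKNKN) step without giving its formulas. The sketch below follows the commonly described formulation of WKNKN (a decay-weighted, similarity-weighted average over the K most similar neighbors, followed by an element-wise maximum with the original matrix). It is not the CMAF authors' code; the function name, the parameters `k` and `decay`, and the simplification of treating every other drug or disease as a candidate neighbor are assumptions.

```python
import numpy as np

def wknkn(Y, S_drug, S_dis, k=5, decay=0.7):
    """Estimate association likelihoods for unknown drug-disease pairs.

    Y:      (n_drugs, n_diseases) binary association matrix
    S_drug: (n_drugs, n_drugs) drug similarity matrix
    S_dis:  (n_diseases, n_diseases) disease similarity matrix
    """
    Y = np.asarray(Y, dtype=float)

    def neighbour_estimate(A, S):
        # A has one row per entity; S holds entity-entity similarities.
        S = np.asarray(S, dtype=float)
        n = A.shape[0]
        est = np.zeros_like(A)
        for i in range(n):
            sims = S[i].copy()
            sims[i] = -np.inf                        # exclude the entity itself
            nn = np.argsort(sims)[::-1][:min(k, n - 1)]
            weights = decay ** np.arange(len(nn)) * S[i, nn]
            norm = S[i, nn].sum()
            if norm > 0:
                est[i] = weights @ A[nn] / norm      # weighted neighbour profile
        return est

    Y_drug = neighbour_estimate(Y, S_drug)
    Y_dis = neighbour_estimate(Y.T, S_dis).T
    # Known associations are kept; unknown entries receive the averaged estimate.
    return np.maximum(Y, (Y_drug + Y_dis) / 2.0)
```

Taking the element-wise maximum means the preprocessing can only raise scores of unknown pairs and never overwrites a known association.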
Affiliation(s)
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Wenxiu Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
9
Wu G, Yang M, Li Y, Wang J. De Novo Prediction of Drug-Target Interactions Using Laplacian Regularized Schatten p-Norm Minimization. J Comput Biol 2021; 28:660-673. [PMID: 33481664 DOI: 10.1089/cmb.2020.0538] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
In pharmaceutical sciences, a crucial step of drug discovery is the identification of drug-target interactions (DTIs). However, only a small portion of the DTIs have been experimentally validated. Moreover, it is an extremely laborious, expensive, and time-consuming procedure to capture new interactions between drugs and targets through traditional biochemical experiments. Therefore, designing computational methods for predicting potential interactions to guide the experimental verification is of practical significance, especially in the de novo setting. In this article, we propose a new algorithm, namely Laplacian regularized Schatten p-norm minimization (LRSpNM), to predict potential target proteins for novel drugs and potential drugs for new targets where there are no known interactions. Specifically, we first take advantage of the drug and target similarity information to dynamically prefill the partial unknown interactions. Then, based on the assumption that the interaction matrix is low-rank, we use a Schatten p-norm minimization model combined with Laplacian regularization terms to improve prediction performance in the new drug/target cases. Finally, we numerically solve the LRSpNM model by an efficient alternating direction method of multipliers algorithm. We evaluate LRSpNM on five data sets, and an extensive set of numerical experiments shows that LRSpNM achieves better and more robust performance than five state-of-the-art DTI prediction algorithms. In addition, we conduct two case studies for new drug and new target prediction, which illustrate that LRSpNM can successfully predict most of the experimentally validated DTIs.
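The Schatten p-norm used as the low-rank surrogate in this abstract can be illustrated numerically. The sketch below uses the standard definition, the p-th root of the sum of the p-th powers of the singular values; it does not reproduce the LRSpNM model, its Laplacian terms, or its ADMM solver, and the function name `schatten_p` is hypothetical.

```python
import numpy as np

def schatten_p(X, p):
    """Schatten p-(quasi-)norm: (sum_i sigma_i(X)**p) ** (1/p), for p > 0."""
    sigma = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return float(np.sum(sigma ** p) ** (1.0 / p))
```

With p = 1 this is the nuclear norm and with p = 2 the Frobenius norm; values of p below 1 give a quasi-norm that is often used as a closer proxy for matrix rank.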
Affiliation(s)
- Gaoyan Wu
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Mengyun Yang
  - School of Science, Shaoyang University, Shaoyang, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, Virginia, USA
- Jianxin Wang
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
10
Wang K, Hu G, Wu Z, Su H, Yang J, Kurgan L. Comprehensive Survey and Comparative Assessment of RNA-Binding Residue Predictions with Analysis by RNA Type. Int J Mol Sci 2020; 21:E6879. [PMID: 32961749 PMCID: PMC7554811 DOI: 10.3390/ijms21186879] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/16/2020] [Revised: 09/15/2020] [Accepted: 09/17/2020] [Indexed: 02/07/2023]
Abstract
With close to 30 sequence-based predictors of RNA-binding residues (RBRs), this comparative survey aims to help with understanding and selection of the appropriate tools. We discuss past reviews on this topic, survey a comprehensive collection of predictors, and comparatively assess six representative methods. We provide a novel and well-designed benchmark dataset and we are the first to report and compare protein-level and datasets-level results, and to contextualize performance to specific types of RNAs. The methods considered here are well-cited and rely on machine learning algorithms on occasion combined with homology-based prediction. Empirical tests reveal that they provide relatively accurate predictions. Virtually all methods perform well for the proteins that interact with rRNAs, some generate accurate predictions for mRNAs, snRNA, SRP and IRES, while proteins that bind tRNAs are predicted poorly. Moreover, except for DRNApred, they confuse DNA and RNA-binding residues. None of the six methods consistently outperforms the others when tested on individual proteins. This variable and complementary protein-level performance suggests that users should not rely on applying just the single best dataset-level predictor. We recommend that future work should focus on the development of approaches that facilitate protein-level selection of accurate predictors and the consensus-based prediction of RBRs.
Affiliation(s)
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Hong Su
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianyi Yang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
11
Mathai N, Kirchmair J. Similarity-Based Methods and Machine Learning Approaches for Target Prediction in Early Drug Discovery: Performance and Scope. Int J Mol Sci 2020; 21:ijms21103585. [PMID: 32438666 PMCID: PMC7279241 DOI: 10.3390/ijms21103585] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 04/27/2020] [Revised: 05/13/2020] [Accepted: 05/16/2020] [Indexed: 12/20/2022]
Abstract
Computational methods for predicting the macromolecular targets of drugs and drug-like compounds have evolved as a key technology in drug discovery. However, the established validation protocols leave several key questions regarding the performance and scope of methods unaddressed. For example, prediction success rates are commonly reported as averages over all compounds of a test set and do not consider the structural relationship between the individual test compounds and the training instances. In order to obtain a better understanding of the value of ligand-based methods for target prediction, we benchmarked a similarity-based method and a random forest based machine learning approach (both employing 2D molecular fingerprints) under three testing scenarios: a standard testing scenario with external data, a standard time-split scenario, and a scenario that is designed to most closely resemble real-world conditions. In addition, we deconvoluted the results based on the distances of the individual test molecules from the training data. We found that, surprisingly, the similarity-based approach generally outperformed the machine learning approach in all testing scenarios, even in cases where queries were structurally clearly distinct from the instances in the training (or reference) data, and despite a much higher coverage of the known target space.
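A similarity-based target prediction of the kind benchmarked in this abstract can be sketched as follows. The abstract does not state which similarity measure or aggregation rule was used, so the Tanimoto coefficient over fingerprint on-bits and the "best-scoring known ligand per target" rule are assumptions here, as are the function names.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_targets(query_fp, reference):
    """Rank targets by the best similarity of any of their known ligands.

    reference: iterable of (fingerprint_bits, target_name) pairs.
    """
    best = {}
    for fp, target in reference:
        score = tanimoto(query_fp, fp)
        if score > best.get(target, -1.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Because each target is scored by its single most similar ligand, a target with one close analogue in the reference data can outrank a target with many moderately similar ligands.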
Affiliation(s)
- Neann Mathai
  - Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
- Johannes Kirchmair
  - Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
  - Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
12
Chen Y, Mathai N, Kirchmair J. Scope of 3D Shape-Based Approaches in Predicting the Macromolecular Targets of Structurally Complex Small Molecules Including Natural Products and Macrocyclic Ligands. J Chem Inf Model 2020; 60:2858-2875. [PMID: 32368908 PMCID: PMC7312400 DOI: 10.1021/acs.jcim.0c00161] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/17/2022]
Abstract
A plethora of similarity-based, network-based, machine learning, docking and hybrid approaches for predicting the macromolecular targets of small molecules are available today and recognized as valuable tools for providing guidance in early drug discovery. With the increasing maturity of target prediction methods, researchers have started to explore ways to expand their scope to more challenging molecules such as structurally complex natural products and macrocyclic small molecules. In this work, we systematically explore the capacity of an alignment-based approach to identify the targets of structurally complex small molecules (including large and flexible natural products and macrocyclic compounds) based on the similarity of their 3D molecular shape to noncomplex molecules (i.e., more conventional, “drug-like”, synthetic compounds). For this analysis, query sets of 10 representative, structurally complex molecules were compiled for each of the 28 pharmaceutically relevant proteins. Subsequently, ROCS, a leading shape-based screening engine, was utilized to generate rank-ordered lists of the potential targets of the 28 × 10 queries according to the similarity of their 3D molecular shapes with those of compounds from a knowledge base of 272 640 noncomplex small molecules active on a total of 3642 different proteins. Four of the scores implemented in ROCS were explored for target ranking, with the TanimotoCombo score consistently outperforming all others. The score successfully recovered the targets of 30% and 41% of the 280 queries among the top-5 and top-20 positions, respectively. For 24 out of the 28 investigated targets (86%), the method correctly assigned the first rank (out of 3642) to the target of interest for at least one of the 10 queries. The shape-based target prediction approach showed remarkable robustness, with good success rates obtained even for compounds that are clearly distinct from any of the ligands present in the knowledge base. However, complex natural products and macrocyclic compounds proved to be challenging even with this approach, although cases of complete failure were recorded only for a small number of targets.
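The top-5 and top-20 recovery rates quoted in this abstract are top-k success rates over rank-ordered target lists. A minimal sketch of that metric is given below; it is not the authors' evaluation code, and the function and argument names are hypothetical.

```python
def top_k_success_rate(ranked_lists, true_targets, k):
    """Fraction of queries whose true target appears in the first k ranks.

    ranked_lists: one rank-ordered list of target names per query
    true_targets: the known target of each query, in the same order
    """
    hits = sum(
        1
        for ranking, target in zip(ranked_lists, true_targets)
        if target in ranking[:k]
    )
    return hits / len(true_targets)
```

For the study described above, `ranked_lists` would hold 280 rankings of 3642 proteins each, and the metric would be evaluated at k = 5 and k = 20.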
Affiliation(s)
- Ya Chen
  - Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany
- Neann Mathai
  - Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
- Johannes Kirchmair
  - Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany
  - Department of Chemistry and Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway
  - Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
13
Oldfield CJ, Fan X, Wang C, Dunker AK, Kurgan L. Computational Prediction of Intrinsic Disorder in Protein Sequences with the disCoP Meta-predictor. Methods Mol Biol 2020; 2141:21-35. [PMID: 32696351 DOI: 10.1007/978-1-0716-0524-0_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Intrinsically disordered proteins are either entirely disordered or contain disordered regions in their native state. These proteins and regions function without the prerequisite of a stable structure and were found to be abundant across all kingdoms of life. Experimental annotation of disorder lags behind the rapidly growing number of sequenced proteins, motivating the development of computational methods that predict disorder in protein sequences. DisCoP is a user-friendly webserver that provides accurate sequence-based prediction of protein disorder. It relies on meta-architecture in which the outputs generated by multiple disorder predictors are combined together to improve predictive performance. The architecture of disCoP is presented, and its accuracy relative to several other disorder predictors is briefly discussed. We describe usage of the web interface and explain how to access and read results generated by this computational tool. We also provide an example of prediction results and interpretation. The disCoP's webserver is publicly available at http://biomine.cs.vcu.edu/servers/disCoP/ .
Collapse
Affiliation(s)
- Xiao Fan
- Department of Pediatrics, Columbia University, New York, NY, USA
- Chen Wang
- Department of Medicine, Columbia University, New York, NY, USA
- A Keith Dunker
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
14
Abstract
Intrinsically disordered regions (IDRs) are estimated to be highly abundant in nature. While only several thousand proteins are annotated with experimentally derived IDRs, computational methods can be used to predict IDRs for the millions of currently uncharacterized protein chains. Several dozen disorder predictors were developed over the last few decades. While some of these methods provide accurate predictions, unavoidably they also make some mistakes. Consequently, one of the challenges facing users of these methods is how to decide which predictions can be trusted and which are likely incorrect. This practical problem can be solved using quality assessment (QA) scores that predict correctness of the underlying (disorder) predictions at a residue level. We motivate and describe a first-of-its-kind toolbox of QA methods, QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions), which provides the scores for a diverse set of ten disorder predictors. QUARTER is available to the end users as a free and convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTER/ . We briefly describe the predictive architecture of QUARTER and provide detailed instructions on how to use the webserver. We also explain how to interpret results produced by QUARTER with the help of a case study.
15
Ghadermarzi S, Li X, Li M, Kurgan L. Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins. Front Genet 2019; 10:1075. [PMID: 31803227 PMCID: PMC6872670 DOI: 10.3389/fgene.2019.01075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open
Abstract
Recent research shows that the majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug–protein interactions. We contrast the current drug targets against the datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on the markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have a significantly higher abundance of alternative splicing isoforms, a relatively large number of domains, a higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in the catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in the metabolic and biosynthesis processes, are enriched in intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.
Affiliation(s)
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
16
Tanoli Z, Alam Z, Ianevski A, Wennerberg K, Vähä-Koskela M, Aittokallio T. Interactive visual analysis of drug–target interaction networks using Drug Target Profiler, with applications to precision medicine and drug repurposing. Brief Bioinform 2018; 21:211-220. [PMID: 30566623 DOI: 10.1093/bib/bby119] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/11/2018] [Revised: 11/01/2018] [Accepted: 11/19/2018] [Indexed: 12/13/2022] Open
Abstract
Knowledge of the full target space of drugs (or drug-like compounds) provides important insights into the potential therapeutic use of the agents to modulate or avoid their various on- and off-targets in drug discovery and precision medicine. However, there is a lack of consolidated databases and associated data exploration tools that allow for systematic profiling of drug target-binding potencies of both approved and investigational agents using a network-centric approach. We recently initiated a community-driven platform, Drug Target Commons (DTC), which is an open-data crowdsourcing platform designed to improve the management, reproducibility and extended use of compound-target bioactivity data for drug discovery and repurposing, as well as target identification applications. In this work, we demonstrate an integrated use of the rich bioactivity data from DTC and related drug databases using Drug Target Profiler (DTP), an open-source software and web tool for interactive exploration of drug-target interaction networks. DTP was designed for network-centric modeling of mode-of-action of multi-targeting anticancer compounds, especially for precision oncology applications. DTP enables users to construct an interaction network based on integrated bioactivity data across selected chemical compounds and their protein targets, further customizable using various visualization and filtering options, as well as cross-links to several drug and protein databases to provide comprehensive information of the network nodes and interactions. We demonstrate here the operation of the DTP tool and its unique features by several use cases related to both drug discovery and drug repurposing applications, using examples of anticancer drugs with shared target profiles. DTP is freely accessible at http://drugtargetprofiler.fimm.fi/.
Affiliation(s)
- Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Zaid Alam
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Aleksandr Ianevski
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology, Aalto University, Espoo, Finland
- Markus Vähä-Koskela
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Tero Aittokallio
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology, Aalto University, Espoo, Finland
- Department of Mathematics and Statistics, University of Turku, Turku, Finland