1
|
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023] Open
Abstract
Natural products (NPs) are an indispensable source of drugs and they have a better coverage of the pharmacological space than synthetic compounds, owing to their high structural diversity. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions of NPs are cost- and time-consuming, whereas computational methods, on the other hand, are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validations. This review covers recent advances in computational approaches which have been developed to aid the annotation of unknown drug-target interactions (DTIs), by focusing on three broad classes, namely: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods which must be overcome to enable them to realize their full potential, and a future outlook is given.
Collapse
Affiliation(s)
| | | | | | | | - Stefan Günther
- Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
| |
Collapse
|
2
|
Tresadern G, Trabanco AA, Pérez-Benito L, Overington JP, van Vlijmen HWT, van Westen GJP. Identification of Allosteric Modulators of Metabotropic Glutamate 7 Receptor Using Proteochemometric Modeling. J Chem Inf Model 2017; 57:2976-2985. [PMID: 29172488 PMCID: PMC5755953 DOI: 10.1021/acs.jcim.7b00338] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2017] [Indexed: 01/07/2023]
Abstract
Proteochemometric modeling (PCM) is a computational approach that can be considered an extension of quantitative structure-activity relationship (QSAR) modeling, where a single model incorporates information for a family of targets and all the associated ligands instead of modeling activity versus one target. This is especially useful for situations where bioactivity data exists for similar proteins but is scarce for the protein of interest. Here we demonstrate the application of PCM to identify allosteric modulators of metabotropic glutamate (mGlu) receptors. Given our long-running interest in modulating mGlu receptor function we compiled a matrix of compound-target bioactivity data. Some members of the mGlu family are well explored both internally and in the public domain, while there are much fewer examples of ligands for other targets such as the mGlu7 receptor. Using a PCM approach mGlu7 receptor hits were found. In comparison to conventional single target modeling the identified hits were more diverse, had a better confirmation rate, and provide starting points for further exploration. We conclude that the robust structure-activity relationship from well explored target family members translated to better quality hits for PCM compared to virtual screening (VS) based on a single target.
Collapse
Affiliation(s)
- Gary Tresadern
- Computational
Chemistry and Neuroscience Medicinal Chemistry, Janssen
Research & Development, Janssen-Cilag
S.A., Jarama 75A, 45007 Toledo, Spain
| | - Andres A. Trabanco
- Computational
Chemistry and Neuroscience Medicinal Chemistry, Janssen
Research & Development, Janssen-Cilag
S.A., Jarama 75A, 45007 Toledo, Spain
| | - Laura Pérez-Benito
- Computational
Chemistry and Neuroscience Medicinal Chemistry, Janssen
Research & Development, Janssen-Cilag
S.A., Jarama 75A, 45007 Toledo, Spain
| | - John P. Overington
- ChEMBL Group, EMBL-EBI,
Wellcome Trust Genome Campus, CB10 1SD Hinxton, United Kingdom
| | - Herman W. T. van Vlijmen
- Computational
Chemistry, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium
| | | |
Collapse
|
3
|
Cortés-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A. Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. MEDCHEMCOMM 2015. [DOI: 10.1039/c4md00216d] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Proteochemometric (PCM) modelling is a computational method to model the bioactivity of multiple ligands against multiple related protein targets simultaneously.
Collapse
Affiliation(s)
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale
- Institut Pasteur and CNRS UMR 3825
- Structural Biology and Chemistry Department
- 75 724 Paris
- France
| | - Qurrat Ul Ain
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| | | | - Eelke B. Lenselink
- Division of Medicinal Chemistry
- Leiden Academic Centre for Drug Research
- Leiden
- The Netherlands
| | - Oscar Méndez-Lucio
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| | - Adriaan P. IJzerman
- Division of Medicinal Chemistry
- Leiden Academic Centre for Drug Research
- Leiden
- The Netherlands
| | - Gerd Wohlfahrt
- Computer-Aided Drug Design
- Orion Pharma
- FIN-02101 Espoo
- Finland
| | - Peteris Prusis
- Computer-Aided Drug Design
- Orion Pharma
- FIN-02101 Espoo
- Finland
| | - Thérèse E. Malliavin
- Unité de Bioinformatique Structurale
- Institut Pasteur and CNRS UMR 3825
- Structural Biology and Chemistry Department
- 75 724 Paris
- France
| | - Gerard J. P. van Westen
- European Molecular Biology Laboratory
- European Bioinformatics Institute
- Wellcome Trust Genome Campus
- Hinxton
- UK
| | - Andreas Bender
- Unilever Centre for Molecular Informatics
- Department of Chemistry
- CB2 1EW Cambridge
- UK
| |
Collapse
|
4
|
Cortes-Ciriano I, van Westen GJ, Lenselink EB, Murrell DS, Bender A, Malliavin T. Proteochemometric modeling in a Bayesian framework. J Cheminform 2014; 6:35. [PMID: 25045403 PMCID: PMC4083135 DOI: 10.1186/1758-2946-6-35] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2014] [Accepted: 06/18/2014] [Indexed: 11/10/2022] Open
Abstract
Proteochemometrics (PCM) is an approach for bioactivity predictive modeling which models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels on three various (and multispecies) PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we have gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R02 values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models obtained with the normalized polynomial and radial kernels provide intervals of confidence for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
Collapse
Affiliation(s)
- Isidro Cortes-Ciriano
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie
| | - Gerard Jp van Westen
- ChEMBL Group, European Molecular Biology Laboratory European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, UK
| | - Eelke Bart Lenselink
- Division of Medicinal Chemistry, Leiden Academic Center for Drug Research, Leiden, The Netherlands
| | - Daniel S Murrell
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Andreas Bender
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
| | - Thérèse Malliavin
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3825; Département de Biologie Structurale et Chimie
| |
Collapse
|
5
|
Kruger FA, Rostom R, Overington JP. Mapping small molecule binding data to structural domains. BMC Bioinformatics 2012; 13 Suppl 17:S11. [PMID: 23282026 PMCID: PMC3521243 DOI: 10.1186/1471-2105-13-s17-s11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND Large-scale bioactivity/SAR Open Data has recently become available, and this has allowed new analyses and approaches to be developed to help address the productivity and translational gaps of current drug discovery. One of the current limitations of these data is the relative sparsity of reported interactions per protein target, and complexities in establishing clear relationships between bioactivity and targets using bioinformatics tools. We detail in this paper the indexing of targets by the structural domains that bind (or are likely to bind) the ligand within a full-length protein. Specifically, we present a simple heuristic to map small molecule binding to Pfam domains. This profiling can be applied to all proteins within a genome to give some indications of the potential pharmacological modulation and regulation of all proteins. RESULTS In this implementation of our heuristic, ligand binding to protein targets from the ChEMBL database was mapped to structural domains as defined by profiles contained within the Pfam-A database. Our mapping suggests that the majority of assay targets within the current version of the ChEMBL database bind ligands through a small number of highly prevalent domains, and conversely the majority of Pfam domains sampled by our data play no currently established role in ligand binding. Validation studies, carried out firstly against Uniprot entries with expert binding-site annotation and secondly against entries in the wwPDB repository of crystallographic protein structures, demonstrate that our simple heuristic maps ligand binding to the correct domain in about 90 percent of all assessed cases. Using the mappings obtained with our heuristic, we have assembled ligand sets associated with each Pfam domain. CONCLUSIONS Small molecule binding has been mapped to Pfam-A domains of protein targets in the ChEMBL bioactivity database. The result of this mapping is an enriched annotation of small molecule bioactivity data and a grouping of activity classes following the Pfam-A specifications of protein domains. This is valuable for data-focused approaches in drug discovery, for example when extrapolating potential targets of a small molecule with known activity against one or few targets, or in the assessment of a potential target for drug discovery or screening studies.
Collapse
Affiliation(s)
- Felix A Kruger
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
| | | | | |
Collapse
|
6
|
Li Q, Cheng T, Wang Y, Bryant SH. Characterizing protein domain associations by Small-molecule ligand binding. JOURNAL OF PROTEOME SCIENCE AND COMPUTATIONAL BIOLOGY 2012; 1:6. [PMID: 23745168 PMCID: PMC3671605 DOI: 10.7243/2050-2273-1-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
BACKGROUND Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to system biology research, drug target identification, as well as drug repurposing. METHODS We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. RESULTS We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. CONCLUSIONS A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing.
Collapse
Affiliation(s)
- Qingliang Li
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Tiejun Cheng
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Yanli Wang
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Stephen H. Bryant
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| |
Collapse
|
7
|
Lusher SJ, McGuire R, Azevedo R, Boiten JW, van Schaik RC, de Vlieg J. A molecular informatics view on best practice in multi-parameter compound optimization. Drug Discov Today 2011; 16:555-68. [DOI: 10.1016/j.drudis.2011.05.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2010] [Revised: 02/25/2011] [Accepted: 05/06/2011] [Indexed: 01/30/2023]
|
8
|
van Westen GJP, Wegner JK, IJzerman AP, van Vlijmen HWT, Bender A. Proteochemometric modeling as a tool to design selective compounds and for extrapolating to novel targets. MEDCHEMCOMM 2011. [DOI: 10.1039/c0md00165a] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
Proteochemometric modeling is founded on the principles of QSAR but is able to benefit from additional information in model training due to the inclusion of target information.
Collapse
Affiliation(s)
- Gerard J. P. van Westen
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
| | | | - Adriaan P. IJzerman
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
| | - Herman W. T. van Vlijmen
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Tibotec BVBA
| | - A. Bender
- Division of Medicinal Chemistry
- Leiden/Amsterdam Center for Drug Research
- Leiden
- The Netherlands
- Unilever Centre for Molecular Science Informatics
| |
Collapse
|
9
|
van der Horst E, Peironcely JE, Ijzerman AP, Beukers MW, Lane JR, van Vlijmen HWT, Emmerich MTM, Okuno Y, Bender A. A novel chemogenomics analysis of G protein-coupled receptors (GPCRs) and their ligands: a potential strategy for receptor de-orphanization. BMC Bioinformatics 2010; 11:316. [PMID: 20537162 PMCID: PMC2897831 DOI: 10.1186/1471-2105-11-316] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2010] [Accepted: 06/10/2010] [Indexed: 02/03/2023] Open
Abstract
Background G protein-coupled receptors (GPCRs) represent a family of well-characterized drug targets with significant therapeutic value. Phylogenetic classifications may help to understand the characteristics of individual GPCRs and their subtypes. Previous phylogenetic classifications were all based on the sequences of receptors, adding only minor information about the ligand binding properties of the receptors. In this work, we compare a sequence-based classification of receptors to a ligand-based classification of the same group of receptors, and evaluate the potential to use sequence relatedness as a predictor for ligand interactions thus aiding the quest for ligands of orphan receptors. Results We present a classification of GPCRs that is purely based on their ligands, complementing sequence-based phylogenetic classifications of these receptors. Targets were hierarchically classified into phylogenetic trees, for both sequence space and ligand (substructure) space. The overall organization of the sequence-based tree and substructure-based tree was similar; in particular, the adenosine receptors cluster together as well as most peptide receptor subtypes (e.g. opioid, somatostatin) and adrenoceptor subtypes. In ligand space, the prostanoid and cannabinoid receptors are more distant from the other targets, whereas the tachykinin receptors, the oxytocin receptor, and serotonin receptors are closer to the other targets, which is indicative for ligand promiscuity. In 93% of the receptors studied, de-orphanization of a simulated orphan receptor using the ligands of related receptors performed better than random (AUC > 0.5) and for 35% of receptors de-orphanization performance was good (AUC > 0.7). Conclusions We constructed a phylogenetic classification of GPCRs that is solely based on the ligands of these receptors. The similarities and differences with traditional sequence-based classifications were investigated: our ligand-based classification uncovers relationships among GPCRs that are not apparent from the sequence-based classification. This will shed light on potential cross-reactivity of GPCR ligands and will aid the design of new ligands with the desired activity profiles. In addition, we linked the ligand-based classification with a ligand-focused sequence-based classification described in literature and proved the potential of this method for de-orphanization of GPCRs.
Collapse
Affiliation(s)
- Eelke van der Horst
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Leiden University, 2333CC, The Netherlands.
| | | | | | | | | | | | | | | | | |
Collapse
|