1
Huang T, Mi H, Lin CY, Zhao L, Zhong LLD, Liu FB, Zhang G, Lu AP, Bian ZX. MOST: most-similar ligand based approach to target prediction. BMC Bioinformatics 2017; 18:165. [PMID: 28284192 PMCID: PMC5346209 DOI: 10.1186/s12859-017-1586-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/08/2016] [Accepted: 03/04/2017] [Indexed: 02/08/2023]
Abstract
BACKGROUND Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. RESULTS Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and explicit bioactivity of the most-similar ligands to predict targets of the query compound. Performance of MOST was evaluated by using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19 containing 61,937 bioactivity data of 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 was used to train models and predict the bioactivity of newly deposited ligands in CHEMBL20. MOST also performed well with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found to be a robust index for removing false positive predictions. Implicit bioactivity did not offer this capability. Finally, p values generated with Logistic Regression, Morgan fingerprint and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in a multiple-target prediction scenario, and the success of this strategy was demonstrated with a case of fluanisone.
In the case of aloe-emodin's laxative effect, MOST predicted that acetylcholinesterase was the mechanism-of-action target; in vivo studies validated this prediction. CONCLUSIONS Using the MOST approach can result in highly accurate and robust target prediction. Integrated with a FDR control procedure, MOST provides a reliable framework for multiple-target inference. It has prospective applications in drug repurposing and mechanism-of-action target prediction.
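The two ingredients named in this abstract, similarity to the most-similar ligand and FDR control over per-target p values, can be illustrated with a minimal Python sketch. It is not the authors' implementation: fingerprints are modelled as sets of on-bit indices, the helper names (`tanimoto`, `most_similar_ligand`, `benjamini_hochberg`) are hypothetical, and the Benjamini-Hochberg step-up rule is an assumed choice of FDR procedure, since the abstract does not name one. The step that converts similarity and bioactivity into a p value (Logistic Regression in the paper) is left out.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumption: a fingerprint is the set of its on-bit indices


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar_ligand(
    query: Fingerprint, ligands: Dict[str, Tuple[Fingerprint, float]]
) -> Tuple[str, float, float]:
    """Return (ligand id, similarity, pKi) of the known ligand closest to the query."""
    best_id = max(ligands, key=lambda name: tanimoto(query, ligands[name][0]))
    fingerprint, pki = ligands[best_id]
    return best_id, tanimoto(query, fingerprint), pki


def benjamini_hochberg(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Targets retained by the Benjamini-Hochberg step-up procedure at level alpha."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        # Keep the largest rank whose p value is under its rank-scaled threshold.
        if p <= alpha * rank / m:
            cutoff = rank
    return [target for target, _ in ranked[:cutoff]]


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})
    known = {
        "ligand_A": (frozenset({1, 2, 3, 5}), 6.2),  # toy fingerprints and pKi values
        "ligand_B": (frozenset({1, 9}), 7.5),
    }
    print(most_similar_ligand(query, known))
    print(benjamini_hochberg({"T1": 0.001, "T2": 0.02, "T3": 0.04, "T4": 0.5}))
```

In this toy setting the explicit pKi of the nearest ligand is carried along with the similarity, which is the information the abstract says earlier similarity-based methods oversimplified or neglected.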
Affiliation(s)
- Tao Huang
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China
- Hong Mi
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China; Department of Gastroenterology, the First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, 510405, People's Republic of China
- Cheng-Yuan Lin
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China; YMU-HKBU Joint Laboratory of Traditional Natural Medicine, Yunnan Minzu University, Kunming, 650500, People's Republic of China
- Ling Zhao
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China
- Linda L D Zhong
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China; Hong Kong Chinese Medicine Clinical Study Centre, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China
- Feng-Bin Liu
- Department of Gastroenterology, the First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, 510405, People's Republic of China
- Ge Zhang
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China
- Ai-Ping Lu
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China; Hong Kong Chinese Medicine Clinical Study Centre, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China
- Zhao-Xiang Bian
- Lab of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China; Hong Kong Chinese Medicine Clinical Study Centre, Hong Kong Baptist University, 7 Baptist University Road, Hong Kong, People's Republic of China
2
Clark AM, Dole K, Ekins S. Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. J Chem Inf Model 2016; 56:275-85. [PMID: 26750305 PMCID: PMC4764945 DOI: 10.1021/acs.jcim.5b00555] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023]
Abstract
Bayesian models constructed from structure-derived fingerprints have been a popular and useful method for drug discovery research when applied to bioactivity measurements that can be effectively classified as active or inactive. The results can be used to rank candidate structures according to their probability of activity, and this ranking benefits from the high degree of interpretability when structure-based fingerprints are used, making the results chemically intuitive. Besides selecting an activity threshold, building a Bayesian model is fast and requires few or no parameters or user intervention. The method also does not suffer from such acute overtraining problems as quantitative structure–activity relationships or quantitative structure–property relationships (QSAR/QSPR). This makes it an approach highly suitable for automated workflows that are independent of user expertise or prior knowledge of the training data. We now describe a new method for creating a composite group of Bayesian models to extend the method to work with multiple states, rather than just binary. Incoming activities are divided into bins, each covering a mutually exclusive range of activities. For each of these bins, a Bayesian model is created to model whether or not the compound belongs in the bin. Analyzing putative molecules using the composite model involves making a prediction for each bin and examining the relative likelihood for each assignment, for example, highest value wins. The method has been evaluated on a collection of hundreds of data sets extracted from ChEMBL v20 and validated data sets for ADME/Tox and bioactivity.
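The binning scheme described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`assign_bin`, `one_vs_rest_labels`, `predict_bin`) and the bin edges are hypothetical, and the per-bin Bayesian models themselves are not implemented. The sketch shows only how activities map to mutually exclusive bins, how each bin yields a binary training target, and how a "highest value wins" rule picks a bin from per-bin scores.

```python
from bisect import bisect_right
from typing import List, Sequence


def assign_bin(activity: float, edges: Sequence[float]) -> int:
    """Index of the bin containing `activity`; `edges` must be sorted ascending.

    Bin 0 is everything below edges[0]; bin i (i >= 1) starts at edges[i - 1].
    """
    return bisect_right(edges, activity)


def one_vs_rest_labels(activities: Sequence[float], edges: Sequence[float]) -> List[List[bool]]:
    """For each bin, a binary label per compound: does it belong in this bin or not."""
    bins = [assign_bin(a, edges) for a in activities]
    return [[b == k for b in bins] for k in range(len(edges) + 1)]


def predict_bin(scores: Sequence[float]) -> int:
    """'Highest value wins': pick the bin whose model gave the largest score."""
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    edges = [5.0, 6.0, 7.0]                 # hypothetical activity thresholds, 4 bins
    print(one_vs_rest_labels([4.0, 5.5, 7.2], edges))
    print(predict_bin([0.1, 0.7, 0.2, 0.05]))  # scores from four hypothetical bin models
```

Each row returned by `one_vs_rest_labels` would serve as the active/inactive target for one binary model, so the existing two-state machinery can be reused unchanged for every bin.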
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
- Krishna Dole
- Collaborative Drug Discovery, Inc., 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Sean Ekins
- Collaborative Drug Discovery, Inc., 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
3
Kumar KH, Paricharak S, Mohan CD, Bharathkumar H, Nagabhushana GP, Rajashekar DK, Chandrappa GT, Bender A, Basappa B, Rangappa KS. Nano-MoO3-mediated synthesis of bioactive thiazolidin-4-ones acting as anti-bacterial agents and their mode-of-action analysis using in silico target prediction, docking and similarity searching. New J Chem 2016. [DOI: 10.1039/c5nj02729b] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]
Abstract
Thiazolidin-4-ones inhibit bacterial growth by potentially targeting the FtsK motor domain of DNA translocase of Salmonella typhi.
Affiliation(s)
- Keerthy Hosadurga Kumar
- Laboratory of Chemical Biology
- Department of Chemistry
- Bangalore University
- Bangalore-560001
- India
- Shardul Paricharak
- Centre for Molecular Informatics
- Department of Chemistry
- Cambridge
- UK
- Division of Medicinal Chemistry
- Andreas Bender
- Centre for Molecular Informatics
- Department of Chemistry
- Cambridge
- UK
- Basappa Basappa
- Laboratory of Chemical Biology
- Department of Chemistry
- Bangalore University
- Bangalore-560001
- India
4
Mervin LH, Afzal AM, Drakakis G, Lewis R, Engkvist O, Bender A. Target prediction utilising negative bioactivity data covering large chemical space. J Cheminform 2015; 7:51. [PMID: 26500705 PMCID: PMC4619454 DOI: 10.1186/s13321-015-0098-y] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 04/30/2015] [Accepted: 09/29/2015] [Indexed: 02/25/2023]
Abstract
BACKGROUND In silico analyses are increasingly being used to support mode-of-action investigations; however many such approaches do not utilise the large amounts of inactive data held in chemogenomic repositories. The objective of this work is concerned with the integration of such bioactivity data in the target prediction of orphan compounds to produce the probability of activity and inactivity for a range of targets. To this end, a novel human bioactivity data set was constructed through the assimilation of over 195 million bioactivity data points deposited in the ChEMBL and PubChem repositories, and the subsequent application of a sphere-exclusion selection algorithm to oversample presumed inactive compounds. RESULTS A Bernoulli Naïve Bayes algorithm was trained using the data and evaluated using fivefold cross-validation, achieving a mean recall and precision of 67.7 and 63.8 % for active compounds and 99.6 and 99.7 % for inactive compounds, respectively. We show the performances of the models are considerably influenced by the underlying intraclass training similarity, the size of a given class of compounds, and the degree of additional oversampling. The method was also validated using compounds extracted from WOMBAT producing average precision-recall AUC and BEDROC scores of 0.56 and 0.85, respectively. Inactive data points used for this test are based on presumed inactivity, producing an approximated indication of the true extrapolative ability of the models. A distance-based applicability domain analysis was also conducted; indicating an average Tanimoto Coefficient distance of 0.3 or greater between a test and training set can be used to give a global measure of confidence in model predictions. A final comparison to a method trained solely on active data from ChEMBL performed with precision-recall AUC and BEDROC scores of 0.45 and 0.76. 
CONCLUSIONS The inclusion of inactive data for model training produces models with superior AUC and improved early recognition capabilities, although the results from internal and external validation of the models show differing performance between the breadth of models. The realised target prediction protocol is available at https://github.com/lhm30/PIDGIN. Graphical abstract: The inclusion of large scale negative training data for in silico target prediction improves the precision and recall AUC and BEDROC scores for target models.
Affiliation(s)
- Lewis H. Mervin
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Avid M. Afzal
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Georgios Drakakis
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Richard Lewis
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Ola Engkvist
- Discovery Sciences, Chemistry Innovation Centre, AstraZeneca R&D, 43183 Mölndal, Sweden
- Andreas Bender
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
5
Schirle M, Jenkins JL. Identifying compound efficacy targets in phenotypic drug discovery. Drug Discov Today 2015; 21:82-89. [PMID: 26272035 DOI: 10.1016/j.drudis.2015.08.001] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 04/24/2015] [Revised: 07/10/2015] [Accepted: 08/03/2015] [Indexed: 12/30/2022]
Abstract
The identification of the efficacy target(s) for hits from phenotypic compound screens remains a key step to progress compounds into drug development. In addition to efficacy targets, the characterization of epistatic proteins influencing compound activity often facilitates the elucidation of the underlying mechanism of action; and, further, early determination of off-targets that cause potentially unwanted secondary phenotypes helps in assessing potential liabilities. This short review discusses the most important technologies currently available for characterizing the direct and indirect target space of bioactive compounds following phenotypic screening. We present a comprehensive strategy employing complementary approaches to balance individual technology strengths and weaknesses.
Affiliation(s)
- Markus Schirle
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA 02139, USA.
- Jeremy L Jenkins
- Developmental & Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, MA 02139, USA.
6
Patel H, Lucas X, Bendik I, Günther S, Merfort I. Target Fishing by Cross-Docking to Explain Polypharmacological Effects. ChemMedChem 2015; 10:1209-17. [DOI: 10.1002/cmdc.201500123] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/16/2015] [Revised: 04/29/2015] [Indexed: 01/18/2023]
7
Paricharak S, Cortés-Ciriano I, IJzerman AP, Malliavin TE, Bender A. Proteochemometric modelling coupled to in silico target prediction: an integrated approach for the simultaneous prediction of polypharmacology and binding affinity/potency of small molecules. J Cheminform 2015; 7:15. [PMID: 25926892 PMCID: PMC4413554 DOI: 10.1186/s13321-015-0063-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 11/18/2014] [Accepted: 03/17/2015] [Indexed: 11/10/2022]
Abstract
The rampant increase of public bioactivity databases has fostered the development of computational chemogenomics methodologies to evaluate potential ligand-target interactions (polypharmacology) both in a qualitative and quantitative way. Bayesian target prediction algorithms predict the probability of an interaction between a compound and a panel of targets, thus assessing compound polypharmacology qualitatively, whereas structure-activity relationship techniques are able to provide quantitative bioactivity predictions. We propose an integrated drug discovery pipeline combining in silico target prediction and proteochemometric modelling (PCM) for the respective prediction of compound polypharmacology and potency/affinity. The proposed pipeline was evaluated on the retrospective discovery of Plasmodium falciparum DHFR inhibitors. The qualitative in silico target prediction model comprised 553,084 ligand-target associations (a total of 262,174 compounds), covering 3,481 protein targets and used protein domain annotations to extrapolate predictions across species. The prediction of bioactivities for plasmodial DHFR led to a recall value of 79% and a precision of 100%, where the latter high value arises from the structural similarity of plasmodial DHFR inhibitors and T. gondii DHFR inhibitors in the training set. Quantitative PCM models were then trained on a dataset comprising 20 eukaryotic, protozoan and bacterial DHFR sequences, and 1,505 distinct compounds (in total 3,099 data points). The most predictive PCM model exhibited $R^2_{0,\mathrm{test}}$ and $\mathrm{RMSE}_{\mathrm{test}}$ values of 0.79 and 0.59 pIC50 units respectively, which was shown to outperform models based exclusively on compound ($R^2_{0,\mathrm{test}}$/$\mathrm{RMSE}_{\mathrm{test}}$ = 0.63/0.78) and target information ($R^2_{0,\mathrm{test}}$/$\mathrm{RMSE}_{\mathrm{test}}$ = 0.09/1.22), as well as inductive transfer knowledge between targets, with respective $R^2_{0,\mathrm{test}}$ and $\mathrm{RMSE}_{\mathrm{test}}$ values of 0.76 and 0.63 pIC50 units.
Finally, both methods were integrated to predict the protein targets and the potency on plasmodial DHFR for the GSK TCAMS dataset, which comprises 13,533 compounds displaying strong anti-malarial activity. 534 of those compounds were identified as DHFR inhibitors by the target prediction algorithm, while the PCM algorithm identified 25 compounds, and 23 compounds (predicted pIC50 > 7) were identified by both methods. Overall, this integrated approach simultaneously provides target and potency/affinity predictions for small molecules. Graphical abstract: Proteochemometric modelling coupled to in silico target prediction.
Affiliation(s)
- Shardul Paricharak
- Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, UK; Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Isidro Cortés-Ciriano
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 25-28, rue du Dr. Roux, 75 724 Paris, France
- Adriaan P IJzerman
- Division of Medicinal Chemistry, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Thérèse E Malliavin
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR 3825, Structural Biology and Chemistry Department, 25-28, rue du Dr. Roux, 75 724 Paris, France
- Andreas Bender
- Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, UK
8
Silvério-Machado R, Couto BRGM, dos Santos MA. Retrieval of Enterobacteriaceae drug targets using singular value decomposition. Bioinformatics 2014; 31:1267-73. [DOI: 10.1093/bioinformatics/btu792] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/29/2014] [Accepted: 11/23/2014] [Indexed: 01/25/2023]
9
Koutsoukas A, Lowe R, Kalantarmotamedi Y, Mussa HY, Klaffke W, Mitchell JBO, Glen RC, Bender A. In silico target predictions: defining a benchmarking data set and comparison of performance of the multiclass Naïve Bayes and Parzen-Rosenblatt window. J Chem Inf Model 2013; 53:1957-66. [PMID: 23829430 DOI: 10.1021/ci300435j] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Indexed: 01/12/2023]
Abstract
In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. This data set is also provided as a benchmark data set for future target prediction methods due to its size as well as the number of bioactivity classes it contains. In addition to evaluating the methods, different performance measures were explored. This is not as straightforward as in binary classification settings, due to the number of classes, the possibility of multiple class memberships, and the need to translate model scores into "yes/no" predictions for assessing model performance. Both algorithms achieved a recall of correct targets that exceeds 80% in the top 1% of predictions. Performance depends significantly on the underlying diversity and size of a given class of bioactive compounds, with small classes and low structural similarity affecting both algorithms to different degrees. When tested on an external test set extracted from WOMBAT covering more than 500 targets by excluding all compounds with Tanimoto similarity above 0.8 to compounds from the ChEMBL data set, the current methodologies achieved a recall of 63.3% and 66.6% among the top 1% for Naïve Bayes and Parzen-Rosenblatt Window, respectively. While those numbers seem to indicate lower performance, they are also more realistic for settings where protein targets need to be established for novel chemical substances.
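The "recall in the top 1% of predictions" measure mentioned in this abstract can be made concrete with a small Python sketch. The exact definition used in the paper is not given here, so this is one plausible reading: rank all targets by model score for a compound, keep the top fraction, and report the share of the compound's known targets that land in it. The function name `recall_in_top_fraction` and the toy scores are hypothetical.

```python
import math
from typing import Dict, Set


def recall_in_top_fraction(
    scores: Dict[str, float], true_targets: Set[str], fraction: float = 0.01
) -> float:
    """Share of known targets found among the top-scoring `fraction` of all targets."""
    if not true_targets:
        raise ValueError("at least one known target is required")
    # Always inspect at least one target, even for very small panels.
    k = max(1, math.ceil(fraction * len(scores)))
    top = sorted(scores, key=lambda name: scores[name], reverse=True)[:k]
    return len(true_targets & set(top)) / len(true_targets)


if __name__ == "__main__":
    # Toy panel of 200 targets whose scores decrease with the index.
    panel = {f"T{i}": 200.0 - i for i in range(200)}
    print(recall_in_top_fraction(panel, {"T0", "T5"}))
```

Because a compound can have several true targets, this score is averaged per compound rather than read off a single confusion matrix, which is the difficulty with multi-class evaluation that the abstract points to.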
Affiliation(s)
- Alexios Koutsoukas
- Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
10
Kruger FA, Rostom R, Overington JP. Mapping small molecule binding data to structural domains. BMC Bioinformatics 2012; 13 Suppl 17:S11. [PMID: 23282026 PMCID: PMC3521243 DOI: 10.1186/1471-2105-13-s17-s11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/14/2023]
Abstract
BACKGROUND Large-scale bioactivity/SAR Open Data has recently become available, and this has allowed new analyses and approaches to be developed to help address the productivity and translational gaps of current drug discovery. One of the current limitations of these data is the relative sparsity of reported interactions per protein target, and complexities in establishing clear relationships between bioactivity and targets using bioinformatics tools. We detail in this paper the indexing of targets by the structural domains that bind (or are likely to bind) the ligand within a full-length protein. Specifically, we present a simple heuristic to map small molecule binding to Pfam domains. This profiling can be applied to all proteins within a genome to give some indications of the potential pharmacological modulation and regulation of all proteins. RESULTS In this implementation of our heuristic, ligand binding to protein targets from the ChEMBL database was mapped to structural domains as defined by profiles contained within the Pfam-A database. Our mapping suggests that the majority of assay targets within the current version of the ChEMBL database bind ligands through a small number of highly prevalent domains, and conversely the majority of Pfam domains sampled by our data play no currently established role in ligand binding. Validation studies, carried out firstly against Uniprot entries with expert binding-site annotation and secondly against entries in the wwPDB repository of crystallographic protein structures, demonstrate that our simple heuristic maps ligand binding to the correct domain in about 90 percent of all assessed cases. Using the mappings obtained with our heuristic, we have assembled ligand sets associated with each Pfam domain. CONCLUSIONS Small molecule binding has been mapped to Pfam-A domains of protein targets in the ChEMBL bioactivity database. 
The result of this mapping is an enriched annotation of small molecule bioactivity data and a grouping of activity classes following the Pfam-A specifications of protein domains. This is valuable for data-focused approaches in drug discovery, for example when extrapolating potential targets of a small molecule with known activity against one or few targets, or in the assessment of a potential target for drug discovery or screening studies.
Affiliation(s)
- Felix A Kruger
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
11
Li Q, Cheng T, Wang Y, Bryant SH. Characterizing protein domain associations by Small-molecule ligand binding. Journal of Proteome Science and Computational Biology 2012; 1:6. [PMID: 23745168 PMCID: PMC3671605 DOI: 10.7243/2050-2273-1-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
BACKGROUND Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to system biology research, drug target identification, as well as drug repurposing. METHODS We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. RESULTS We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. CONCLUSIONS A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing.
Affiliation(s)
- Qingliang Li
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Yanli Wang
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Stephen H. Bryant
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
12
Jenkins JL. Large-Scale QSAR in Target Prediction and Phenotypic HTS Assessment. Mol Inform 2012; 31:508-14. [PMID: 27477469 DOI: 10.1002/minf.201200002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/06/2012] [Accepted: 06/25/2012] [Indexed: 01/31/2023]
Abstract
The advent of in silico compound target prediction offers a potential paradigm shift in how large compound collections are understood and used strategically in high-throughput screens (HTS). Specifically, phenotypic HTS hits may be annotated both with known targets and predicted targets using large-scale QSAR models, enabling a more sophisticated hit assessment. Efforts in massive bioactivity data integration and standardization are empowering such compound-target annotations. These approaches differ fundamentally from the traditional role of QSAR in lead optimization and binding affinity predictions, shifting instead to global, probabilistic target predictions for thousands of human proteins.
Affiliation(s)
- Jeremy L Jenkins
- Developmental and Molecular Pathways, Quantitative Biology, Novartis Institutes for BioMedical Research, 220 Massachusetts Ave., Cambridge, MA 02139 phone: 617-871-7155.
13
Cong F, Cheung AK, Huang SMA. Chemical Genetics–Based Target Identification in Drug Discovery. Annu Rev Pharmacol Toxicol 2012; 52:57-78. [DOI: 10.1146/annurev-pharmtox-010611-134639] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/09/2022]
Affiliation(s)
- Feng Cong
- Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139
- Atwood K. Cheung
- Global Discovery Chemistry – Chemogenetics and Proteomics, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139
- Shih-Min A. Huang
- Developmental and Molecular Pathways, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts 02139
- Current address: Sanofi-Aventis Oncology, Cambridge, Massachusetts 02139
14
Koutsoukas A, Simms B, Kirchmair J, Bond PJ, Whitmore AV, Zimmer S, Young MP, Jenkins JL, Glick M, Glen RC, Bender A. From in silico target prediction to multi-target drug design: current databases, methods and applications. J Proteomics 2011; 74:2554-74. [PMID: 21621023 DOI: 10.1016/j.jprot.2011.05.011] [Citation(s) in RCA: 186] [Impact Index Per Article: 14.3] [Received: 01/21/2011] [Revised: 04/10/2011] [Accepted: 05/06/2011] [Indexed: 01/31/2023]
Abstract
Given the tremendous growth of bioactivity databases, the use of computational tools to predict protein targets of small molecules has been gaining importance in recent years. Applications span a wide range, from the 'designed polypharmacology' of compounds to mode-of-action analysis. In this review, we firstly survey databases that can be used for ligand-based target prediction and which have grown tremendously in size in the past. We furthermore outline methods for target prediction that exist, both based on the knowledge of bioactivities from the ligand side and methods that can be applied in situations when a protein structure is known. Applications of successful in silico target identification attempts are discussed in detail, which were based partly or in whole on computational target predictions in the first instance. This includes the authors' own experience using target prediction tools, in this case considering phenotypic antibacterial screens and the analysis of high-throughput screening data. Finally, we will conclude with the prospective application of databases to not only predict, retrospectively, the protein targets of a small molecule, but also how to design ligands with desired polypharmacology in a prospective manner.
Affiliation(s)
- Alexios Koutsoukas
- Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
15
|
Wang C, Zhou J, Wang S, Ye M, Jiang C, Fan G, Zou H. Combined Comparative and Chemical Proteomics on the Mechanisms of levo-Tetrahydropalmatine-Induced Antinociception in the Formalin Test. J Proteome Res 2010; 9:3225-34. [DOI: 10.1021/pr1001274] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Affiliation(s)
- Chen Wang, Jiangrui Zhou, Shuowen Wang, Mingliang Ye, Chunlei Jiang, Guorong Fan, Hanfa Zou
- Department of Pharmaceutical Analysis, School of Pharmacy, Second Military Medical University, No.325 Guohe Road, Shanghai 200433, People's Republic of China, Division of Biotechnology, Dalian Institute of Chemical Physics, CAS, No.457 Zhongshan Road, Dalian 116023, People's Republic of China, Laboratory of Stress Medicine, Department of Nautical Medicine, Second Military Medical University, No.800 Xiangyin Road, Shanghai 200433, People's Republic of China, and Shanghai Key Laboratory for Pharmaceutical
|
16
|
Feng Y, Mitchison TJ, Bender A, Young DW, Tallarico JA. Multi-parameter phenotypic profiling: using cellular effects to characterize small-molecule compounds. Nat Rev Drug Discov 2009; 8:567-78. [PMID: 19568283 DOI: 10.1038/nrd2876] [Citation(s) in RCA: 235] [Impact Index Per Article: 15.7] [Indexed: 12/15/2022]
Abstract
Multi-parameter phenotypic profiling of small molecules provides important insights into their mechanisms of action, as well as a systems-level understanding of biological pathways and their responses to small-molecule treatments. It therefore deserves more attention at an early step in the drug discovery pipeline. Here, we summarize the technologies that are currently in use for phenotypic profiling (including mRNA-, protein- and imaging-based multi-parameter profiling) in the drug discovery context. We think that an earlier integration of phenotypic profiling technologies, combined with effective experimental and in silico target identification approaches, can improve success rates of lead selection and optimization in the drug discovery process.
Affiliation(s)
- Yan Feng
- Developmental and Molecular Pathways, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.
|