Reference Citation Analysis

Total Articles: 51 (from Reference Citation Analysis)
Article PDFs: 25
Cited by > 0: 48
Searched Name: Anna Gaulton

Number	Citation Analysis
1	The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res 2024;52:D1180-D1192. [PMID: 37933841 PMCID: PMC10767899 DOI: 10.1093/nar/gkad1004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/08/2023] [Open]
Abstract: ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.
MeSH Headings: Databases, Factual; Drug Discovery; Time Factors
Grants: Wellcome Trust; 104104/A/14/Z Wellcome Trust; U54 CA189205 NCI NIH HHS; Member States of the European Molecular Biology Laboratory; US National Institutes of Health; University of New Mexico; European Bioinformatics Institute; University of Miami; Innovative Medicines Initiative 2 Joint Undertaking
2	Biomedical data analyses facilitated by open cheminformatics workflows. J Cheminform 2023;15:46. [PMID: 37069670 PMCID: PMC10108476 DOI: 10.1186/s13321-023-00718-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2023] [Open]
3	Illuminating the druggable genome through patent bioactivity data. PeerJ 2023;11:e15153. [PMID: 37151295 PMCID: PMC10162037 DOI: 10.7717/peerj.15153] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/25/2022] [Accepted: 03/10/2023] [Indexed: 05/09/2023] [Open]
Abstract: The patent literature is a potentially valuable source of bioactivity data. In this article we describe a process to prioritise 3.7 million life science relevant patents obtained from the SureChEMBL database (https://www.surechembl.org/), according to how likely they were to contain bioactivity data for potent small molecules on less-studied targets, based on the classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. Using relatively simple annotation and filtering pipelines, we have been able to identify a substantial number of patents containing quantitative bioactivity data for understudied targets that had not previously been reported in the peer-reviewed medicinal chemistry literature. We quantify the added value of such methods in terms of the numbers of targets that are so identified, and provide some specific illustrative examples. Our work underlines the potential value in searching the patent corpus in addition to the more traditional peer-reviewed literature. The small molecules found in these patents, together with their measured activity against the targets, are now accessible via the ChEMBL database.
Key Words: Bioactive compounds; Drug targets; Druggable genome; Patents; Small molecules; Understudied targets
4	Validation of lipid-related therapeutic targets for coronary heart disease prevention using human genetics. Nat Commun 2021;12:6120. [PMID: 34675202 PMCID: PMC8531035 DOI: 10.1038/s41467-021-25731-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/19/2020] [Accepted: 08/26/2021] [Indexed: 12/14/2022] [Open]
Abstract: Drug target Mendelian randomization (MR) studies use DNA sequence variants in or near a gene encoding a drug target, that alter the target's expression or function, as a tool to anticipate the effect of drug action on the same target. Here we apply MR to prioritize drug targets for their causal relevance for coronary heart disease (CHD). The targets are further prioritized using independent replication, co-localization, protein expression profiles and data from the British National Formulary and clinicaltrials.gov. Out of the 341 drug targets identified through their association with blood lipids (HDL-C, LDL-C and triglycerides), we robustly prioritize 30 targets that might elicit beneficial effects in the prevention or treatment of CHD, including NPC1L1 and PCSK9, the targets of drugs used in CHD prevention. We discuss how this approach can be generalized to other targets, disease biomarkers and endpoints to help prioritize and validate targets during the drug development process.
Key Words: target validation; genetics; cardiovascular diseases
MeSH Headings: Cholesterol, HDL/blood; Cholesterol, LDL/blood; Coronary Disease/blood; Coronary Disease/drug therapy; Coronary Disease/genetics; Humans; Membrane Transport Proteins/genetics; Mendelian Randomization Analysis; Proprotein Convertase 9/genetics; Triglycerides/blood
Grants: RG/10/12/28456 British Heart Foundation; CH/F/20/90003 British Heart Foundation; FS/17/70/33482 British Heart Foundation; MR/R024227/1 Medical Research Council; S011676 Medical Research Council; PG/18/50/33837 British Heart Foundation; MR/V033867/1 Medical Research Council; R01 AG056477 NIA NIH HHS; MC_UU_00011/6 Medical Research Council; MC_UU_00011/4 Medical Research Council; RG/19/4/34452 British Heart Foundation; 221854/Z/20/Z Wellcome Trust; 29019 Cancer Research UK; RF1 AG062553 NIA NIH HHS; SP/13/6/30554 British Heart Foundation; MR/S011676/1 Medical Research Council; Wellcome Trust; R024227 Medical Research Council; Thailand Research Fund (TRF); Wellcome Trust (Wellcome); RCUK | Medical Research Council (MRC); U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging); Academy of Finland (Suomen Akatemia); British Heart Foundation (BHF); RCUK | Economic and Social Research Council (ESRC); EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020); DH | National Institute for Health Research (NIHR); Member States of EMBL; Rosetrees Trust
5	Target-Based Evaluation of "Drug-Like" Properties and Ligand Efficiencies. J Med Chem 2021;64:7210-7230. [PMID: 33983732 PMCID: PMC7610969 DOI: 10.1021/acs.jmedchem.1c00416] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 12/13/2022]
Abstract: Physicochemical descriptors commonly used to define "drug-likeness" and ligand efficiency measures are assessed for their ability to differentiate marketed drugs from compounds reported to bind to their efficacious target or targets. Using ChEMBL version 26, a data set of 643 drugs acting on 271 targets was assembled, comprising 1104 drug-target pairs having ≥100 published compounds per target. Taking into account changes in their physicochemical properties over time, drugs are analyzed according to their target class, therapy area, and route of administration. Recent drugs, approved in 2010-2020, display no overall differences in molecular weight, lipophilicity, hydrogen bonding, or polar surface area from their target comparator compounds. Drugs are differentiated from target comparators by higher potency, ligand efficiency (LE), lipophilic ligand efficiency (LLE), and lower carboaromaticity. Overall, 96% of drugs have LE or LLE values, or both, greater than the median values of their target comparator compounds.
MeSH Headings: Databases, Chemical; Drug Administration Routes; Hydrogen Bonding; Hydrophobic and Hydrophilic Interactions; Ligands; Molecular Weight; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism
Grants: 218244/Z/19/Z Wellcome Trust; 104104/A/14/Z Wellcome Trust; 104104 Wellcome Trust; 218244 Wellcome Trust; Wellcome Trust
6	Actionable druggable genome-wide Mendelian randomization identifies repurposing opportunities for COVID-19. Nat Med 2021;27:668-676. [PMID: 33837377 PMCID: PMC7612986 DOI: 10.1038/s41591-021-01310-z] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 11/11/2020] [Accepted: 03/05/2021] [Indexed: 12/31/2022]
Abstract: Drug repurposing provides a rapid approach to meet the urgent need for therapeutics to address COVID-19. To identify therapeutic targets relevant to COVID-19, we conducted Mendelian randomization analyses, deriving genetic instruments based on transcriptomic and proteomic data for 1,263 actionable proteins that are targeted by approved drugs or in clinical phase of drug development. Using summary statistics from the Host Genetics Initiative and the Million Veteran Program, we studied 7,554 patients hospitalized with COVID-19 and >1 million controls. We found significant Mendelian randomization results for three proteins (ACE2, P = 1.6 × 10⁻⁶; IFNAR2, P = 9.8 × 10⁻¹¹ and IL-10RB, P = 2.3 × 10⁻¹⁴) using cis-expression quantitative trait loci genetic instruments that also had strong evidence for colocalization with COVID-19 hospitalization. To disentangle the shared expression quantitative trait loci signal for IL10RB and IFNAR2, we conducted phenome-wide association scans and pathway enrichment analysis, which suggested that IFNAR2 is more likely to play a role in COVID-19 hospitalization. Our findings prioritize trials of drugs targeting IFNAR2 and ACE2 for early management of COVID-19.
Key Words: mendelian randomization; drug repurposing; covid-19; actionable druggable genome
MeSH Headings: Angiotensin-Converting Enzyme 2/genetics; Angiotensin-Converting Enzyme 2/physiology; COVID-19/genetics; Drug Repositioning; Genome-Wide Association Study; Humans; Interleukin-10 Receptor beta Subunit/genetics; Interleukin-10 Receptor beta Subunit/physiology; Mendelian Randomization Analysis/methods; Quantitative Trait Loci; Receptor, Interferon alpha-beta/genetics; Receptor, Interferon alpha-beta/physiology; SARS-CoV-2; COVID-19 Drug Treatment
Grants: I01 BX004821 BLRD VA; MR/L003120/1 Medical Research Council; RG/13/13/30194 British Heart Foundation; MC_UU_12015/1 Medical Research Council; I01 CX001897 CSRD VA; MR/S004068/2 Medical Research Council; MC_UU_00002/7 Medical Research Council; 204623/Z/16/Z Wellcome Trust; Wellcome Trust; RG/18/13/33946 British Heart Foundation
7	Drug Safety Data Curation and Modeling in ChEMBL: Boxed Warnings and Withdrawn Drugs. Chem Res Toxicol 2021;34:385-395. [PMID: 33507738 PMCID: PMC7888266 DOI: 10.1021/acs.chemrestox.0c00296] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/24/2020] [Indexed: 12/15/2022]
Abstract: The safety of marketed drugs is an ongoing concern, with some of the more frequently prescribed medicines resulting in serious or life-threatening adverse effects in some patients. Safety-related information for approved drugs has been curated to include the assignment of toxicity class(es) based on their withdrawn status and/or black box warning information described on medicinal product labels. The ChEMBL resource contains a wide range of bioactivity data types, from early "Discovery" stage preclinical data for individual compounds through to postclinical data on marketed drugs; the inclusion of the curated drug safety data set within this framework can support a wide range of safety-related drug discovery questions. The curated drug safety data set will be made freely available through ChEMBL and updated in future database releases.
MeSH Headings: Data Curation; Drug Approval; Drug-Related Side Effects and Adverse Reactions; Humans; Models, Molecular; Pharmaceutical Preparations/chemistry
Grants: 218244/Z/19/Z Wellcome Trust; 104104/A/14/Z Wellcome Trust
8	An open source chemical structure curation pipeline using RDKit. J Cheminform 2020;12:51. [PMID: 33431044 PMCID: PMC7458899 DOI: 10.1186/s13321-020-00456-1] [Citation(s) in RCA: 128] [Impact Index Per Article: 32.0] [Received: 06/11/2020] [Accepted: 08/24/2020] [Indexed: 11/13/2022] [Open]
Abstract: Background: The ChEMBL database is one of a number of public databases that contain bioactivity data on small molecule compounds curated from diverse sources. Incoming compounds are typically not standardised according to consistent rules. In order to maintain the quality of the final database and to easily compare and integrate data on the same compound from different sources it is necessary for the chemical structures in the database to be appropriately standardised. Results: A chemical curation pipeline has been developed using the open source toolkit RDKit. It comprises three components: a Checker to test the validity of chemical structures and flag any serious errors; a Standardizer which formats compounds according to defined rules and conventions; and a GetParent component that removes any salts and solvents from the compound to create its parent. This pipeline has been applied to the latest version of the ChEMBL database as well as uncurated datasets from other sources to test the robustness of the process and to identify common issues in database molecular structures. Conclusion: All the components of the structure pipeline have been made freely available for other researchers to use and adapt for their own use. The code is available in a GitHub repository and it can also be accessed via the ChEMBL Beaker webservices. It has been used successfully to standardise the nearly 2 million compounds in the ChEMBL database and the compound validity checker has been used to identify compounds with the most serious issues so that they can be prioritised for manual curation.
Key Words: ChEMBL; Chemistry; Curation; Open source; RDKit; Standardisation
Grants: Wellcome Trust; WT086151/Z/08/Z Wellcome Trust; WT104104/Z/14/Z Wellcome Trust; European Molecular Biology Laboratory
9	The Global Phosphorylation Landscape of SARS-CoV-2 Infection. Cell 2020;182:685-712.e19. [PMID: 32645325 PMCID: PMC7321036 DOI: 10.1016/j.cell.2020.06.034] [Citation(s) in RCA: 677] [Impact Index Per Article: 169.3] [Received: 05/20/2020] [Revised: 06/09/2020] [Accepted: 06/23/2020] [Indexed: 02/07/2023]
Abstract: The causative agent of the coronavirus disease 2019 (COVID-19) pandemic, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has infected millions and killed hundreds of thousands of people worldwide, highlighting an urgent need to develop antiviral therapies. Here we present a quantitative mass spectrometry-based phosphoproteomics survey of SARS-CoV-2 infection in Vero E6 cells, revealing dramatic rewiring of phosphorylation on host and viral proteins. SARS-CoV-2 infection promoted casein kinase II (CK2) and p38 MAPK activation, production of diverse cytokines, and shutdown of mitotic kinases, resulting in cell cycle arrest. Infection also stimulated a marked induction of CK2-containing filopodial protrusions possessing budding viral particles. Eighty-seven drugs and compounds were identified by mapping global phosphorylation profiles to dysregulated kinases and pathways. We found pharmacologic inhibition of the p38, CK2, CDK, AXL, and PIKFYVE kinases to possess antiviral efficacy, representing potential COVID-19 therapies.
Key Words: AXL; CDK; MAPK; PIKFYVE; SARS-CoV-2; antiviral; casein kinase II; mass spectrometry; p38; phosphoproteomics
MeSH Headings: A549 Cells; Angiotensin-Converting Enzyme 2; Animals; Antiviral Agents/pharmacology; Betacoronavirus/metabolism; COVID-19; Caco-2 Cells; Casein Kinase II/antagonists & inhibitors; Casein Kinase II/metabolism; Chlorocebus aethiops; Coronavirus Infections/metabolism; Coronavirus Infections/virology; Cyclin-Dependent Kinases/antagonists & inhibitors; Cyclin-Dependent Kinases/metabolism; Drug Evaluation, Preclinical/methods; HEK293 Cells; Host-Pathogen Interactions; Humans; Pandemics; Peptidyl-Dipeptidase A/genetics; Peptidyl-Dipeptidase A/metabolism; Phosphatidylinositol 3-Kinases/metabolism; Phosphoinositide-3 Kinase Inhibitors/pharmacology; Phosphorylation; Pneumonia, Viral/metabolism; Pneumonia, Viral/virology; Protein Kinase Inhibitors/pharmacology; Proteomics/methods; Proto-Oncogene Proteins/antagonists & inhibitors; Proto-Oncogene Proteins/metabolism; Receptor Protein-Tyrosine Kinases/antagonists & inhibitors; Receptor Protein-Tyrosine Kinases/metabolism; SARS-CoV-2; Spike Glycoprotein, Coronavirus/metabolism; Vero Cells; p38 Mitogen-Activated Protein Kinases/antagonists & inhibitors; p38 Mitogen-Activated Protein Kinases/metabolism; Axl Receptor Tyrosine Kinase
Grants: HHSN272201400008C NIAID NIH HHS; T32 GM007618 NIGMS NIH HHS; R01 GM117189 NIGMS NIH HHS; U19 AI135990 NIAID NIH HHS; R01 AI122747 NIAID NIH HHS; F32 CA239333 NCI NIH HHS; R01 CA244550 NCI NIH HHS; R35 GM122481 NIGMS NIH HHS; R01 GM133981 NIGMS NIH HHS; U19 AI118610 NIAID NIH HHS; U19 AI135972 NIAID NIH HHS; P50 AI150476 NIAID NIH HHS; R01 AI143292 NIAID NIH HHS; F32 CA236347 NCI NIH HHS; European Research Council; R01 AI120694 NIAID NIH HHS; R35 GM118119 NIGMS NIH HHS; R01 CA221969 NCI NIH HHS
10	ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2020;47:D930-D940. [PMID: 30398643 PMCID: PMC6323927 DOI: 10.1093/nar/gky1075] [Citation(s) in RCA: 973] [Impact Index Per Article: 243.3] [Received: 09/15/2018] [Accepted: 10/18/2018] [Indexed: 12/31/2022] [Open]
Abstract: ChEMBL is a large, open-access bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012, 2014 and 2017 Nucleic Acids Research Database Issues. In the last two years, several important improvements have been made to the database and are described here. These include more robust capture and representation of assay details; a new data deposition system, allowing updating of data sets and deposition of supplementary data; and a completely redesigned web interface, with enhanced search and filtering capabilities.
11	Reply to "Missed opportunities in large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery". J Cheminform 2019;11:64. [PMID: 33430932 PMCID: PMC6831531 DOI: 10.1186/s13321-019-0388-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/04/2019] [Accepted: 10/22/2019] [Indexed: 11/10/2022] [Open]
Abstract: In response to Krstajic's letter to the editor concerning our published paper, we here take the opportunity to reply, to re-iterate that no errors in our work were identified, to provide further details, and to re-emphasise the outputs of our study. Moreover, we highlight that all of the data are freely available for the wider scientific community (including the aforementioned correspondent) to undertake follow-on studies and comparisons.
12	Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J Cheminform 2019;11:4. [PMID: 30631996 PMCID: PMC6690068 DOI: 10.1186/s13321-018-0325-4] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 09/14/2018] [Accepted: 12/24/2018] [Indexed: 12/22/2022] [Open]
Abstract: Structure–activity relationship modelling is frequently used in the early stage of drug discovery to assess the activity of a compound on one or several targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used for these and related applications over many years, with good success. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target-ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. The models were then evaluated on new data published since the original models were built to simulate a “real world” application. The comparative study highlights the similarities between the two techniques but also some differences that it is important to bear in mind when the methods are used in practical drug discovery applications.
Key Words: ChEMBL; Cheminformatics; Classification models; Mondrian conformal prediction; QSAR
13	A large-scale dataset of in vivo pharmacology assay results. Sci Data 2018;5:180230. [PMID: 30351302 PMCID: PMC6206617 DOI: 10.1038/sdata.2018.230] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/28/2018] [Accepted: 09/03/2018] [Indexed: 12/17/2022] [Open]
Abstract: ChEMBL is a large-scale, open-access drug discovery resource containing bioactivity information primarily extracted from scientific literature. A substantial dataset of more than 135,000 in vivo assays has been collated as a key resource of animal models for translational medicine within drug discovery. To improve the utility of the in vivo data, an extensive data curation task has been undertaken that allows the assays to be grouped by animal disease model or phenotypic endpoint. The dataset contains previously unavailable information about compounds or drugs tested in animal models and, in conjunction with assay data on protein targets or cell- or tissue-based systems, allows the investigation of the effects of compounds at differing levels of biological complexity. Equally, it enables researchers to identify compounds that have been investigated for a group of disease-, pharmacology- or toxicity-relevant assays.
14	Unexplored therapeutic opportunities in the human genome. Nat Rev Drug Discov 2018;17:317-332. [PMID: 29472638 PMCID: PMC6339563 DOI: 10.1038/nrd.2018.14] [Citation(s) in RCA: 204] [Impact Index Per Article: 34.0] [Indexed: 12/18/2022]
Abstract: A large proportion of biomedical research and the development of therapeutics is focused on a small fraction of the human genome. In a strategic effort to map the knowledge gaps around proteins encoded by the human genome and to promote the exploration of currently understudied, but potentially druggable, proteins, the US National Institutes of Health launched the Illuminating the Druggable Genome (IDG) initiative in 2014. In this article, we discuss how the systematic collection and processing of a wide array of genomic, proteomic, chemical and disease-related resource data by the IDG Knowledge Management Center have enabled the development of evidence-based criteria for tracking the target development level (TDL) of human proteins, which indicates a substantial knowledge deficit for approximately one out of three proteins in the human proteome. We then present spotlights on the TDL categories as well as key drug target classes, including G protein-coupled receptors, protein kinases and ion channels, which illustrate the nature of the unexplored opportunities for biomedical research and therapeutic development.
Grants: U01 MH104999 NIMH NIH HHS; P30 CA118100 NCI NIH HHS; U24 CA224370 NCI NIH HHS; UL1 TR001449 NCATS NIH HHS; P50 CA058223 NCI NIH HHS; R01 CA177993 NCI NIH HHS; U01 MH105026 NIMH NIH HHS; U24 DK116214 NIDDK NIH HHS; U24 TR002278 NCATS NIH HHS; UM1 HG006370 NHGRI NIH HHS; U01 MH104984 NIMH NIH HHS; U54 CA189201 NCI NIH HHS; Wellcome Trust; U01 MH105028 NIMH NIH HHS; U24 DK116195 NIDDK NIH HHS; U24 CA224260 NCI NIH HHS; U54 CA189205 NCI NIH HHS; U24 DK116204 NIDDK NIH HHS; U01 MH104974 NIMH NIH HHS
15	Unexplored therapeutic opportunities in the human genome. Nat Rev Drug Discov 2018;17:377. [PMID: 29567993 DOI: 10.1038/nrd.2018.52] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/10/2022]
Abstract: This corrects the article DOI: 10.1038/nrd.2018.14.
16	Using ChEMBL web services for building applications and data processing workflows relevant to drug discovery. Expert Opin Drug Discov 2017;12:757-767. [PMID: 28602100 DOI: 10.1080/17460441.2017.1339032] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/31/2023]
Abstract: Introduction: ChEMBL is a manually curated database of bioactivity data on small drug-like molecules, used by drug discovery scientists. Among many access methods, a REST API provides programmatic access, allowing the remote retrieval of ChEMBL data and its integration into other applications. This approach allows scientists to move from a world where they go to the ChEMBL web site to search for relevant data, to one where ChEMBL data can be simply integrated into their everyday tools and work environment. Areas covered: This review highlights some of the audiences who may benefit from using the ChEMBL API, and the goals they can address, through the description of several use cases. The examples cover a team communication tool (Slack), a data analytics platform (KNIME), batch job management software (Luigi) and Rich Internet Applications. Expert opinion: The advent of web technologies, cloud computing and micro-services-oriented architectures has made REST APIs an essential ingredient of modern software development models. The widespread availability of tools consuming RESTful resources has made them useful for many groups of users. The ChEMBL API is a valuable resource of drug discovery bioactivity data for professional chemists, chemistry students, data scientists, and scientific and web developers.
Key Words: API; ChEMBL; KNIME; Luigi; Python; REST; Slack; pipeline; service; workflow
17	The druggable genome and support for target identification and validation in drug development. Sci Transl Med 2017;9:eaag1166. [PMID: 28356508 PMCID: PMC6321762 DOI: 10.1126/scitranslmed.aag1166] [Citation(s) in RCA: 319] [Impact Index Per Article: 45.6] [Received: 07/25/2016] [Accepted: 01/27/2017] [Indexed: 12/11/2022]
Abstract: Target identification (determining the correct drug targets for a disease) and target validation (demonstrating an effect of target perturbation on disease biomarkers and disease end points) are important steps in drug development. Clinically relevant associations of variants in genes encoding drug targets model the effect of modifying the same targets pharmacologically. To delineate drug development (including repurposing) opportunities arising from this paradigm, we connected complex disease- and biomarker-associated loci from genome-wide association studies to an updated set of genes encoding druggable human proteins, to agents with bioactivity against these targets, and, where there were licensed drugs, to clinical indications. We used this set of genes to inform the design of a new genotyping array, which will enable association studies of druggable genes for drug target selection and validation in human disease.
MeSH Headings: Drug Discovery; Drug Repositioning; Genetic Loci; Genome, Human; Genome-Wide Association Study; Humans; Linkage Disequilibrium/genetics; Molecular Targeted Therapy; Phenotype; Polymorphism, Single Nucleotide/genetics; Reproducibility of Results; Translational Research, Biomedical
Grants: WT104104/Z/14/Z Wellcome Trust; WT086151/Z/08/Z Wellcome Trust; PG/12/71/29684 British Heart Foundation; PG12/71/29684 British Heart Foundation; PG/13/66/30442 British Heart Foundation; Wellcome Trust
18	Insights into Transporter Classifications: an Outline of Transporters as Drug Targets. Methods and Principles in Medicinal Chemistry 2017. [DOI: 10.1002/9783527679430.ch1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/21/2023]
19	Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res 2017;45:D995-D1002. [PMID: 27903890 PMCID: PMC5210555 DOI: 10.1093/nar/gkw1072] [Citation(s) in RCA: 181] [Impact Index Per Article: 25.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2016] [Revised: 10/17/2016] [Accepted: 10/24/2016] [Indexed: 01/12/2023] Open Abstract The 'druggable genome' encompasses several protein families, but only a subset of targets within them have attracted significant research attention and thus have information about them publicly available. The Illuminating the Druggable Genome (IDG) program was initiated in 2014, has the goal of developing experimental techniques and a Knowledge Management Center (KMC) that would collect and organize information about protein targets from four families, representing the most common druggable targets with an emphasis on understudied proteins. Here, we describe two resources developed by the KMC: the Target Central Resource Database (TCRD) which collates many heterogeneous gene/protein datasets and Pharos (https://pharos.nih.gov), a multimodal web interface that presents the data from TCRD. We briefly describe the types and sources of data considered by the KMC and then highlight features of the Pharos interface designed to enable intuitive access to the IDG knowledgebase. The aim of Pharos is to encourage 'serendipitous browsing', whereby related, relevant information is made easily discoverable. We conclude by describing two use cases that highlight the utility of Pharos and TCRD. 
MESH Headings: Cluster Analysis; Computational Biology/methods; Databases, Genetic; Drug Discovery/methods; Genomics/methods; Humans; Obesity/drug therapy; Obesity/genetics; Obesity/metabolism; Pharmacogenetics/methods; Search Engine; Software; Web Browser. Grants: P30 CA118100 NCI NIH HHS; T32 HL007824 NHLBI NIH HHS; U54 CA189201 NCI NIH HHS; U54 CA189205 NCI NIH HHS.
20	A comprehensive map of molecular drug targets. Nat Rev Drug Discov 2017;16:19-34. [PMID: 27910877 PMCID: PMC6314433 DOI: 10.1038/nrd.2016.230] [Citation(s) in RCA: 1292] [Impact Index Per Article: 184.6] [Indexed: 11/09/2022] Abstract The success of mechanism-based drug discovery depends on the definition of the drug target. This definition becomes even more important as we try to link drug response to genetic variation, understand stratified clinical efficacy and safety, rationalize the differences between drugs in the same therapeutic class and predict drug utility in patient subgroups. However, drug targets are often poorly defined in the literature, both for launched drugs and for potential therapeutic agents in discovery and development. Here, we present an updated comprehensive map of molecular targets of approved drugs. We curate a total of 893 human and pathogen-derived biomolecules through which 1,578 US FDA-approved drugs act. These biomolecules include 667 human-genome-derived proteins targeted by drugs for human disease. Analysis of these drug targets indicates the continued dominance of privileged target families across disease areas, but also the growth of novel first-in-class mechanisms, particularly in oncology. We explore the relationships between bioactivity class and clinical success, as well as the presence of orthologues between human and animal models and between pathogen and human genomes. Through the collaboration of three independent teams, we highlight some of the ongoing challenges in accurately defining the targets of molecular therapeutics and present conventions for deconvoluting the complexities of molecular pharmacology and drug efficacy.
MESH Headings: Databases, Pharmaceutical; Drug Approval; Drug Delivery Systems/trends; Drug Discovery/trends; Drug Prescriptions/statistics & numerical data; Genetic Variation; Genome, Human; Humans; Pharmacogenetics/trends; United States; United States Food and Drug Administration. Grants: 22897 Cancer Research UK; P30 CA118100 NCI NIH HHS; UL1 TR001449 NCATS NIH HHS; 11566 Cancer Research UK; Wellcome Trust; U54 CA189205 NCI NIH HHS.
21	Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res 2016;45:D985-D994. [PMID: 27899665 PMCID: PMC5210543 DOI: 10.1093/nar/gkw1055] [Citation(s) in RCA: 270] [Impact Index Per Article: 33.8] [Received: 08/19/2016] [Revised: 10/19/2016] [Accepted: 11/03/2016] [Indexed: 01/16/2023] Abstract We have designed and developed a data integration and visualization platform that provides evidence about the association of known and potential drug targets with diseases. The platform is designed to support identification and prioritization of biological targets for follow-up. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources. The platform provides either a target-centric workflow to identify diseases that may be associated with a specific target, or a disease-centric workflow to identify targets that may be associated with a specific disease. Users can easily transition between these target- and disease-centric workflows. The Open Targets Validation Platform is accessible at https://www.targetvalidation.org.
22	The ChEMBL database in 2017. Nucleic Acids Res 2016;45:D945-D954. [PMID: 27899562 PMCID: PMC5210557 DOI: 10.1093/nar/gkw1074] [Citation(s) in RCA: 1338] [Impact Index Per Article: 167.3] [Received: 09/23/2016] [Revised: 10/21/2016] [Accepted: 10/30/2016] [Indexed: 11/14/2022] Abstract ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.
23	A drug target slim: using gene ontology and gene ontology annotations to navigate protein-ligand target space in ChEMBL. J Biomed Semantics 2016;7:59. [PMID: 27678076 PMCID: PMC5039825 DOI: 10.1186/s13326-016-0102-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 12/02/2015] [Accepted: 09/16/2016] [Indexed: 12/25/2022] Abstract Background: The process of discovering new drugs is lengthy and expensive. Modern day drug discovery relies heavily on the rapid identification of novel ‘targets’, usually proteins that can be modulated by small molecule drugs to cure or minimise the effects of a disease. Of the 20,000 proteins currently reported as comprising the human proteome, just under a quarter can potentially be modulated by known small molecules. Storing information in curated, actively maintained drug discovery databases can help researchers access current drug discovery information quickly. However, with the increase in the amount of data generated from both experimental and in silico efforts, databases can become very large very quickly and information retrieval from them can become a challenge. The development of database tools that facilitate rapid information retrieval is important to keep up with the growth of databases. Description: We have developed a Gene Ontology-based navigation tool (Gene Ontology Tree) to help users retrieve biological information for single protein targets in the ChEMBL drug discovery database. 99 % of single protein targets in ChEMBL have at least one GO annotation associated with them. There are 12,500 GO terms associated with 6200 protein targets in the ChEMBL database, resulting in a total of 140,000 annotations.
The slim we have created, the ‘ChEMBL protein target slim’, allows broad categorisation of the biology of 90 % of the protein targets using just 300 high level, informative GO terms. We used the GO slim method of assigning fewer higher level GO groupings to numerous very specific lower level terms derived from the GOA to describe a set of GO terms relevant to proteins in ChEMBL. We then used the slim created to provide a web based tool that allows a quick and easy navigation of protein target space. Terms from the GO are used to capture information on protein molecular function, biological process and subcellular localisations. The ChEMBL database also provides compound information for small molecules that have been tested for their effects on these protein targets. The ‘ChEMBL protein target slim’ firstly provides a means of describing the biology of protein drug targets and secondly allows users to easily establish a connection between biological and chemical information regarding drugs and drug targets in ChEMBL. The ‘ChEMBL protein target slim’ is available as a browsable ‘Gene Ontology Tree’ on the ChEMBL site under the browse targets tab (https://www.ebi.ac.uk/chembl/target/browser). A ChEMBL protein target slim OBO file containing the GO slim terms pertinent to ChEMBL is available from the GOC website (http://geneontology.org/page/go-slim-and-subset-guide). Conclusions: We have created a protein target navigation tool based on the ‘ChEMBL protein target slim’. The ‘ChEMBL protein target slim’ provides a way of browsing protein targets in ChEMBL using high level GO terms that describe the molecular functions, processes and subcellular localisations of protein drug targets in drug discovery. The tool also allows users to establish a link between ontological groupings representing protein target biology and relevant compound information in ChEMBL.
We have demonstrated by the use of a simple example how the ‘ChEMBL protein target slim’ can be used to link biological processes with drug information based on the information in the ChEMBL database. The tool has potential to aid in areas of drug discovery such as drug repurposing studies or drug-disease-protein pathways. Key Words: Bioinformatics; Biology; Database; Drug discovery; Ontologies; Protein.
24	Open PHACTS computational protocols for in silico target validation of cellular phenotypic screens: knowing the knowns. MEDCHEMCOMM 2016;7:1237-1244. [PMID: 27774140 PMCID: PMC5063042 DOI: 10.1039/c6md00065g] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/01/2016] [Accepted: 05/10/2016] [Indexed: 01/09/2023] Abstract Phenotypic screening is in a renaissance phase and is expected by many academic and industry leaders to accelerate the discovery of new drugs for new biology. Given that phenotypic screening is per definition target agnostic, the emphasis of in silico and in vitro follow-up work is on the exploration of possible molecular mechanisms and efficacy targets underlying the biological processes interrogated by the phenotypic screening experiments. Herein, we present six exemplar computational protocols for the interpretation of cellular phenotypic screens based on the integration of compound, target, pathway, and disease data established by the IMI Open PHACTS project. The protocols annotate phenotypic hit lists and allow follow-up experiments and mechanistic conclusions. The annotations included are from ChEMBL, ChEBI, GO, WikiPathways and DisGeNET. Also provided are protocols which select from the IUPHAR/BPS Guide to PHARMACOLOGY interaction file selective compounds to probe potential targets and a correlation robot which systematically aims to identify an overlap of active compounds in both the phenotypic as well as any kinase assay. The protocols are applied to a phenotypic pre-lamin A/C splicing assay selected from the ChEMBL database to illustrate the process. The computational protocols make use of the Open PHACTS API and data and are built within the Pipeline Pilot and KNIME workflow tools.
25	SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res 2015;44:D1220-8. [PMID: 26582922 PMCID: PMC4702887 DOI: 10.1093/nar/gkv1253] [Citation(s) in RCA: 111] [Impact Index Per Article: 12.3] [Received: 09/16/2015] [Accepted: 11/01/2015] [Indexed: 11/13/2022] Abstract SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented with sophisticated combined structure and keyword-based search capabilities against the compound repository and patent document corpus; given the wealth of knowledge hidden in patent documents, analysis of SureChEMBL data has immediate applications in drug discovery, medicinal chemistry and other commercial areas of chemical science. Currently, the database contains 17 million compounds extracted from 14 million patent documents. Access is available through a dedicated web-based interface and data downloads at: https://www.surechembl.org/.
26	Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents. J Cheminform 2015;7:49. [PMID: 26457120 PMCID: PMC4594083 DOI: 10.1186/s13321-015-0097-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/28/2015] [Accepted: 09/29/2015] [Indexed: 11/28/2022] Abstract Background: First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated by using the latter approach are now available but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases. Results: When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51 % were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as starting point, i.e. establishing if for a list of compounds the databases provide the links between chemical structures and patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59 %, respectively, of the compound-patent pairs obtained from Reaxys. Conclusions: In our comparison of automatically generated vs.
manually curated patent chemistry databases, the former successfully provided approximately 60 % of links between chemical structure and patents. It needs to be stressed that only a very limited number of patents and compound-patent pairs were used for our comparison. Nevertheless, our results will hopefully help to manage expectations of users of patent chemistry databases of this type and provide a useful framework for more studies like ours as well as guide future developments of the workflows used for the automated extraction of chemical structures from patents. The challenges we have encountered whilst performing this study highlight that more needs to be done to make such assessments easier. Above all, more adequate, preferably open access to relevant ‘gold standards’ is required. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0097-z) contains supplementary material, which is available to authorized users. Key Words: IBM SIIP; Patent chemistry databases; Patents; SureChEMBL.
27	Chemical databases: curation or integration by user-defined equivalence? DRUG DISCOVERY TODAY. TECHNOLOGIES 2015;14:17-24. [PMID: 26194583 PMCID: PMC6294287 DOI: 10.1016/j.ddtec.2015.01.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 08/20/2014] [Revised: 01/15/2015] [Accepted: 01/16/2015] [Indexed: 11/30/2022] Abstract There is a wealth of valuable chemical information in publicly available databases for use by scientists undertaking drug discovery. However, finite curation resource, limitations of chemical structure software and differences in individual database applications mean that exact chemical structure equivalence between databases is unlikely to ever be a reality. The ability to identify compound equivalence has been made significantly easier by the use of the International Chemical Identifier (InChI), a non-proprietary line-notation for describing a chemical structure. More importantly, advances in methods to identify compounds that are the same at various levels of similarity, such as those containing the same parent component or having the same connectivity, are now enabling related compounds to be linked between databases where the structure matches are not exact. MESH Headings: Databases, Chemical; Drug Discovery; Molecular Structure; Software. Grants: Wellcome Trust; 086151 Wellcome Trust; 104104 Wellcome Trust; WT086151/Z/08/Z Wellcome Trust.
28	ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res 2015;43. [PMID: 25883136 PMCID: PMC4489243 DOI: 10.1093/nar/gkv352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/20/2023] Abstract ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology.
29	ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res 2015;43:W612-20. [PMID: 25883136 PMCID: PMC4489243 DOI: 10.1093/nar/gkv352] [Citation(s) in RCA: 344] [Impact Index Per Article: 38.2] [Received: 02/16/2015] [Accepted: 04/03/2015] [Indexed: 01/12/2023] Abstract ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology.
30	PPDMs-a resource for mapping small molecule bioactivities from ChEMBL to Pfam-A protein domains. Bioinformatics 2014;31:776-8. [PMID: 25348214 PMCID: PMC4341065 DOI: 10.1093/bioinformatics/btu711] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023] Abstract Summary: PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches alignments to assist applications in computational drug discovery and systems and chemical biology. We have previously proposed a mapping heuristic for a subset of bioactivities stored in ChEMBL with the Pfam-A domain most likely to mediate small molecule binding. We have since refined this mapping using a manual procedure. Here, we present a resource that provides up-to-date mappings and the possibility to review assigned mappings as well as to participate in their assignment and curation. We also describe how mappings provided through the PPDMs resource are made accessible through the main schema of the ChEMBL database. Availability and implementation: The PPDMs resource and curation interface is available at https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps. The source-code for PPDMs is available under the Apache license at https://github.com/chembl/pfam_maps. Source code is available at https://github.com/chembl/pfam_map_loader to demonstrate the integration process with the main schema of ChEMBL. Contact: jpo@ebi.ac.uk
31	The complex portal--an encyclopaedia of macromolecular complexes. Nucleic Acids Res 2014;43:D479-84. [PMID: 25313161 PMCID: PMC4384031 DOI: 10.1093/nar/gku975] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Indexed: 01/25/2023] Abstract The IntAct molecular interaction database has created a new, free, open-source, manually curated resource, the Complex Portal (www.ebi.ac.uk/intact/complex), through which protein complexes from major model organisms are being collated and made available for search, viewing and download. It has been built in close collaboration with other bioinformatics services and populated with data from ChEMBL, MatrixDB, PDBe, Reactome and UniProtKB. Each entry contains information about the participating molecules (including small molecules and nucleic acids), their stoichiometry, topology and structural assembly. Complexes are annotated with details about their function, properties and complex-specific Gene Ontology (GO) terms. Consistent nomenclature is used throughout the resource with systematic names, recommended names and a list of synonyms all provided. The use of the Evidence Code Ontology allows us to indicate for which entries direct experimental evidence is available or if the complex has been inferred based on homology or orthology. The data are searchable using standard identifiers, such as UniProt, ChEBI and GO IDs, protein, gene and complex names or synonyms. This reference resource will be maintained and grow to encompass an increasing number of organisms. Input from groups and individuals with specific areas of expertise is welcome.
32	UniChem: extension of InChI-based compound mapping to salt, connectivity and stereochemistry layers. J Cheminform 2014;6:43. [PMID: 25221628 PMCID: PMC4158273 DOI: 10.1186/s13321-014-0043-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/17/2014] [Accepted: 09/01/2014] [Indexed: 11/10/2022] Abstract UniChem is a low-maintenance, fast and freely available compound identifier mapping service, recently made available on the Internet. Until now, the criterion of molecular equivalence within UniChem has been on the basis of complete identity between Standard InChIs. However, a limitation of this approach is that stereoisomers, isotopes and salts of otherwise identical molecules are not considered as related. Here, we describe how we have exploited the layered structural representation of the Standard InChI to create new functionality within UniChem that integrates these related molecular forms. The service, called 'Connectivity Search', allows molecules to be first matched on the basis of complete identity between the connectivity layer of their corresponding Standard InChIs, and the remaining layers then compared to highlight stereochemical and isotopic differences. Parsing of Standard InChI sub-layers permits mixtures and salts to also be included in this integration process. Implementation of these enhancements required simple modifications to the schema, loader and web application, none of which have changed the original UniChem functionality or services. The scope of queries may be varied using a variety of easily configurable options, and the output is annotated to assist the user to filter, sort and understand the difference between query and retrieved structures.
A RESTful web service output may be easily processed programmatically to allow developers to present the data in whatever form they believe their users will require, or to define their own level of molecular equivalence for their resource, albeit within the constraint of identical connectivity. Key Words: Chemical databases; Connectivity search; Data integration; InChIKey; Standard InChI; UniChem.
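The layered matching described in the UniChem abstract can be illustrated with a minimal Python sketch. This is not UniChem's implementation: the function names are hypothetical, the parser is a simplification that keeps only the last occurrence of any repeated layer prefix and does not split multi-component (salt or mixture) InChIs, and the two alanine strings are included purely as illustrative inputs.

```python
def split_inchi_layers(inchi: str) -> dict[str, str]:
    """Split an InChI string into its version, formula and prefixed layers.

    Simplification: if a layer prefix occurs more than once (for example in
    an isotopic section), only the last occurrence is kept.
    """
    prefix, _, body = inchi.partition("/")
    if not prefix.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = body.split("/")
    layers = {"version": prefix[len("InChI="):], "formula": parts[0]}
    for part in parts[1:]:
        if part:  # skip empty segments
            layers[part[0]] = part[1:]  # e.g. "c1-2(4)3(5)6" -> key "c"
    return layers


def same_connectivity(inchi_a: str, inchi_b: str) -> bool:
    """True when the formula and connectivity (/c) layers are identical."""
    a, b = split_inchi_layers(inchi_a), split_inchi_layers(inchi_b)
    return a["formula"] == b["formula"] and a.get("c") == b.get("c")


def differing_layers(inchi_a: str, inchi_b: str) -> list[str]:
    """Names of the layers whose content differs between two InChIs."""
    a, b = split_inchi_layers(inchi_a), split_inchi_layers(inchi_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Illustrative inputs: two stereoisomers of alanine.
L_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

if __name__ == "__main__":
    print(same_connectivity(L_ALANINE, D_ALANINE))
    print(differing_layers(L_ALANINE, D_ALANINE))
```

Matching first on the connectivity layer and then reporting which remaining layers differ mirrors the two-step comparison the abstract describes; a production parser would also need to handle repeated layers and multi-component structures.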
33	Transporter assays and assay ontologies: useful tools for drug discovery. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014;12:e47-e54. [PMID: 25027375 DOI: 10.1016/j.ddtec.2014.03.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023] Abstract Transport proteins represent an eminent class of drug targets and ADMET (absorption, distribution, metabolism, excretion, toxicity) associated genes. There exists a large number of distinct activity assays for transport proteins, owing not only to the measurement needed (e.g. transport activity, strength of ligand–protein interaction) but also to the heterogeneous assay setups used by different research groups. Efforts to systematically organize this (divergent) bioassay data have large potential impact in Public-Private partnership and conventional commercial drug discovery. In this short review, we highlight some of the frequently used high-throughput assays for transport proteins, and we discuss emerging assay ontologies and their application to this field. Focusing on human P-glycoprotein (Multidrug resistance protein 1; gene name: ABCB1, MDR1), we exemplify how annotation of bioassay data per target class could improve and add to existing ontologies, and we propose to include an additional layer of metadata supporting data fusion across different bioassays. MESH Headings: Biological Ontologies; Drug Discovery/methods; High-Throughput Screening Assays; Membrane Transport Proteins/chemistry; Membrane Transport Proteins/classification; Membrane Transport Proteins/metabolism; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism. Grants: F 3502 Austrian Science Fund FWF; WT086151/Z/08/Z Wellcome Trust.
34	Transporter taxonomy - a comparison of different transport protein classification schemes. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014;12:e37-e46. [PMID: 25027374 DOI: 10.1016/j.ddtec.2014.03.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023] Abstract Currently, there are more than 800 well characterized human membrane transport proteins (including channels and transporters) and there are estimates that about 10% (approx. 2000) of all human genes are related to transport. Membrane transport proteins are of interest as potential drug targets, for drug delivery, and as a cause of side effects and drug–drug interactions. In light of the development of Open PHACTS, which provides an open pharmacological space, we analyzed selected membrane transport protein classification schemes (Transporter Classification Database, ChEMBL, IUPHAR/BPS Guide to Pharmacology, and Gene Ontology) for their ability to serve as a basis for pharmacology driven protein classification. A comparison of these membrane transport protein classification schemes by using a set of clinically relevant transporters as use-case reveals the strengths and weaknesses of the different taxonomy approaches. MESH Headings: Classification; Databases, Pharmaceutical; Databases, Protein; Drug Discovery; Gene Ontology; Humans; Membrane Transport Proteins/chemistry; Membrane Transport Proteins/classification; Membrane Transport Proteins/genetics.
35	Chemical, target, and bioactive properties of allosteric modulation. PLoS Comput Biol 2014;10:e1003559. [PMID: 24699297 PMCID: PMC3974644 DOI: 10.1371/journal.pcbi.1003559] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 10/30/2013] [Accepted: 02/21/2014] [Indexed: 11/22/2022] Abstract Allosteric modulators are ligands for proteins that exert their effects via a different binding site than the natural (orthosteric) ligand site and hence form a conceptually distinct class of ligands for a target of interest. Here, the physicochemical and structural features of a large set of allosteric and non-allosteric ligands from the ChEMBL database of bioactive molecules are analyzed. In general allosteric modulators are relatively smaller, more lipophilic and more rigid compounds, though large differences exist between different targets and target classes. Furthermore, there are differences in the distribution of targets that bind these allosteric modulators. Allosteric modulators are over-represented in membrane receptors, ligand-gated ion channels and nuclear receptor targets, but are underrepresented in enzymes (primarily proteases and kinases). Moreover, allosteric modulators tend to bind to their targets with a slightly lower potency (5.96 log units versus 6.66 log units, p<0.01). However, this lower absolute affinity is compensated by their lower molecular weight and more lipophilic nature, leading to similar binding efficiency and surface efficiency indices. Subsequently a series of classifier models are trained, initially target class independent models followed by finer-grained target (architecture/functional class) based models using the target hierarchy of the ChEMBL database.
Applications of these insights include the selection of likely allosteric modulators from existing compound collections, the design of novel chemical libraries biased towards allosteric regulators and the selection of targets potentially likely to yield allosteric modulators on screening. All data sets used in the paper are available for download. The physicochemistry and topography of ligand binding sites is generally conserved amongst related proteins; however, comparisons of the pharmacology of related targets (and even the same target) are often confounded by the existence of multiple, distinct, binding sites within the same protein. Importantly, these multiple binding sites can have ‘druggability’ or selectivity properties, and can therefore offer attractive novel approaches to develop new therapeutic agents. In this paper, sets of known ligands binding to the same target are classified as being either allosteric (binding at a site that is non-competitive for a natural ligand/substrate) or non-allosteric (binding at the same site as a natural substrate), and it is demonstrated that there are differences in the profiles of ligands discovered empirically against these sites. Finally, predictive models are developed with several useful applications in drug discovery. MESH Headings: Allosteric Regulation; Databases, Chemical; Ligands; Models, Chemical; Molecular Weight. Grants: Wellcome Trust; 86151/Z/08/Z Wellcome Trust.
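The abstract's point that a lower potency can be offset by a lower molecular weight can be made concrete with a short Python sketch. It assumes the commonly used definitions of the two indices (binding efficiency index as potency per kDa of molecular weight, surface efficiency index as potency per 100 Å² of polar surface area); the paper may compute them differently. The two potencies are the values quoted in the abstract, while the molecular weights are hypothetical numbers chosen only for illustration.

```python
def binding_efficiency_index(p_activity: float, mol_weight_da: float) -> float:
    """BEI: potency (-log10 of molar activity) per kDa of molecular weight."""
    return p_activity / (mol_weight_da / 1000.0)


def surface_efficiency_index(p_activity: float, polar_surface_area: float) -> float:
    """SEI: potency per 100 square angstroms of polar surface area."""
    return p_activity / (polar_surface_area / 100.0)


# Mean potencies quoted in the abstract (log units).
ALLOSTERIC_POTENCY = 5.96
NON_ALLOSTERIC_POTENCY = 6.66

# Hypothetical molecular weights in Da, not taken from the paper.
ALLOSTERIC_MW = 350.0
NON_ALLOSTERIC_MW = 390.0

allosteric_bei = binding_efficiency_index(ALLOSTERIC_POTENCY, ALLOSTERIC_MW)
non_allosteric_bei = binding_efficiency_index(NON_ALLOSTERIC_POTENCY, NON_ALLOSTERIC_MW)

if __name__ == "__main__":
    print(f"allosteric BEI:     {allosteric_bei:.2f}")
    print(f"non-allosteric BEI: {non_allosteric_bei:.2f}")
```

With these assumed weights, the 0.7 log-unit potency gap nearly disappears once potency is normalised by size, which is the kind of compensation the abstract reports.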
36	The EBI RDF platform: linked open data for the life sciences. Bioinformatics 2014;30:1338-9. [PMID: 24413672 PMCID: PMC3998127 DOI: 10.1093/bioinformatics/btt765] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Indexed: 01/18/2023] Abstract: Motivation: Resource description framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI. Availability: http://www.ebi.ac.uk/rdf Contact: jupp@ebi.ac.uk
37	The ChEMBL bioactivity database: an update. Nucleic Acids Res 2013;42:D1083-90. [PMID: 24214965 PMCID: PMC3965067 DOI: 10.1093/nar/gkt1031] [Citation(s) in RCA: 1022] [Impact Index Per Article: 92.9] [Indexed: 12/16/2022] Abstract: ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.
38	Drug target central. Expert Opin Drug Discov 2013;4:857-72. [PMID: 23496271 DOI: 10.1517/17460440903049290] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022] Abstract: BACKGROUND: One of the primary pillars of drug discovery is the drug target, its relationship to both the drugs designed against it and the biological processes in which it is involved. Here we review the informatics approaches required to build a complete catalogue of known drug targets. OBJECTIVE: Using Pfizer's internal target database as a narrative, we review the steps involved in the construction of an integrated, enterprise target-informatics system. We consider how compiling the drug target universe requires integration across several resources such as competitor intelligence and pharmacological activity databases, as well as input from techniques such as text-mining. In particular, we address data standards and the complexities of representing targets in a structured ontology as well as opportunities for future development. CONCLUSION: Drug target-orientated databases address important areas of drug discovery such as chemogenomics, drug/candidate repurposing and business intelligence. As research in industry and academia drives continued expansion of the druggable genome, it is crucial that such systems be maintained to provide an accurate picture of the landscape. The power of this information stretches beyond drug discovery and into the wider scientific community where small molecule tool compounds can enable the dissection of complex cellular pathways.
39	UniChem: a unified chemical structure cross-referencing and identifier tracking system. J Cheminform 2013;5:3. [PMID: 23317286 PMCID: PMC3616875 DOI: 10.1186/1758-2946-5-3] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.0] [Received: 10/09/2012] [Accepted: 01/04/2013] [Indexed: 11/10/2022] Abstract: UniChem is a freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources. In the past, the creation and maintenance of such links at EMBL-EBI, where several chemistry-based resources exist, has required independent efforts by each of the separate teams. These efforts were complicated by the different data models, release schedules, and differing business rules for compound normalization and identifier nomenclature that exist across the organization. UniChem, a large-scale, non-redundant database of Standard InChIs with pointers between these structures and chemical identifiers from all the separate chemistry resources, was developed as a means of efficiently sharing the maintenance overhead of creating these links. Thus, for each source represented in UniChem, all links to and from all other sources are automatically calculated and immediately available for all to use. Updated mappings are immediately available upon loading of new data releases from the sources. Web services in UniChem provide users with a single simple automatable mechanism for maintaining all links from their resource to all other sources represented in UniChem. In addition, functionality to track changes in identifier usage allows users to monitor which identifiers are current, and which are obsolete.
Lastly, UniChem has been deliberately designed to allow additional resources to be included with minimal effort. Indeed, the recent inclusion of data sources external to EMBL-EBI has provided a simple means of providing users with an even wider selection of resources with which to link, all at no extra cost, while at the same time providing a simple mechanism for external resources to link to all EMBL-EBI chemistry resources.
40	Shouldn't enantiomeric purity be included in the 'minimum information about a bioactive entity'? Response from the MIABE group. Nat Rev Drug Discov 2012. [DOI: 10.1038/nrd3503-c2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
41	Visualizing the drug target landscape. Drug Discov Today 2011;17 Suppl:S3-15. [PMID: 22178891 DOI: 10.1016/j.drudis.2011.12.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022] Abstract: Generating new therapeutic hypotheses for human disease requires the analysis and interpretation of many different experimental datasets. Assembling a holistic picture of the current landscape of drug discovery activity remains a challenge, however, because of the lack of integration between biological, chemical and clinical resources. Although tools designed to tackle the interpretation of individual data types are abundant, systems that bring together multiple elements to directly enable decision making within drug discovery programmes are rare. In this article, we review the path that led to the development of a knowledge system to tackle this problem within our organization and highlight the influences of existing technologies on its development. Central to our approach is the use of visualization to better convey the overall meaning of an integrated set of data including disease association, druggability, competitor intelligence, genomics and text mining. Organizing such data along lines of therapeutic precedence creates clearly distinct 'zones' of pharmaceutical opportunity, ranging from small-molecule repurposing to biotherapeutic prospects and gene family exploitation. Mapping content in this way also provides a visual alerting mechanism that evaluates new evidence in the context of old, reducing information overload by filtering redundant information. In addition, we argue the need for more tools in this space and highlight the role that data standards, new technologies and increased collaboration might have in achieving this aim.
42	ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 2011;40:D1100-7. [PMID: 21948594 PMCID: PMC3245175 DOI: 10.1093/nar/gkr777] [Citation(s) in RCA: 2338] [Impact Index Per Article: 179.8] [Indexed: 12/21/2022] Abstract: ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.
43	Minimum information about a bioactive entity (MIABE). Nat Rev Drug Discov 2011;10:661-9. [DOI: 10.1038/nrd3503] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 11/09/2022]
44	PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat Methods 2011;8:528-9. [PMID: 21716279 DOI: 10.1038/nmeth.1637] [Citation(s) in RCA: 233] [Impact Index Per Article: 17.9] [Indexed: 11/09/2022]
45	Visualizing the drug target landscape. Drug Discov Today 2010;15:3-15. [DOI: 10.1016/j.drudis.2009.09.011] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 04/29/2009] [Revised: 09/14/2009] [Accepted: 09/15/2009] [Indexed: 11/28/2022]
46	Functional assignment of MAPK phosphatase domains. Proteins 2007;69:19-31. [PMID: 17596826 DOI: 10.1002/prot.21477] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Abstract: Mitogen-activated protein kinase (MAPK) pathways are well conserved in most organisms, from yeast to humans. The principal components of these pathways are MAP kinases whose activity is regulated by phosphorylation, implicating various MAPK protein effectors, in particular protein phosphatases that inactivate MAPKs by dephosphorylation. The molecular basis of binding specificity of such regulatory phosphatases to MAPKs is poorly understood. To try to pinpoint potential functional regions within the sequences and to help identify new family members, we have applied a multimotif pattern-recognition approach to characterize two MAPK phosphatase subfamilies (tyrosine-specific and dual specificity) that are crucial in the regulation of MAPKs. We built "fingerprints" for these two subfamilies that are unique to, and highly discriminatory for, each group of proteins. The fingerprints were used in a genome-wide screen, identifying more than 80 MAPK phosphatase domains, several of which were in partial sequences or unclassified proteins. We confirmed experimentally that one predicted MAPK phosphatase orthologue in Xenopus binds to ERK1/2, suggesting a role in MAPK signaling and thus supporting our functional predictions. Further analysis, mapping the fingerprints on the three-dimensional structure of MAPK phosphatases, revealed that some of the fingerprint motifs reside in the N-terminal noncatalytic regions coinciding with reported MAPK binding sites, while others lie within the catalytic phosphatase domain.
These results also suggest the presence of putative allosteric sites in the catalytic region for modulation of protein-protein interactions, and provide a framework for future experimental validation. MESH Headings: Amino Acid Motifs; Amino Acid Sequence; Animals; Binding Sites; Catalytic Domain; Genome; Humans; Mitogen-Activated Protein Kinases/chemistry; Mitogen-Activated Protein Kinases/metabolism; Molecular Sequence Data; Peptide Mapping/methods; Phosphoprotein Phosphatases/chemistry; Phosphoprotein Phosphatases/metabolism; Phosphorylation; Protein Structure, Tertiary; Recombinant Fusion Proteins; Sequence Alignment/methods; Xenopus laevis. Grants: 065433 Wellcome Trust; 069899 Wellcome Trust
47	Motif3D: Relating protein sequence motifs to 3D structure. Nucleic Acids Res 2003;31:3333-6. [PMID: 12824320 PMCID: PMC168941 DOI: 10.1093/nar/gkg534] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Abstract: Motif3D is a web-based protein structure viewer designed to allow sequence motifs, and in particular those contained in the fingerprints of the PRINTS database, to be visualised on three-dimensional (3D) structures. Additional functionality is provided for the rhodopsin-like G protein-coupled receptors, enabling fingerprint motifs of any of the receptors in this family to be mapped onto the single structure available, that of bovine rhodopsin. Motif3D can be used via the web interface available at: http://www.bioinf.man.ac.uk/dbbrowser/motif3d/motif3d.html. MESH Headings: Amino Acid Motifs; Animals; Computer Graphics; Databases, Protein; Internet; Models, Molecular; Protein Conformation; Proteins/chemistry; Rhodopsin/chemistry; Software
48	Bioinformatics approaches for the classification of G-protein-coupled receptors. Curr Opin Pharmacol 2003;3:114-20. [PMID: 12681231 DOI: 10.1016/s1471-4892(03)00005-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/27/2022] Abstract: G-protein-coupled receptors are found abundantly in the human genome, and are the targets of numerous prescribed drugs. However, many receptors remain orphaned (i.e. with unknown ligand specificity), and others remain poorly characterised, with little structural information available. Consequently, there is often a gulf between sequence data and structural and functional knowledge of a receptor. Bioinformatics approaches may offer one approach to bridging this gap. In particular, protein family databases, which distil information from multiple sequence alignments into characteristic signatures, could be used to identify the families to which orphan receptors belong, and might facilitate discovery of novel motifs associated with ligand binding and G-protein-coupling. MESH Headings: Animals; Computational Biology/methods; Computational Biology/statistics & numerical data; Databases, Genetic/statistics & numerical data; GTP-Binding Proteins/classification; GTP-Binding Proteins/genetics; Humans; Receptors, Cell Surface/classification; Receptors, Cell Surface/genetics
49	PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res 2003;31:400-2. [PMID: 12520033 PMCID: PMC165477 DOI: 10.1093/nar/gkg030] [Citation(s) in RCA: 299] [Impact Index Per Article: 14.2] [Indexed: 11/15/2022] Abstract: The PRINTS database houses a collection of protein fingerprints. These may be used to assign uncharacterised sequences to known families and hence to infer tentative functions. The September 2002 release (version 36.0) includes 1800 fingerprints, encoding approximately 11 000 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. In addition to its continued steady growth, we report here the development of an automatic supplement, prePRINTS, designed to increase the coverage of the resource and reduce some of the manual burdens inherent in its maintenance. The databases are accessible for interrogation and searching at http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/. MESH Headings: Amino Acid Motifs; Animals; Automation; Conserved Sequence; Databases, Protein; Proteins/chemistry; Software
50	Deriving structural and functional insights from a ligand-based hierarchical classification of G protein-coupled receptors. Protein Eng Des Sel 2002;15:7-12. [PMID: 11842232 DOI: 10.1093/protein/15.1.7] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Abstract: G protein-coupled receptors (GPCRs) constitute the largest known family of cell-surface receptors. With hundreds of members populating the rhodopsin-like GPCR superfamily and many more awaiting discovery in the human genome, they are of interest to the pharmaceutical industry because of the opportunities they afford for yielding potentially lucrative drug targets. Typical sequence analysis strategies for identifying novel GPCRs tend to involve similarity searches using standard primary database search tools. This will reveal the most similar sequence, generally without offering any insight into its family or superfamily relationships. Conversely, searches of most 'pattern' or family databases are likely to identify the superfamily, but not the closest matching subtype. Here we describe a diagnostic resource that allows identification of GPCRs in a hierarchical fashion, based principally upon their ligand preference. This resource forms part of the PRINTS database, which now houses approximately 250 GPCR-specific fingerprints (http://www.bioinf.man.ac.uk/dbbrowser/gpcrPRINTS/). This collection of fingerprints is able to provide more sensitive diagnostic opportunities than have been realized by related approaches and is currently the only diagnostic tool for assigning GPCR subtypes. Mapping such fingerprints on to three-dimensional GPCR models offers powerful insights into the structural and functional determinants of subtype specificity.
MESH Headings: Algorithms; Amino Acid Motifs; Amino Acid Sequence; Animals; Databases, Protein; Heterotrimeric GTP-Binding Proteins/chemistry; Heterotrimeric GTP-Binding Proteins/classification; Heterotrimeric GTP-Binding Proteins/metabolism; Humans; Ligands; Protein Conformation; Protein Structure, Tertiary; Rats; Receptor, Melanocortin, Type 4; Receptors, Cell Surface/metabolism; Receptors, Peptide/chemistry; Receptors, Peptide/metabolism; Rhodopsin/chemistry; Rhodopsin/metabolism; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis, Protein/methods; Sheep