1
van den Bergh T, Tamo G, Nobili A, Tao Y, Tan T, Bornscheuer UT, Kuipers RKP, Vroling B, de Jong RM, Subramanian K, Schaap PJ, Desmet T, Nidetzky B, Vriend G, Joosten HJ. CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 2017; 12:e0176427. [PMID: 28545124 PMCID: PMC5436653 DOI: 10.1371/journal.pone.0176427] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/15/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022] Open
Abstract
CorNet is a web-based tool for the analysis of co-evolving residue positions in protein super-family sequence alignments. CorNet projects external information, such as mutation data extracted from the literature, onto interactively displayed groups of co-evolving residue positions to shed light on the functions associated with these groups and the residues in them. We used CorNet to analyse six enzyme super-families and found that groups of strongly co-evolving residues tend to consist of residues involved in the same function, such as activity, specificity, co-factor binding, or enantioselectivity. This finding makes it possible to assign a function to residues for which no data are available yet in the literature. A mutant library was designed to mutate residues observed in a group of co-evolving residues predicted to be involved in enantioselectivity, but for which no literature data are available yet. The resulting set of mutations indeed showed many instances of increased enantioselectivity.
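To make the notion of co-evolving alignment positions concrete, here is a minimal sketch that scores pairs of alignment columns by mutual information. This is a swapped-in, textbook measure chosen for illustration only; the abstract does not say which correlation score CorNet uses, and the toy alignment and the function name `mutual_information` are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Illustrative stand-in for a co-evolution score, not CorNet's method.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in count_ab.items()
    )


# Hypothetical toy alignment: one sequence per row, one position per column.
alignment = ["ADK", "ADR", "GEK", "GER"]
columns = list(zip(*alignment))

mi_01 = mutual_information(columns[0], columns[1])  # positions that always change together
mi_02 = mutual_information(columns[0], columns[2])  # positions that vary independently
print(round(mi_01, 3), round(mi_02, 3))
```

In this toy case the first two columns covary perfectly and score 1 bit, while the first and third columns are independent and score 0. Real analyses need many sequences and corrections for phylogeny and sampling, which this sketch ignores.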
Affiliation(s)
- Tom van den Bergh
  - Bio-Prodict, Nijmegen, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
- Alberto Nobili
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
- Yifeng Tao
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
  - Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
- Tianwei Tan
  - Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
- Uwe T. Bornscheuer
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
- Peter J. Schaap
  - Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
- Tom Desmet
  - Centre for Industrial Biotechnology and Biocatalysis, Ghent University, Ghent, Belgium
- Bernd Nidetzky
  - Institute of Biotechnology and Biochemical Engineering, Graz University of Technology, Graz, Austria
- Henk-Jan Joosten
  - Bio-Prodict, Nijmegen, The Netherlands
  - CMBI, Radboudumc, Nijmegen, The Netherlands
2
Isberg V, de Graaf C, Bortolato A, Cherezov V, Katritch V, Marshall FH, Mordalski S, Pin JP, Stevens RC, Vriend G, Gloriam DE. Generic GPCR residue numbers - aligning topology maps while minding the gaps. Trends Pharmacol Sci 2014; 36:22-31. [PMID: 25541108 DOI: 10.1016/j.tips.2014.11.001] [Citation(s) in RCA: 326] [Impact Index Per Article: 32.6] [Received: 09/26/2014] [Revised: 11/05/2014] [Accepted: 11/07/2014] [Indexed: 12/31/2022]
Abstract
Generic residue numbers facilitate comparisons of, for example, mutational effects, ligand interactions, and structural motifs. The numbering scheme by Ballesteros and Weinstein for residues within the class A GPCRs (G protein-coupled receptors) has more than 1100 citations, and the recent crystal structures for classes B, C, and F now call for a community consensus in residue numbering within and across these classes. Furthermore, the structural era has uncovered helix bulges and constrictions that offset the generic residue numbers. The use of generic residue numbers depends on convenient access by pharmacologists, chemists, and structural biologists. We review the generic residue numbering schemes for each GPCR class, as well as a complementary structure-based scheme, and provide illustrative examples and GPCR database (GPCRDB) web tools to number any receptor sequence or structure.
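As a rough illustration of the Ballesteros-Weinstein idea mentioned above: each residue is labelled `helix.index`, where the most conserved residue in that helix is given index 50 and the other residues are counted relative to it. The sketch below assumes a gap-free helix, so it deliberately ignores the bulges and constrictions that the abstract says offset the numbers; the function name and the example positions are hypothetical.

```python
def generic_number(helix: int, reference_position: int, position: int) -> str:
    """Return a Ballesteros-Weinstein style label 'helix.index'.

    reference_position is the sequence position of the helix's most
    conserved residue, which receives index 50 by convention.
    Assumes no bulges or constrictions between the two positions.
    """
    index = 50 + (position - reference_position)
    return f"{helix}.{index}"


# Hypothetical example: helix 3 whose reference residue sits at sequence position 131.
example_labels = [generic_number(3, 131, p) for p in (113, 131, 134)]
print(example_labels)
```

A structure-aware scheme would need an extra lookup for helix irregularities before the offset is applied, which is why simple arithmetic like this is only a starting point.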
Affiliation(s)
- Vignir Isberg
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Chris de Graaf
  - Division of Medicinal Chemistry, Faculty of Sciences, Amsterdam Institute for Molecules, Medicines and Systems, VU University Amsterdam, The Netherlands
- Vadim Cherezov
  - The Bridge@USC, Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA
- Vsevolod Katritch
  - The Bridge@USC, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Stefan Mordalski
  - Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland; Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Krakow, Poland
- Jean-Philippe Pin
  - Institute of Functional Genomics, Centre National de la Recherche Scientifique (CNRS) Unité Mixte de Recherche 5203, Universities Montpellier, Montpellier, France; Institut National de la Santé et de la Recherche Médicale (INSERM) Unité 661, Montpellier, France
- Raymond C Stevens
  - The Bridge@USC, Department of Chemistry, University of Southern California, Los Angeles, CA 90089 USA; The Bridge@USC, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089 USA
- Gerrit Vriend
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, Nijmegen, The Netherlands
- David E Gloriam
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
3
van der Kant R, Vriend G. Alpha-bulges in G protein-coupled receptors. Int J Mol Sci 2014; 15:7841-64. [PMID: 24806342 PMCID: PMC4057707 DOI: 10.3390/ijms15057841] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/20/2014] [Revised: 04/02/2014] [Accepted: 04/09/2014] [Indexed: 12/31/2022] Open
Abstract
Agonist binding is related to a series of motions in G protein-coupled receptors (GPCRs) that result in the separation of transmembrane helices III and VI at their cytosolic ends and subsequent G protein binding. A large number of smaller motions also seem to be associated with activation. Most helices in GPCRs are highly irregular and often contain kinks, with extensive literature already available about the role of prolines in kink formation and the precise function of these kinks. GPCR transmembrane helices also contain many α-bulges. In this article we aim to draw attention to the role of these α-bulges in ligand and G-protein binding, as well as their role in several aspects of the mobility associated with GPCR activation. This mobility includes regularization and translation of helix III in the extracellular direction, a rotation of the entire helix VI, an inward movement of the helices near the extracellular side, and a concerted motion of the cytosolic ends of the helices that makes their orientation appear more circular and that opens up space for the G protein to bind. In several cases, α-bulges either appear or disappear as part of the activation process.
Affiliation(s)
- Rob van der Kant
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands.
- Gert Vriend
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands.
4
Isberg V, Vroling B, van der Kant R, Li K, Vriend G, Gloriam D. GPCRDB: an information system for G protein-coupled receptors. Nucleic Acids Res 2013; 42:D422-5. [PMID: 24304901 PMCID: PMC3965068 DOI: 10.1093/nar/gkt1255] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 11/25/2022] Open
Abstract
For the past 20 years, the GPCRDB (G protein-coupled receptors database; http://www.gpcr.org/7tm/) has been a ‘one-stop shop’ for G protein-coupled receptor (GPCR)-related data. The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data, such as multiple sequence alignments and homology models. The GPCRDB also provides visualization and analysis tools, plus a number of query systems. In the latest GPCRDB release, all multiple sequence alignments, and >65 000 homology models, have been significantly improved, thanks to a recent flurry of GPCR X-ray structure data. Tools were introduced to browse X-ray structures, compare binding sites, profile similar receptors and generate amino acid conservation statistics. Snake plots and helix box diagrams can now be custom coloured (e.g. by chemical properties or mutation data) and saved as figures. A series of sequence alignment visualization tools has been added, and sequence alignments can now be created for subsets of sequences and sequence positions, and alignment statistics can be produced for any of these subsets.
Affiliation(s)
- Vignir Isberg
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, DK-2100 Copenhagen, Denmark, Bio-Prodict B.V., Castellastraat 116, 6512 EZ, Nijmegen, The Netherlands and CMBI, NCMLS, Radboudumc Nijmegen Medical Centre, Geert Grooteplein Zuid 26-28, 6525 GA, Nijmegen, The Netherlands
5
Eriksson M, Nilsson I, Kogej T, Southan C, Johansson M, Tyrchan C, Muresan S, Blomberg N, Bjäreland M. SARConnect: A Tool to Interrogate the Connectivity Between Proteins, Chemical Structures and Activity Data. Mol Inform 2012; 31:555-568. [PMID: 23308082 PMCID: PMC3535785 DOI: 10.1002/minf.201200030] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/29/2012] [Accepted: 04/14/2012] [Indexed: 11/21/2022]
Abstract
The access and use of large-scale structure-activity relationships (SAR) is increasing as the range of targets and the availability of bioactive compound-to-protein mappings expand. However, effective exploitation requires merging and normalisation of activity data, mappings to target classifications, and visual display of chemical structure relationships. This work describes the development of the application "SARConnect" to address these issues. We discuss options for delivery and analysis of large-scale SAR data together with a set of use-cases to illustrate the design choices and utility. The main activity sources, ChEMBL, GOSTAR and AstraZeneca's internal system IBIS, had already been integrated in Chemistry Connect. For target relationships we selected human UniProtKB/Swiss-Prot as our primary source of a heuristic target classification. Similarly, to explore chemical relationships we combined several methods for framework and scaffold analysis into a unified, hierarchical classification where ease of navigation was the primary goal. An application was built on TIBCO Spotfire to retrieve data for visual display. Consequently, users can explore relationships between target, activity and structure across internal, external and commercial sources that encompass approximately 3 million compounds, 2000 human proteins and 10 million activity values. Examples showing the utility of the application are given.
Affiliation(s)
- Mats Eriksson
  - Discovery Sciences, Computational Sciences, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden
- Thierry Kogej
  - Discovery Sciences, Computational Sciences, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden
- Sorel Muresan
  - Discovery Sciences, Computational Sciences, AstraZeneca R&D Mölndal, S-431 83 Mölndal, Sweden
6
Layout-aware text extraction from full-text PDF of scientific articles. Source Code for Biology and Medicine 2012; 7:7. [PMID: 22640904 PMCID: PMC3441580 DOI: 10.1186/1751-0473-7-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 04/24/2012] [Accepted: 05/28/2012] [Indexed: 11/17/2022]
Abstract
Background: The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results: Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96%, Recall = 0.89% and F1 = 0.91%. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. Conclusions: LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.
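The three-stage process described in this abstract (detect blocks, classify them by rule, stitch them in order) can be sketched as a toy pipeline. This is only an illustration under stated assumptions and not LA-PDFText's code: the `Block` type, the vertical-gap threshold, the set of section names, and the input format of `(page, top, text)` tuples are all hypothetical.

```python
from dataclasses import dataclass

SECTION_NAMES = {"abstract", "methods", "results", "conclusions"}  # hypothetical rule set


@dataclass
class Block:
    text: str
    page: int
    top: float  # distance from the top of the page, in arbitrary units
    label: str = "unknown"


def detect_blocks(lines, max_gap=15.0):
    """Stage 1: merge lines on the same page whose vertical gap is small."""
    blocks, prev = [], None
    for page, top, text in sorted(lines):
        if prev and prev[0] == page and top - prev[1] <= max_gap:
            blocks[-1].text += " " + text
        else:
            blocks.append(Block(text=text, page=page, top=top))
        prev = (page, top)
    return blocks


def classify_blocks(blocks):
    """Stage 2: rule-based labelling; a block is a heading if it is a known section name."""
    for block in blocks:
        is_heading = block.text.strip().lower() in SECTION_NAMES
        block.label = "heading" if is_heading else "body"
    return blocks


def stitch(blocks):
    """Stage 3: join body blocks in reading order under the latest heading."""
    sections, current = {}, "front matter"
    for block in blocks:
        if block.label == "heading":
            current = block.text.strip().lower()
        else:
            sections.setdefault(current, []).append(block.text)
    return {name: " ".join(parts) for name, parts in sections.items()}


# Hypothetical input, deliberately out of reading order.
lines = [
    (1, 130.0, "Results"),
    (1, 160.0, "Blocks were classified"),
    (1, 172.0, "by simple rules."),
    (1, 50.0, "Methods"),
    (1, 80.0, "We parse each page"),
    (1, 92.0, "into text blocks."),
]
sections = stitch(classify_blocks(detect_blocks(lines)))
print(sections)
```

Sorting by page and vertical position stands in for real spatial layout processing, which would also have to handle columns, fonts, and figures.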
7
Seddon G, Lounnas V, McGuire R, van den Bergh T, Bywater RP, Oliveira L, Vriend G. Drug design for ever, from hype to hope. J Comput Aided Mol Des 2012; 26:137-50. [PMID: 22252446 PMCID: PMC3268973 DOI: 10.1007/s10822-011-9519-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/22/2011] [Accepted: 12/05/2011] [Indexed: 01/28/2023]
Abstract
In its first 25 years JCAMD has been disseminating a large number of techniques aimed at finding better medicines faster. These include genetic algorithms, COMFA, QSAR, structure based techniques, homology modelling, high throughput screening, combichem, and dozens more that were hyped in their time and that now are just a useful addition to the drug designer's toolbox. Despite massive efforts throughout academic and industrial drug design research departments, the number of FDA-approved new molecular entities per year stagnates, and the pharmaceutical industry is reorganising accordingly. The recent spate of industrial consolidations and the concomitant move towards outsourcing of research activities require better integration of all activities along the chain from bench to bedside. The next 25 years will undoubtedly show a series of translational science activities that are aimed at better communication between all parties involved, from quantum chemistry to bedside and from academia to industry. This will above all include understanding the underlying biological problem and optimal use of all available data.
Affiliation(s)
- G Seddon
  - Adelard Institute, Manchester, UK