151
Abstract
Macromolecular structures are used in a wide variety of applications, from teaching protein structure principles to ligand optimization in drug development. Applying data mining techniques to these experimentally determined structures requires a highly uniform, standardized structural data source. The Protein Data Bank (PDB) has evolved over the years toward becoming the standard resource for macromolecular structures. However, the process of selecting the data most suitable for specific applications is still largely based on personal preferences and understanding of the experimental techniques used to obtain these models. In this chapter, we first explain the challenges with data standardization, annotation, and uniformity in the PDB entries determined by X-ray crystallography. We then discuss the specific effect that crystallographic data quality and model optimization methods have on structural models and how validation tools can be used to make informed choices. We also discuss specific advantages of using the PDB_REDO databank as a resource for structural data. Finally, we provide guidelines on how to select the most suitable protein structure models for detailed analysis and how to select a set of structure models suitable for data mining.
Affiliation(s)
- Bart van Beusekom
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Anastassis Perrakis
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Robbie P Joosten
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
152
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 2015; 44:D279-85. [PMID: 26673716 PMCID: PMC4702930 DOI: 10.1093/nar/gkv1344] [Citation(s) in RCA: 3712] [Impact Index Per Article: 412.4] [Received: 11/04/2015] [Accepted: 11/17/2015] [Indexed: 11/24/2022]
Abstract
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.
Affiliation(s)
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Penelope Coggill
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ruth Y Eberhardt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sean R Eddy
- Department of Molecular & Cellular Biology, Harvard University, Biological Laboratories 1008, 16 Divinity Avenue, Cambridge, MA 02138, USA; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alex L Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Simon C Potter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l'Ecole de Médecine, 75006 Paris, France
- Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
153
Peters C, Tsirigos KD, Shu N, Elofsson A. Improved topology prediction using the terminal hydrophobic helices rule. Bioinformatics 2015; 32:1158-62. [PMID: 26644416 DOI: 10.1093/bioinformatics/btv709] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 07/16/2015] [Accepted: 11/29/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION The translocon recognizes sufficiently hydrophobic regions of a protein and inserts them into the membrane. Computational methods try to determine which hydrophobic regions are recognized by the translocon. Although these predictions are quite accurate, many methods still fail to distinguish marginally hydrophobic transmembrane (TM) helices from equally hydrophobic regions in soluble protein domains. In vivo, this problem is most likely avoided by targeting of the TM proteins, so that non-TM proteins never see the translocon. Proteins are targeted to the translocon by an N-terminal signal peptide. The targeting is also aided by the fact that the N-terminal helix is more hydrophobic than other TM helices. In addition, we recently found that the C-terminal helix is more hydrophobic than central helices. This information has not been used in earlier topology predictors. RESULTS Here, we use the fact that the N- and C-terminal helices are more hydrophobic to develop a new version of the first-principle-based topology predictor, SCAMPI. The new predictor has two main advantages: first, it can be used to efficiently separate membrane and non-membrane proteins directly without the use of an extra prefilter, and second, it shows improved performance for predicting the topology of membrane proteins that contain large non-membrane domains. AVAILABILITY AND IMPLEMENTATION The predictor, a web server and all datasets are available at http://scampi.bioinfo.se/. CONTACT arne@bioinfo.se. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christoph Peters
- Department of Biochemistry and Biophysics, Science for Life Laboratory and
- Nanjiang Shu
- Department of Biochemistry and Biophysics, Science for Life Laboratory and Sweden Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm University, Solna 17121, Sweden
- Arne Elofsson
- Department of Biochemistry and Biophysics, Science for Life Laboratory and
154
Xu Q, Malecka KL, Fink L, Jordan EJ, Duffy E, Kolander S, Peterson JR, Dunbrack RL. Identifying three-dimensional structures of autophosphorylation complexes in crystals of protein kinases. Sci Signal 2015; 8:rs13. [PMID: 26628682 DOI: 10.1126/scisignal.aaa6711] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
Abstract
Protein kinase autophosphorylation is a common regulatory mechanism in cell signaling pathways. Crystal structures of several homomeric protein kinase complexes have a serine, threonine, or tyrosine autophosphorylation site of one kinase monomer located in the active site of another monomer, a structural complex that we call an "autophosphorylation complex." We developed and applied a structural bioinformatics method to identify all such autophosphorylation complexes in x-ray crystallographic structures in the Protein Data Bank (PDB). We identified 15 autophosphorylation complexes in the PDB, of which five complexes had not previously been described in the publications describing the crystal structures. These five complexes consist of tyrosine residues in the N-terminal juxtamembrane regions of colony-stimulating factor 1 receptor (CSF1R, Tyr561) and ephrin receptor A2 (EPHA2, Tyr594), tyrosine residues in the activation loops of the SRC kinase family member LCK (Tyr394) and insulin-like growth factor 1 receptor (IGF1R, Tyr1166), and a serine in a nuclear localization signal region of CDC-like kinase 2 (CLK2, Ser142). Mutations in the complex interface may alter autophosphorylation activity and contribute to disease; therefore, we mutated residues in the autophosphorylation complex interface of LCK and found that two mutations impaired autophosphorylation (T445V and N446A) and mutation of Pro447 to Ala, Gly, or Leu increased autophosphorylation. The identified autophosphorylation sites are conserved in many kinases, suggesting that, by homology, these complexes may provide insight into autophosphorylation complex interfaces of kinases that are relevant drug targets.
Affiliation(s)
- Qifang Xu
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kimberly L Malecka
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Lauren Fink
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- E Joseph Jordan
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, USA
- Erin Duffy
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Samuel Kolander
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Jeffrey R Peterson
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
155
Komiyama Y, Banno M, Ueki K, Saad G, Shimizu K. Automatic generation of bioinformatics tools for predicting protein-ligand binding sites. Bioinformatics 2015; 32:901-7. [PMID: 26545824 PMCID: PMC4803387 DOI: 10.1093/bioinformatics/btv593] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/27/2015] [Accepted: 10/12/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Predictive tools that model protein-ligand binding on demand are needed to promote ligand research in an innovative drug-design environment. However, it takes considerable time and effort to develop predictive tools that can be applied to individual ligands. An automated production pipeline that can rapidly and efficiently develop user-friendly protein-ligand binding predictive tools would be useful. RESULTS We developed a system for automatically generating protein-ligand binding predictions. Implementation of this system in a pipeline of Semantic Web technique-based web tools will allow users to specify a ligand and receive the tool within 0.5-1 day. We demonstrated high prediction accuracy for three machine learning algorithms and eight ligands. AVAILABILITY AND IMPLEMENTATION The source code and web application are freely available for download at http://utprot.net. They are implemented in Python and supported on Linux. CONTACT shimizu@bi.a.u-tokyo.ac.jp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yusuke Komiyama
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo 108-8639, Japan
- Masaki Banno
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Kokoro Ueki
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Gul Saad
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Kentaro Shimizu
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
156
Flissi A, Dufresne Y, Michalik J, Tonon L, Janot S, Noé L, Jacques P, Leclère V, Pupin M. Norine, the knowledgebase dedicated to non-ribosomal peptides, is now open to crowdsourcing. Nucleic Acids Res 2015; 44:D1113-8. [PMID: 26527733 PMCID: PMC4702827 DOI: 10.1093/nar/gkv1143] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 09/15/2015] [Accepted: 10/16/2015] [Indexed: 11/29/2022]
Abstract
Since its creation in 2006, Norine has remained the only knowledgebase dedicated to non-ribosomal peptides (NRPs). These secondary metabolites, produced by bacteria and fungi, harbor diverse biological activities (such as antibiotic, antitumor, siderophore or surfactant) directly related to the diversity of their structures. The Norine team's goal is to collect the NRPs and provide tools to analyze them efficiently. We have developed a user-friendly interface and dedicated tools to provide a complete bioinformatics platform. The knowledgebase gathers abundant and valuable annotations on more than 1100 NRPs. To increase the quantity of described NRPs and improve the quality of associated annotations, we are now opening Norine to crowdsourcing. We believe that contributors from the scientific community are the best experts to annotate the NRPs they work on. We have developed MyNorine to facilitate the submission of new NRPs or modifications of stored ones. This article presents MyNorine and other novelties of the Norine interface released since the first publication. Norine is freely accessible from the following URL: http://bioinfo.lifl.fr/NRP.
Affiliation(s)
- Areski Flissi
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France
- Yoann Dufresne
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France
- Juraj Michalik
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; University of Lille, bilille, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France
- Laurie Tonon
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France
- Stéphane Janot
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France
- Laurent Noé
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France
- Philippe Jacques
- University of Lille, EA 7394, ICV-Institut Charles Viollette, ProBioGEM team, Polytech'Lille, avenue Langevin, 59655 Villeneuve d'Ascq, France
- Valérie Leclère
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France; University of Lille, EA 7394, ICV-Institut Charles Viollette, ProBioGEM team, Polytech'Lille, avenue Langevin, 59655 Villeneuve d'Ascq, France
- Maude Pupin
- University of Lille, CRIStAL, UMR CNRS 9189, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France; Inria Lille Nord Europe, Bonsai team, Parc scientifique de la Haute Borne, 40, avenue Halley-Bt A, 59650 Villeneuve d'Ascq, France; University of Lille, bilille, cité scientifique-bat M3ext, 59650 Villeneuve d'Ascq, France
157
Lai YC, Kondapalli C, Lehneck R, Procter JB, Dill BD, Woodroof HI, Gourlay R, Peggie M, Macartney TJ, Corti O, Corvol JC, Campbell DG, Itzen A, Trost M, Muqit MM. Phosphoproteomic screening identifies Rab GTPases as novel downstream targets of PINK1. EMBO J 2015; 34:2840-61. [PMID: 26471730 PMCID: PMC4654935 DOI: 10.15252/embj.201591593] [Citation(s) in RCA: 141] [Impact Index Per Article: 15.7] [Received: 03/20/2015] [Accepted: 09/18/2015] [Indexed: 12/21/2022]
Abstract
Mutations in the PTEN‐induced kinase 1 (PINK1) are causative of autosomal recessive Parkinson's disease (PD). We have previously reported that PINK1 is activated by mitochondrial depolarisation and phosphorylates serine 65 (Ser65) of the ubiquitin ligase Parkin and ubiquitin to stimulate Parkin E3 ligase activity. Here, we have employed quantitative phosphoproteomics to search for novel PINK1‐dependent phosphorylation targets in HEK (human embryonic kidney) 293 cells stimulated by mitochondrial depolarisation. This led to the identification of 14,213 phosphosites from 4,499 gene products. Whilst most phosphosites were unaffected, we strikingly observed three members of a sub‐family of Rab GTPases namely Rab8A, 8B and 13 that are all phosphorylated at the highly conserved residue of serine 111 (Ser111) in response to PINK1 activation. Using phospho‐specific antibodies raised against Ser111 of each of the Rabs, we demonstrate that Rab Ser111 phosphorylation occurs specifically in response to PINK1 activation and is abolished in HeLa PINK1 knockout cells and mutant PINK1 PD patient‐derived fibroblasts stimulated by mitochondrial depolarisation. We provide evidence that Rab8A GTPase Ser111 phosphorylation is not directly regulated by PINK1 in vitro and demonstrate in cells the time course of Ser111 phosphorylation of Rab8A, 8B and 13 is markedly delayed compared to phosphorylation of Parkin at Ser65. We further show mechanistically that phosphorylation at Ser111 significantly impairs Rab8A activation by its cognate guanine nucleotide exchange factor (GEF), Rabin8 (by using the Ser111Glu phosphorylation mimic). These findings provide the first evidence that PINK1 is able to regulate the phosphorylation of Rab GTPases and indicate that monitoring phosphorylation of Rab8A/8B/13 at Ser111 may represent novel biomarkers of PINK1 activity in vivo. 
Our findings also suggest that disruption of Rab GTPase‐mediated signalling may represent a major mechanism in the neurodegenerative cascade of Parkinson's disease.
Affiliation(s)
- Yu-Chiang Lai
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Chandana Kondapalli
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Ronny Lehneck
- Centre for Integrated Protein Science Munich, Department Chemistry, Technische Universität München, Garching, Germany
- James B Procter
- Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, UK
- Brian D Dill
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Helen I Woodroof
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Robert Gourlay
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Mark Peggie
- Division of Signal Transduction Therapy, College of Life Sciences, University of Dundee, Dundee, UK
- Thomas J Macartney
- Division of Signal Transduction Therapy, College of Life Sciences, University of Dundee, Dundee, UK
- Olga Corti
- Inserm U 1127, Paris, France; CNRS UMR 7225, Paris, France; Sorbonne Universités UPMC Paris 06 UMR S 1127, Paris, France; Institut du Cerveau et de la Moelle épinière ICM, Paris, France
- Jean-Christophe Corvol
- Inserm U 1127, Paris, France; CNRS UMR 7225, Paris, France; Sorbonne Universités UPMC Paris 06 UMR S 1127, Paris, France; Institut du Cerveau et de la Moelle épinière ICM, Paris, France; Inserm Centre d'Investigation Clinique (CIC), Paris, France; AP-HP, Département des maladies du système nerveux, Hôpital de la Pitié-Salpêtrière, Paris, France
- David G Campbell
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Aymelt Itzen
- Centre for Integrated Protein Science Munich, Department Chemistry, Technische Universität München, Garching, Germany
- Matthias Trost
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK
- Miratul Mk Muqit
- MRC Protein Phosphorylation and Ubiquitylation Unit, College of Life Sciences, University of Dundee, Dundee, UK; College of Medicine, Dentistry & Nursing, University of Dundee, Dundee, UK
158
Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc Natl Acad Sci U S A 2015; 112:E5486-95. [PMID: 26392535 DOI: 10.1073/pnas.1516373112] [Citation(s) in RCA: 155] [Impact Index Per Article: 17.2] [Indexed: 01/07/2023]
Abstract
Large-scale tumor sequencing projects have enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Besides these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations.
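The idea of testing for 3D clustering of mutations can be illustrated with a minimal sketch. This is not the method of the cited study: the function names, the 8 Å distance cutoff, the pair-count statistic, the permutation scheme, and the toy coordinates are all assumptions made for illustration. It counts pairs of mutated residues that lie close together in space and compares that count against randomly chosen residue sets of the same size.

```python
import itertools
import math
import random


def close_pairs(coords, residues, cutoff=8.0):
    """Count pairs of residues whose coordinates lie within `cutoff` angstroms."""
    return sum(
        1
        for a, b in itertools.combinations(residues, 2)
        if math.dist(coords[a], coords[b]) <= cutoff
    )


def clustering_p_value(coords, mutated, cutoff=8.0, n_perm=1000, seed=0):
    """Empirical p-value: share of random residue sets at least as clustered."""
    rng = random.Random(seed)
    observed = close_pairs(coords, mutated, cutoff)
    all_residues = list(coords)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_residues, len(mutated))
        if close_pairs(coords, sample, cutoff) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy "structure": 20 residues on a line, 4 angstroms apart.
toy_coords = {i: (4.0 * i, 0.0, 0.0) for i in range(20)}
# Hypothetical mutated positions, three neighbouring residues.
toy_mutated = [3, 4, 5]
```

With real data, `coords` would hold one representative atom position per residue taken from a structure file, and a low p-value would only suggest, not prove, that mutations concentrate in one region of the fold.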
159
Wu Z, Hu G, Yang J, Peng Z, Uversky VN, Kurgan L. In various protein complexes, disordered protomers have large per-residue surface areas and area of protein-, DNA- and RNA-binding interfaces. FEBS Lett 2015; 589:2561-9. [DOI: 10.1016/j.febslet.2015.08.014] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 06/28/2015] [Revised: 07/31/2015] [Accepted: 08/03/2015] [Indexed: 11/28/2022]
160
Flock T, Ravarani CNJ, Sun D, Venkatakrishnan AJ, Kayikci M, Tate CG, Veprintsev DB, Babu MM. Universal allosteric mechanism for Gα activation by GPCRs. Nature 2015; 524:173-179. [PMID: 26147082 PMCID: PMC4866443 DOI: 10.1038/nature14663] [Citation(s) in RCA: 252] [Impact Index Per Article: 28.0] [Received: 11/05/2014] [Accepted: 06/16/2015] [Indexed: 12/25/2022]
Abstract
G protein-coupled receptors (GPCRs) allosterically activate heterotrimeric G proteins and trigger GDP release. Given that there are ∼800 human GPCRs and 16 different Gα genes, this raises the question of whether a universal allosteric mechanism governs Gα activation. Here we show that different GPCRs interact with and activate Gα proteins through a highly conserved mechanism. Comparison of Gα with the small G protein Ras reveals how the evolution of short segments that undergo disorder-to-order transitions can decouple regions important for allosteric activation from receptor binding specificity. This might explain how the GPCR-Gα system diversified rapidly, while conserving the allosteric activation mechanism.
Affiliation(s)
- Tilman Flock
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Dawei Sun
- Laboratory of Biomolecular Research, Paul Scherrer Institut, Villigen, Switzerland
- Department of Biology, ETH Zurich, Zurich, Switzerland
- Melis Kayikci
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Christopher G. Tate
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Dmitry B. Veprintsev
- Laboratory of Biomolecular Research, Paul Scherrer Institut, Villigen, Switzerland
- Department of Biology, ETH Zurich, Zurich, Switzerland
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
161
Zhang Q, Guldbrandtsen B, Bosse M, Lund MS, Sahana G. Runs of homozygosity and distribution of functional variants in the cattle genome. BMC Genomics 2015. [PMID: 26198692 PMCID: PMC4508970 DOI: 10.1186/s12864-015-1715-x] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Indexed: 11/10/2022]
Abstract
Background Recent developments in sequencing technology have facilitated widespread investigations of genomic variants, including continuous stretches of homozygous genomic regions. For cattle, a large proportion of these runs of homozygosity (ROH) are likely the result of inbreeding due to the accumulation of elite alleles from long-term selective breeding programs. In the present study, ROH were characterized in four cattle breeds with whole genome sequence data and the distribution of predicted functional variants was detected in ROH regions and across different ROH length classes. Results On average, 19.5 % of the genome was located in ROH across four cattle breeds. There were an average of 715.5 ROH per genome with an average size of ~750 kbp, ranging from 10 (minimum size considered) to 49,290 kbp. There was a significant correlation between shared short ROH regions and regions putatively under selection (p < 0.001). By investigating the relationship between ROH and the predicted deleterious and non-deleterious variants, we gained insight into the distribution of functional variation in inbred (ROH) regions. Predicted deleterious variants were more enriched in ROH regions than predicted non-deleterious variants, which is consistent with observations in the human genome. We also found that increased enrichment of deleterious variants was significantly higher in short (<100 kbp) and medium (0.1 to 3 Mbp) ROH regions compared with long (>3 Mbp) ROH regions (P < 0.001), which is different than what has been observed in the human genome. Conclusions This study illustrates the distribution of ROH and functional variants within ROH in cattle populations. These patterns are different from those in the human genome but consistent with the natural history of cattle populations, which is confirmed by the significant correlation between shared short ROH regions and regions putatively under selection. 
These findings contribute to understanding the effects of inbreeding and probably selection in shaping the distribution of functional variants in the cattle genome.
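As a rough illustration of what a run of homozygosity (ROH) is, the following minimal sketch scans ordered markers for stretches of consecutive homozygous genotypes and assigns each stretch to the length classes quoted in the abstract (short < 100 kbp, medium 0.1 to 3 Mbp, long > 3 Mbp). It is not the procedure used in the study: the function names, the input encoding, the simple "no heterozygous marker allowed" rule, and the toy data are assumptions; only the 10 kbp minimum size and the class boundaries come from the text.

```python
def find_roh(positions, homozygous, min_kbp=10.0):
    """Return (start_bp, end_bp) for each run of consecutive homozygous
    markers that spans at least `min_kbp` kilobase pairs."""
    runs = []
    start = prev = None
    for pos, is_hom in zip(positions, homozygous):
        if is_hom:
            if start is None:
                start = pos  # a new run begins at this marker
            prev = pos
        else:
            # A heterozygous marker ends the current run, if any.
            if start is not None and (prev - start) / 1000 >= min_kbp:
                runs.append((start, prev))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and (prev - start) / 1000 >= min_kbp:
        runs.append((start, prev))
    return runs


def length_class(start_bp, end_bp):
    """Classify a run as short (<100 kbp), medium (0.1-3 Mbp) or long (>3 Mbp)."""
    kbp = (end_bp - start_bp) / 1000
    if kbp < 100:
        return "short"
    if kbp <= 3000:
        return "medium"
    return "long"


# Hypothetical toy data: marker positions in bp and homozygosity flags.
toy_positions = [0, 5000, 12000, 20000, 25000, 30000, 200000, 400000]
toy_homozygous = [True, True, True, False, True, True, True, True]
toy_runs = find_roh(toy_positions, toy_homozygous)
```

Production ROH callers usually tolerate a small number of heterozygous or missing calls within a run and require a minimum marker density; this sketch omits both for brevity.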
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, DK-8830, Denmark; Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, 6700 AH, The Netherlands
- Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, DK-8830, Denmark
- Mirte Bosse
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, 6700 AH, The Netherlands
- Mogens S Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, DK-8830, Denmark
- Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, DK-8830, Denmark
162
Viales RR, Eichstaedt CA, Ehlken N, Fischer C, Lichtblau M, Grünig E, Hinderhofer K. Mutation in BMPR2 Promoter: A 'Second Hit' for Manifestation of Pulmonary Arterial Hypertension? PLoS One 2015; 10:e0133042. [PMID: 26167679 PMCID: PMC4500409 DOI: 10.1371/journal.pone.0133042] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/12/2015] [Accepted: 06/22/2015] [Indexed: 01/14/2023]
Abstract
BACKGROUND Hereditary pulmonary arterial hypertension (HPAH) can be caused by autosomal dominant inherited mutations of TGF-β genes, such as the bone morphogenetic protein receptor 2 (BMPR2) and Endoglin (ENG) gene. Additional modifier genes may play a role in disease manifestation and severity. In this study we prospectively assessed two families with known BMPR2 or ENG mutations clinically and genetically and screened for a second mutation in the BMPR2 promoter region. METHODS We investigated the BMPR2 promoter region by direct sequencing in two index-patients with invasively confirmed diagnosis of HPAH, carrying a mutation in the BMPR2 and ENG gene, respectively. Sixteen family members have been assessed clinically by non-invasive methods and genetically by direct sequencing. RESULTS In both index patients with a primary BMPR2 deletion (exon 2 and 3) and Endoglin missense variant (c.1633G>A, p.(G545S)), respectively, we detected a second mutation (c.-669G>A) in the promoter region of the BMPR2 gene. The index patients with 2 mutations/variants were clinically severely affected at early age, whereas further family members with only one mutation had no manifest HPAH. CONCLUSION The finding of this study supports the hypothesis that additional mutations may lead to an early and severe manifestation of HPAH. This study shows for the first time that in the regulatory region of the BMPR2 gene the promoter may be important for disease penetrance. Further studies are needed to assess the incidence and clinical relevance of mutations of the BMPR2 promoter region in a larger patient cohort.
Affiliation(s)
- Rebecca Rodríguez Viales
- University Hospital Heidelberg, Centre for pulmonary hypertension of the Thoraxclinic Heidelberg, Heidelberg, Germany; Heidelberg University, Institute of Human Genetics, Heidelberg, Germany
| | - Christina A Eichstaedt
- University Hospital Heidelberg, Centre for pulmonary hypertension of the Thoraxclinic Heidelberg, Heidelberg, Germany
| | - Nicola Ehlken
- University Hospital Heidelberg, Centre for pulmonary hypertension of the Thoraxclinic Heidelberg, Heidelberg, Germany
| | - Christine Fischer
- Heidelberg University, Institute of Human Genetics, Heidelberg, Germany
| | - Mona Lichtblau
- University Hospital Heidelberg, Centre for pulmonary hypertension of the Thoraxclinic Heidelberg, Heidelberg, Germany
| | - Ekkehard Grünig
- University Hospital Heidelberg, Centre for pulmonary hypertension of the Thoraxclinic Heidelberg, Heidelberg, Germany
|
163
|
Zubek J, Tatjewski M, Boniecki A, Mnich M, Basu S, Plewczynski D. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae. PeerJ 2015; 3:e1041. [PMID: 26157620 PMCID: PMC4493684 DOI: 10.7717/peerj.1041] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/27/2015] [Accepted: 05/31/2015] [Indexed: 11/20/2022] Open
Abstract
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Affiliation(s)
- Julian Zubek
- Centre of New Technologies, University of Warsaw, Warsaw, Poland; Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
| | - Marcin Tatjewski
- Centre of New Technologies, University of Warsaw, Warsaw, Poland; Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
| | - Adam Boniecki
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Maciej Mnich
- Faculty of Mathematics and Computer Science, Jagiellonian University, Cracow, Poland
| | - Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
|
164
|
Tsirigos KD, Peters C, Shu N, Käll L, Elofsson A. The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Res 2015; 43:W401-7. [PMID: 25969446 PMCID: PMC4489233 DOI: 10.1093/nar/gkv485] [Citation(s) in RCA: 634] [Impact Index Per Article: 70.4] [Received: 02/07/2015] [Accepted: 05/02/2015] [Indexed: 11/21/2022] Open
Abstract
TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server now is even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it thus ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance by 4% as compared to the currently available best-scoring methods and TOPCONS is the only method that can identify signal peptides and still maintain a state-of-the-art performance in topology predictions.
Affiliation(s)
- Konstantinos D Tsirigos
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Christoph Peters
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Nanjiang Shu
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm University, Sweden
| | - Lukas Käll
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Arne Elofsson
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
|
165
|
Ito A, Ohkawa T. A method of searching for related literature on protein structure analysis by considering a user's intention. BMC Bioinformatics 2015; 16 Suppl 7:S4. [PMID: 25952498 PMCID: PMC4423583 DOI: 10.1186/1471-2105-16-s7-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent years, with advances in techniques for protein structure analysis, the knowledge about protein structure and function has been published in a vast number of articles. A method to search for specific publications from such a large pool of articles is needed. In this paper, we propose a method to search for related articles on protein structure analysis by using an article itself as a query. RESULTS Each article is represented as a set of concepts in the proposed method. Then, by using similarities among concepts formulated from databases such as Gene Ontology, similarities between articles are evaluated. In this framework, the desired search results vary depending on the user's search intention because a variety of information is included in a single article. Therefore, the proposed method provides not only one input article (primary article) but also additional articles related to it as an input query to determine the search intention of the user, based on the relationship between two query articles. In other words, based on the concepts contained in the input article and additional articles, we actualize a relevant literature search that considers user intention by varying the degree of attention given to each concept and modifying the concept hierarchy graph. CONCLUSIONS We performed an experiment to retrieve relevant papers from articles on protein structure analysis registered in the Protein Data Bank by using three query datasets. The experimental results yielded search results with better accuracy than when user intention was not considered, confirming the effectiveness of the proposed method.
|
166
|
Irimia M, Weatheritt RJ, Ellis JD, Parikshak NN, Gonatopoulos-Pournatzis T, Babor M, Quesnel-Vallières M, Tapial J, Raj B, O'Hanlon D, Barrios-Rodiles M, Sternberg MJE, Cordes SP, Roth FP, Wrana JL, Geschwind DH, Blencowe BJ. A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell 2015; 159:1511-23. [PMID: 25525873 DOI: 10.1016/j.cell.2014.11.035] [Citation(s) in RCA: 422] [Impact Index Per Article: 46.9] [Received: 08/07/2014] [Revised: 10/20/2014] [Accepted: 11/18/2014] [Indexed: 12/16/2022]
Abstract
Alternative splicing (AS) generates vast transcriptomic and proteomic complexity. However, which of the myriad of detected AS events provide important biological functions is not well understood. Here, we define the largest program of functionally coordinated, neural-regulated AS described to date in mammals. Relative to all other types of AS within this program, 3-15 nucleotide "microexons" display the most striking evolutionary conservation and switch-like regulation. These microexons modulate the function of interaction domains of proteins involved in neurogenesis. Most neural microexons are regulated by the neuronal-specific splicing factor nSR100/SRRM4, through its binding to adjacent intronic enhancer motifs. Neural microexons are frequently misregulated in the brains of individuals with autism spectrum disorder, and this misregulation is associated with reduced levels of nSR100. The results thus reveal a highly conserved program of dynamic microexon regulation associated with the remodeling of protein-interaction networks during neurogenesis, the misregulation of which is linked to autism.
Affiliation(s)
- Manuel Irimia
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona 08003, Spain.
| | - Robert J Weatheritt
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Jonathan D Ellis
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | - Neelroop N Parikshak
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
| | | | - Mariana Babor
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | | | - Javier Tapial
- EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona 08003, Spain
| | - Bushra Raj
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | - Dave O'Hanlon
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | - Miriam Barrios-Rodiles
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Sabine P Cordes
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, ON M5S 3G4, Canada; Canadian Institute For Advanced Research, 180 Dundas Street West, Toronto, ON M5G 1Z8, Canada
| | - Jeffrey L Wrana
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada
| | - Daniel H Geschwind
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
| | - Benjamin J Blencowe
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada.
|
167
|
Abstract
A key reason three-dimensional (3-D) protein structures are annotated with supporting or derived information is to understand the molecular basis of protein function. To this end, protein structure annotation databases curate key facts and observations, based on community-accepted standards, about the ~100,000 3-D experimental protein structures to date. This review will introduce the primary structure repositories, databases, and value-added structural annotation databases, as well as the range of information they provide. The different levels of annotation data (primary vs. derived vs. inferred) and how they should all be considered accordingly will also be described.
Affiliation(s)
- Margaret J. Gabanyi
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
| | - Helen M. Berman
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
|
168
|
Vijayan RSK, He P, Modi V, Duong-Ly KC, Ma H, Peterson JR, Dunbrack RL, Levy RM. Conformational analysis of the DFG-out kinase motif and biochemical profiling of structurally validated type II inhibitors. J Med Chem 2014; 58:466-79. [PMID: 25478866 PMCID: PMC4326797 DOI: 10.1021/jm501603h] [Citation(s) in RCA: 144] [Impact Index Per Article: 14.4] [Indexed: 12/17/2022]
Abstract
Structural coverage of the human kinome has been steadily increasing over time. The structures provide valuable insights into the molecular basis of kinase function and also provide a foundation for understanding the mechanisms of kinase inhibitors. There are a large number of kinase structures in the PDB for which the Asp and Phe of the DFG motif on the activation loop swap positions, resulting in the formation of a new allosteric pocket. We refer to these structures as “classical DFG-out” conformations in order to distinguish them from conformations that have also been referred to as DFG-out in the literature but that do not have a fully formed allosteric pocket. We have completed a structural analysis of almost 200 small molecule inhibitors bound to classical DFG-out conformations; we find that they are recognized by both type I and type II inhibitors. In contrast, we find that nonclassical DFG-out conformations strongly select against type II inhibitors because these structures have not formed a large enough allosteric pocket to accommodate this type of binding mode. In the course of this study we discovered that the number of structurally validated type II inhibitors that can be found in the PDB and that are also represented in publicly available biochemical profiling studies of kinase inhibitors is very small. We have obtained new profiling results for several additional structurally validated type II inhibitors identified through our conformational analysis. Although the available profiling data for type II inhibitors is still much smaller than for type I inhibitors, a comparison of the two data sets supports the conclusion that type II inhibitors are more selective than type I. We comment on the possible contribution of the DFG-in to DFG-out conformational reorganization to the selectivity.
Affiliation(s)
- R S K Vijayan
- Center for Biophysics & Computational Biology and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122, United States
|
169
|
Jones DT, Cozzetto D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 2014; 31:857-63. [PMID: 25391399 PMCID: PMC4380029 DOI: 10.1093/bioinformatics/btu744] [Citation(s) in RCA: 616] [Impact Index Per Article: 61.6] [Indexed: 01/13/2023]
Abstract
Motivation: A sizeable fraction of eukaryotic proteins contain intrinsically disordered regions (IDRs), which act in unfolded states or by undergoing transitions between structured and unstructured conformations. Over time, sequence-based classifiers of IDRs have become fairly accurate and currently a major challenge is linking IDRs to their biological roles from the molecular to the systems level. Results: We describe DISOPRED3, which extends its predecessor with new modules to predict IDRs and protein-binding sites within them. Based on recent CASP evaluation results, DISOPRED3 can be regarded as state of the art in the identification of IDRs, and our self-assessment shows that it significantly improves over DISOPRED2 because its predictions are more specific across the whole board and more sensitive to IDRs longer than 20 amino acids. Predicted IDRs are annotated as protein binding through a novel SVM based classifier, which uses profile data and additional sequence-derived features. Based on benchmarking experiments with full cross-validation, we show that this predictor generates precise assignments of disordered protein binding regions and that it compares well with other publicly available tools. Availability and implementation: http://bioinf.cs.ucl.ac.uk/disopred Contact: d.t.jones@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
| | - Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
|
170
|
Potenza E, Di Domenico T, Walsh I, Tosatto SCE. MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins. Nucleic Acids Res 2014; 43:D315-20. [PMID: 25361972 PMCID: PMC4384034 DOI: 10.1093/nar/gku982] [Citation(s) in RCA: 152] [Impact Index Per Article: 15.2] [Indexed: 02/07/2023] Open
Abstract
MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 million). The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions from STRING are also classified for disorder content.
Affiliation(s)
- Emilio Potenza
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Tomás Di Domenico
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Ian Walsh
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
|
171
|
Jamroz M, Niemyska W, Rawdon EJ, Stasiak A, Millett KC, Sułkowski P, Sulkowska JI. KnotProt: a database of proteins with knots and slipknots. Nucleic Acids Res 2014; 43:D306-14. [PMID: 25361973 PMCID: PMC4383900 DOI: 10.1093/nar/gku1059] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Indexed: 01/14/2023] Open
Abstract
The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints.
Affiliation(s)
- Michal Jamroz
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Wanda Niemyska
- Institute of Mathematics, University of Silesia, Bankowa 14, 40-007 Katowice, Poland
| | - Eric J Rawdon
- Department of Mathematics, University of St. Thomas, Saint Paul, MN 55105, USA
| | - Andrzej Stasiak
- Center for Integrative Genomics, University of Lausanne, 1015-Lausanne, Switzerland
| | - Kenneth C Millett
- Department of Mathematics, University of California, Santa Barbara, CA 93106, USA
| | - Piotr Sułkowski
- Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland; California Institute of Technology, Pasadena, CA 91125, USA
| | - Joanna I Sulkowska
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland; Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
|
172
|
Patwardhan A, Ashton A, Brandt R, Butcher S, Carzaniga R, Chiu W, Collinson L, Doux P, Duke E, Ellisman MH, Franken E, Grünewald K, Heriche JK, Koster A, Kühlbrandt W, Lagerstedt I, Larabell C, Lawson CL, Saibil HR, Sanz-García E, Subramaniam S, Verkade P, Swedlow JR, Kleywegt GJ. A 3D cellular context for the macromolecular world. Nat Struct Mol Biol 2014; 21:841-5. [PMID: 25289590 PMCID: PMC4346196 DOI: 10.1038/nsmb.2897] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 12/12/2022]
Abstract
We report the outcomes of the discussion initiated at the workshop entitled A 3D Cellular Context for the Macromolecular World and propose how data from emerging three-dimensional (3D) cellular imaging techniques—such as electron tomography, 3D scanning electron microscopy and soft X-ray tomography—should be archived, curated, validated and disseminated, to enable their interpretation and reuse by the biomedical community.
Affiliation(s)
- Ardan Patwardhan
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | | | - Sarah Butcher
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Raffaella Carzaniga
- Electron Microscopy Unit, Cancer Research UK London Research Institute, London, UK
| | - Wah Chiu
- National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas
| | - Lucy Collinson
- Electron Microscopy Unit, Cancer Research UK London Research Institute, London, UK
| | - Pascal Doux
- FEI Visualization Sciences Group, Mérignac, France
| | | | - Mark H Ellisman
- Center for Research in Biological Systems, National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego, San Diego, California, USA
| | - Erik Franken
- FEI Electron Optics B.V., Eindhoven, the Netherlands
| | - Kay Grünewald
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, Oxford, UK
| | - Jean-Karim Heriche
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Abraham Koster
- Department of Molecular Cell Biology, Leiden University Medical Center, Leiden, the Netherlands
| | - Werner Kühlbrandt
- Department of Structural Biology, Max Planck Institute for Biophysics, Frankfurt, Germany
| | - Ingvar Lagerstedt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Carolyn Larabell
- Department of Anatomy, University of California, San Francisco, San Francisco, California, USA
| | - Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics, Rutgers University, Piscataway, New Jersey, USA
| | - Helen R Saibil
- Institute of Structural and Molecular Biology, Department of Crystallography, Birkbeck College, London, UK
| | - Eduardo Sanz-García
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Sriram Subramaniam
- Center for Cancer Research, National Cancer Institute, Bethesda, Maryland, USA
| | - Paul Verkade
- Wolfson Bioimaging Facility, School of Biochemistry, University of Bristol, Bristol, UK
| | - Jason R Swedlow
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, UK
| | - Gerard J Kleywegt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
|
173
|
Walsh I, Giollo M, Di Domenico T, Ferrari C, Zimmermann O, Tosatto SCE. Comprehensive large-scale assessment of intrinsic protein disorder. Bioinformatics 2014; 31:201-8. [PMID: 25246432 DOI: 10.1093/bioinformatics/btu625] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Indexed: 12/23/2022]
Abstract
MOTIVATION Intrinsically disordered regions are key for the function of numerous proteins. Due to the difficulties in experimental disorder characterization, many computational predictors have been developed with various disorder flavors. Their performance is generally measured on small sets mainly from experimentally solved structures, e.g. Protein Data Bank (PDB) chains. MobiDB has only recently started to collect disorder annotations from multiple experimental structures. RESULTS MobiDB annotates disorder for UniProt sequences, allowing us to conduct the first large-scale assessment of fast disorder predictors on 25 833 different sequences with X-ray crystallographic structures. In addition to a comprehensive ranking of predictors, this analysis produced the following interesting observations. (i) The predictors cluster according to their disorder definition, with a consensus giving more confidence. (ii) Previous assessments appear over-reliant on data annotated at the PDB chain level and performance is lower on entire UniProt sequences. (iii) Long disordered regions are harder to predict. (iv) Depending on the structural and functional types of the proteins, differences in prediction performance of up to 10% are observed. AVAILABILITY The datasets are available from Web site at URL: http://mobidb.bio.unipd.it/lsd. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ian Walsh
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Manuel Giollo
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Tomás Di Domenico
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Carlo Ferrari
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Olav Zimmermann
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, Department of Information Engineering, University of Padua, Via Gradenigo 6, 35121 Padova, Italy and Institute for Advanced Simulation, Forschungszentrum Juelich, Wilhelm-Johnen-Str., 52425 Juelich, Germany
| |
Collapse
|
174
|
Schomburg KT, Rarey M. Benchmark Data Sets for Structure-Based Computational Target Prediction. J Chem Inf Model 2014; 54:2261-74. [DOI: 10.1021/ci500131x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/30/2022]
Affiliation(s)
- Karen T. Schomburg
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Matthias Rarey
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
175
Anand P, Nagarajan D, Mukherjee S, Chandra N. PLIC: protein-ligand interaction clusters. Database: The Journal of Biological Databases and Curation 2014; 2014:bau029. [PMID: 24763918 PMCID: PMC3998096 DOI: 10.1093/database/bau029] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/27/2022]
Abstract
Most of the biological processes are governed through specific protein–ligand interactions. Discerning different components that contribute toward a favorable protein–ligand interaction could contribute significantly toward better understanding protein function, rationalizing drug design and obtaining design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structure of ∼68 000 protein–ligand complexes. Although several databases exist that classify proteins according to sequence and structure, a mere handful of them annotate and classify protein–ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84 846 sites) has been conducted using PocketMatch: a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed using binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes, (ii) atomic contacts consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein–ligand interactions and (iii) binding energetics involved in interactions derived from scoring functions developed for docking.
For each ligand-binding site in each protein in the PDB, site similarity information, clusters they belong to and description of site attributes are provided as a relational database—protein–ligand interaction clusters (PLIC). Database URL: http://proline.biochem.iisc.ernet.in/PLIC
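The thresholding and clustering step described in this abstract can be illustrated with a small sketch. This is not the PLIC or PocketMatch code: the function names, the toy scores, the inflation value of 2.0 and the fixed iteration count are all assumptions made for illustration, and the pure-Python Markov clustering below is a textbook-style simplification rather than the reference MCL implementation.

```python
# Illustrative sketch only: threshold pairwise site-similarity scores at 0.80,
# then cluster the resulting network with a minimal Markov clustering (MCL) loop.
# All names and parameter values here are hypothetical, not taken from PLIC.

def threshold_graph(scores, n, cutoff=0.80):
    """Adjacency matrix with self-loops; an edge is kept when score > cutoff."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops keep every column sum positive
    for (i, j), s in scores.items():
        if s > cutoff:
            m[i][j] = m[j][i] = 1.0
    return m

def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= col_sum
    return m

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=50):
    """Alternate expansion (matrix squaring) and inflation (element-wise power)."""
    m = normalize_columns([row[:] for row in adjacency])
    for _ in range(iterations):
        m = matmul(m, m)                                   # expansion
        m = [[x ** inflation for x in row] for row in m]   # inflation
        m = normalize_columns(m)
    # Each row that still carries weight lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy example: five binding sites; the 2-3 pair falls below the 0.80 cutoff.
scores = {(0, 1): 0.91, (0, 2): 0.85, (1, 2): 0.88, (3, 4): 0.95, (2, 3): 0.50}
print(mcl(threshold_graph(scores, 5)))
```

Raising the inflation parameter generally yields finer clusters; on a network of tens of thousands of sites, a sparse-matrix implementation with pruning would be needed instead of these dense lists.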
Affiliation(s)
- Praveen Anand
- Department of Biochemistry, Indian Institute of Science, Bangalore 560012, Karnataka, India and IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560012, Karnataka, India
176
Xu J, Zhang J. Why human disease-associated residues appear as the wild-type in other species: genome-scale structural evidence for the compensation hypothesis. Mol Biol Evol 2014; 31:1787-92. [PMID: 24723421 DOI: 10.1093/molbev/msu130] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Many human-disease associated amino acid residues (DARs) appear as the wild-type in other species. This phenomenon is commonly explained by the presence of compensatory residues in these other species that alleviate the deleterious effects of the DARs. The general validity of this hypothesis, however, is unclear, because few compensatory residues have been identified. Here we test the compensation hypothesis by assembling and analyzing 1,077 DARs located in 177 proteins of known crystal structures. Because destabilizing protein structures is a primary reason why DARs are deleterious, we focus on protein stability in this analysis. We discover that, in species where a DAR represents the wild-type, the destabilizing effect of the DAR is generally lessened by the observed amino acid substitutions in the spatial proximity of the DAR. This and other findings provide genome-scale evidence for the compensation hypothesis and have important implications for understanding epistasis in protein evolution and for using animal models of human diseases.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan
177
Berman HM, Kleywegt GJ, Nakamura H, Markley JL. How community has shaped the Protein Data Bank. Structure 2014; 21:1485-91. [PMID: 24010707 DOI: 10.1016/j.str.2013.07.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/14/2013] [Revised: 07/12/2013] [Accepted: 07/17/2013] [Indexed: 11/19/2022]
Abstract
Following several years of community discussion, the Protein Data Bank (PDB) was established in 1971 as a public repository for the coordinates of three-dimensional models of biological macromolecules. Since then, the number, size, and complexity of structural models have continued to grow, reflecting the productivity of structural biology. Managed by the Worldwide PDB organization, the PDB has been able to meet increasing demands for the quantity of structural information and of quality. In addition to providing unrestricted access to structural information, the PDB also works to promote data standards and to raise the profile of structural biology with broader audiences. In this perspective, we describe the history of PDB and the many ways in which the community continues to shape the archive.
Affiliation(s)
- Helen M Berman
- RCSB PDB, Center for Integrative Proteomics Research and Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ USA 08854.
178
Mielke CJ, Mandarino LJ, Dinu V. AMASS: a database for investigating protein structures. Bioinformatics 2014; 30:1595-600. [PMID: 24497503 DOI: 10.1093/bioinformatics/btu073] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION Modern techniques have produced many sequence annotation databases and protein structure portals, but these Web resources are rarely integrated in ways that permit straightforward exploration of protein functional residues and their co-localization. RESULTS We have created the AMASS database, which maps 1D sequence annotation databases to 3D protein structures with an intuitive visualization interface. Our platform also provides an analysis service that screens mass spectrometry sequence data for post-translational modifications that reside in functionally relevant locations within protein structures. The system is built on the premise that functional residues such as active sites, cancer mutations and post-translational modifications within proteins may co-localize and share common functions. AVAILABILITY AND IMPLEMENTATION AMASS database is implemented with Biopython and Apache as a freely available Web server at amass-db.org.
Affiliation(s)
- Clinton J Mielke
- Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA, The Center for Metabolic and Vascular Biology, Mayo Clinic, Scottsdale, AZ 85259, USA and Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA
- Lawrence J Mandarino
- Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA, The Center for Metabolic and Vascular Biology, Mayo Clinic, Scottsdale, AZ 85259, USA and Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA
- Valentin Dinu
- Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA, The Center for Metabolic and Vascular Biology, Mayo Clinic, Scottsdale, AZ 85259, USA and Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA
179
Brooksbank C, Bergman MT, Apweiler R, Birney E, Thornton J. The European Bioinformatics Institute's data resources 2014. Nucleic Acids Res 2014; 42:D18-25. [PMID: 24271396 PMCID: PMC3964968 DOI: 10.1093/nar/gkt1206] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 09/17/2013] [Revised: 11/01/2013] [Accepted: 11/04/2013] [Indexed: 12/18/2022]
Abstract
Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.
Affiliation(s)
- Catherine Brooksbank
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mary Todd Bergman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
180
Finn RD, Miller BL, Clements J, Bateman A. iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Res 2013; 42:D364-73. [PMID: 24297255 PMCID: PMC3965099 DOI: 10.1093/nar/gkt1210] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Indexed: 12/26/2022]
Abstract
The database iPfam, available at http://ipfam.org, catalogues Pfam domain interactions based on known 3D structures that are found in the Protein Data Bank, providing interaction data at the molecular level. Previously, the iPfam domain–domain interaction data was integrated within the Pfam database and website, but it has now been migrated to a separate database. This allows for independent development, improving data access and giving clearer separation between the protein family and interactions datasets. In addition to domain–domain interactions, iPfam has been expanded to include interaction data for domain bound small molecule ligands. Functional annotations are provided from source databases, supplemented by the incorporation of Wikipedia articles where available. iPfam (version 1.0) contains >9500 domain–domain and 15 500 domain–ligand interactions. The new website provides access to this data in a variety of ways, including interactive visualizations of the interaction data.
Affiliation(s)
- Robert D Finn
- HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn VA 20147 USA and European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
181
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res 2013; 42:D222-30. [PMID: 24288371 PMCID: PMC3965110 DOI: 10.1093/nar/gkt1223] [Citation(s) in RCA: 4319] [Impact Index Per Article: 392.6] [Indexed: 01/17/2023]
Abstract
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
Affiliation(s)
- Robert D Finn
- HHMI Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147 USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, OX1 3QX, UK, Institute of Biotechnology and Department of Biological and Environmental Sciences, University of Helsinki, PO Box 56 (Viikinkaari 5), 00014 Helsinki, Finland and Stockholm Bioinformatics Center, Swedish eScience Research Center, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, PO Box 1031, SE-17121 Solna, Sweden
182
Gutmanas A, Alhroub Y, Battle GM, Berrisford JM, Bochet E, Conroy MJ, Dana JM, Fernandez Montecelo MA, van Ginkel G, Gore SP, Haslam P, Hatherley R, Hendrickx PMS, Hirshberg M, Lagerstedt I, Mir S, Mukhopadhyay A, Oldfield TJ, Patwardhan A, Rinaldi L, Sahni G, Sanz-García E, Sen S, Slowley RA, Velankar S, Wainwright ME, Kleywegt GJ. PDBe: Protein Data Bank in Europe. Nucleic Acids Res 2013; 42:D285-91. [PMID: 24288376 PMCID: PMC3965016 DOI: 10.1093/nar/gkt1180] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Indexed: 12/14/2022]
Abstract
The Protein Data Bank in Europe (pdbe.org) is a founding member of the Worldwide PDB consortium (wwPDB; wwpdb.org) and as such is actively engaged in the deposition, annotation, remediation and dissemination of macromolecular structure data through the single global archive for such data, the PDB. Similarly, PDBe is a member of the EMDataBank organisation (emdatabank.org), which manages the EMDB archive for electron microscopy data. PDBe also develops tools that help the biomedical science community to make effective use of the data in the PDB and EMDB for their research. Here we describe new or improved services, including updated SIFTS mappings to other bioinformatics resources, a new browser for the PDB archive based on Gene Ontology (GO) annotation, updates to the analysis of Nuclear Magnetic Resonance-derived structures, redesigned search and browse interfaces, and new or updated visualisation and validation tools for EMDB entries.
Affiliation(s)
- Aleksandras Gutmanas
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
183
Blohm P, Frishman G, Smialowski P, Goebels F, Wachinger B, Ruepp A, Frishman D. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis. Nucleic Acids Res 2013; 42:D396-400. [PMID: 24214996 PMCID: PMC3965096 DOI: 10.1093/nar/gkt1079] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.0] [Indexed: 11/14/2022]
Abstract
Knowledge about non-interacting proteins (NIPs) is important for training the algorithms to predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the utilization of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly a half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version the contents of the database have grown by over 300%.
Affiliation(s)
- Philipp Blohm
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU - German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany, Clueda AG, Elsenheimerstraße 59, 80687 Munich, Germany and Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
184
Abstract
PDBsum, http://www.ebi.ac.uk/pdbsum, is a website providing numerous pictorial analyses of each entry in the Protein Data Bank. It portrays the structural features of all proteins, DNA and ligands in the entry, as well as depicting the interactions between them. The latest features, described here, include annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs showing the relationships between related protein domain architectures, analyses of ligand binding clusters across different experimental determinations of the same protein, analyses of tunnels in proteins and new search options.
Affiliation(s)
- Tjaart A P de Beer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and Department of Physical Chemistry, Regional Centre of Advanced Technologies and Materials, Faculty of Science, Palacký University Olomouc, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic
185
DePietro PJ, Julfayev ES, McLaughlin WA. Quantification of the impact of PSI:Biology according to the annotations of the determined structures. BMC Struct Biol 2013; 13:24. [PMID: 24139526 PMCID: PMC4016320 DOI: 10.1186/1472-6807-13-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/14/2013] [Accepted: 10/14/2013] [Indexed: 11/23/2022]
Abstract
Background Protein Structure Initiative:Biology (PSI:Biology) is the third phase of PSI where protein structures are determined in high-throughput to characterize their biological functions. The transition to the third phase entailed the formation of PSI:Biology Partnerships which are composed of structural genomics centers and biomedical science laboratories. We present a method to examine the impact of protein structures determined under the auspices of PSI:Biology by measuring their rates of annotations. The mean numbers of annotations per structure and per residue are examined. These are designed to provide measures of the amount of structure to function connections that can be leveraged from each structure. Results One result is that PSI:Biology structures are found to have a higher rate of annotations than structures determined during the first two phases of PSI. A second result is that the subset of PSI:Biology structures determined through PSI:Biology Partnerships have a higher rate of annotations than those determined exclusive of those partnerships. Both results hold when the annotation rates are examined either at the level of the entire protein or for annotations that are known to fall at specific residues within the portion of the protein that has a determined structure. Conclusions We conclude that PSI:Biology determines structures that are estimated to have a higher degree of biomedical interest than those determined during the first two phases of PSI based on a broad array of biomedical annotations. For the PSI:Biology Partnerships, we see that there is an associated added value that represents part of the progress toward the goals of PSI:Biology. We interpret the added value to mean that team-based structural biology projects that utilize the expertise and technologies of structural genomics centers together with biological laboratories in the community are conducted in a synergistic manner. 
We show that the annotation rates can be used in conjunction with established metrics, i.e. the numbers of structures and impact of publication records, to monitor the progress of PSI:Biology towards its goals of examining structure to function connections of high biomedical relevance. The metric provides an objective means to quantify the overall impact of PSI:Biology as it uses biomedical annotations from external sources.
Affiliation(s)
- William A McLaughlin
- Department of Basic Science, The Commonwealth Medical College, 525 Pine Street, Scranton, PA 18509, USA.
186
Gutmanas A, Oldfield TJ, Patwardhan A, Sen S, Velankar S, Kleywegt GJ. The role of structural bioinformatics resources in the era of integrative structural biology. Acta Crystallogr D Biol Crystallogr 2013; 69:710-21. [PMID: 23633580 PMCID: PMC3640467 DOI: 10.1107/s0907444913001157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 07/06/2012] [Accepted: 01/11/2013] [Indexed: 11/10/2022]
Abstract
The history and the current state of the PDB and EMDB archives is briefly described, as well as some of the challenges that they face. It seems natural that the role of structural biology archives will change from being a pure repository of historic data into becoming an indispensable resource for the wider biomedical community. As part of this transformation, it will be necessary to validate the biomacromolecular structure data and ensure the highest possible quality for the archive holdings, to combine structural data from different spatial scales into a unified resource and to integrate structural data with functional, genetic and taxonomic data as well as other information available in bioinformatics resources. Some recent developments and plans to address these challenges at PDBe are presented.
Affiliation(s)
- Aleksandras Gutmanas
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Thomas J. Oldfield
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Ardan Patwardhan
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Sanchayita Sen
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Sameer Velankar
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
- Gerard J. Kleywegt
- Protein Data Bank in Europe, EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, England
187
Fernández-Suárez XM, Galperin MY. The 2013 Nucleic Acids Research Database Issue and the online molecular biology database collection. Nucleic Acids Res 2012. [PMID: 23203983 PMCID: PMC3531151 DOI: 10.1093/nar/gks1297] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
Abstract
The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases and the other half provide updates on the databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcriptional factors and transcriptional factor-binding sites; databases on various aspects of protein structure and protein–protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using the genome data to improve human health is reflected in the development of the databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein–ligand binding. Two new databases present genomic and RNAseq data for monkeys, providing wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and currently lists 1512 online databases. The full content of the Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).