1
Arrías PN, Monzon AM, Clementel D, Mozaffari S, Piovesan D, Kajava AV, Tosatto SCE. The repetitive structure of DNA clamps: An overlooked protein tandem repeat. J Struct Biol 2023; 215:108001. [PMID: 37467824 DOI: 10.1016/j.jsb.2023.108001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 07/12/2023] [Accepted: 07/16/2023] [Indexed: 07/21/2023]
Abstract
Structured tandem repeat proteins (STRPs) are a specific kind of tandem repeat protein characterized by a modular and repetitive three-dimensional structural arrangement. The majority of STRPs adopt solenoid structures, but with the increasing availability of experimental structures and high-quality predicted structural models, more STRP folds can be characterized. Here, we describe "Box repeats", an overlooked STRP fold present in the DNA sliding clamp processivity factors, which has eluded classification although structural data have been available since the late 1990s. Each Box repeat is a βαβββ module of about 60 residues, which forms a class V "beads-on-a-string" type STRP. The number of repeats present in processivity factors is organism-dependent. Monomers of PCNA proteins in both Archaea and Eukarya have 4 repeats, while the monomers of bacterial beta-sliding clamps have 6 repeats. This new repeat fold has been added to the RepeatsDB database, which now provides structural annotation for 66 Box repeat proteins belonging to different organisms, including viruses.
Affiliation(s)
- Paula Nazarena Arrías
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alexander Miguel Monzon
  - Department of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Damiano Clementel
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Soroush Mozaffari
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
2
Bromberg Y, Aptekmann AA, Mahlich Y, Cook L, Senn S, Miller M, Nanda V, Ferreiro DU, Falkowski PG. Quantifying structural relationships of metal-binding sites suggests origins of biological electron transfer. Sci Adv 2022; 8:eabj3984. [PMID: 35030025 PMCID: PMC8759750 DOI: 10.1126/sciadv.abj3984] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/10/2021] [Accepted: 11/22/2021] [Indexed: 06/07/2023]
Abstract
Biological redox reactions drive planetary biogeochemical cycles. Using a novel, structure-guided sequence analysis of proteins, we explored the patterns of evolution of enzymes responsible for these reactions. Our analysis reveals that the folds that bind transition metal–containing ligands have similar structural geometry and amino acid sequences across the full diversity of proteins. Similarity across folds reflects the availability of key transition metals over geological time and strongly suggests that transition metal–ligand binding had a small number of common peptide origins. We observe that structures central to our similarity network come primarily from oxidoreductases, suggesting that ancestral peptides may have also facilitated electron transfer reactions. Last, our results reveal that the earliest biologically functional peptides were likely available before the assembly of fully functional protein domains over 3.8 billion years ago.
"Thus, life is a special, very complex form of motion of matter, but this form did not always exist, and it is not separated from inorganic nature by an impassable abyss; rather, it arose from inorganic nature as a new property in the process of evolution of the world. We must study the history of this evolution if we want to solve the problem of the origin of life." [A. I. Oparin (1)]
Affiliation(s)
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Ariel A. Aptekmann
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Yannick Mahlich
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Linda Cook
  - Program in Applied and Computational Math, Princeton University, Princeton, NJ 08540, USA
- Stefan Senn
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Maximillian Miller
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08873, USA
- Vikas Nanda
  - Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, and Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ 08854, USA
- Diego U. Ferreiro
  - Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
- Paul G. Falkowski
  - Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ 08901, USA
3
Abstract
Ankyrin (ANK) repeat proteins are coded by tandem occurrences of patterns with around 33 amino acids. They often mediate protein–protein interactions in a diversity of biological systems. These proteins have an elongated non-globular shape and often display complex folding mechanisms. This work investigates the energy landscape of representative proteins of this class made up of 3, 4 and 6 ANK repeats using the energy-landscape visualisation method (ELViM). By combining biased and unbiased coarse-grained molecular dynamics AWSEM simulations that sample conformations along the folding trajectories with the ELViM structure-based phase space, one finds a three-dimensional representation of the globally funnelled energy surface. In this representation, it is possible to delineate distinct folding pathways. We show that ELViMs can project, in a natural way, the intricacies of the highly dimensional energy landscapes encoded by the highly symmetric ankyrin repeat proteins into useful low-dimensional representations. These projections can discriminate between multiplicities of specific parallel folding mechanisms that otherwise can be hidden in oversimplified depictions.
4
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
5
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. Front Bioinform 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022]
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by two competing facts: 1) the repeats are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often because more residues are heavily constrained by the need for the sequence to repeat, and 2) the symmetry of the repeats allows functional contributions to be distributed over a number of residues, so that sometimes no specific site is singularly responsible for function (unlike the catalytic residues of an enzymatic active site). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide and Ankyrin repeat families. We find that mutations are context dependent. Some mutations are indeed highly disruptive to the function of the protein repeats, while mutations in identical positions in other repeats in the same protein have little to no effect on structure or function.
Affiliation(s)
- Matthew Merski
  - Correspondence: Maria Wiktoria Górna; Matthew Merski
6
Paladin L, Bevilacqua M, Errigo S, Piovesan D, Mičetić I, Necci M, Monzon AM, Fabre ML, Lopez JL, Nilsson JF, Rios J, Menna PL, Cabrera M, Buitron MG, Kulik MG, Fernandez-Alberti S, Fornasari MS, Parisi G, Lagares A, Hirsh L, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Res 2021; 49:D452-D457. [PMID: 33237313 PMCID: PMC7778985 DOI: 10.1093/nar/gkaa1097] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/15/2020] [Revised: 10/17/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022]
Abstract
The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
Affiliation(s)
- Lisanna Paladin
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Martina Bevilacqua
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Sara Errigo
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Damiano Piovesan
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Ivan Mičetić
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Marco Necci
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Maria Laura Fabre
  - IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Jose Luis Lopez
  - IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Juliet F Nilsson
  - IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Javier Rios
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Pablo Lorenzano Menna
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Maia Cabrera
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Martin Gonzalez Buitron
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Mariane Gonçalves Kulik
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Sebastian Fernandez-Alberti
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Maria Silvina Fornasari
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
  - Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Antonio Lagares
  - IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Layla Hirsh
  - Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Univ. Montpellier, Montpellier, France
- Silvio C E Tosatto
  - Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
7
Aleksandrova AA, Sarti E, Forrest LR. MemSTATS: A Benchmark Set of Membrane Protein Symmetries and Pseudosymmetries. J Mol Biol 2019; 432:597-604. [PMID: 31628944 DOI: 10.1016/j.jmb.2019.09.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Revised: 08/30/2019] [Accepted: 09/23/2019] [Indexed: 02/06/2023]
Abstract
In membrane proteins, symmetry and pseudosymmetry often have functional or evolutionary implications. However, available symmetry detection methods have not been tested systematically on this class of proteins because of the lack of an appropriate benchmark set. Here we present MemSTATS, a publicly available benchmark set of both quaternary- and internal-symmetries in membrane protein structures. The symmetries are described in terms of order, repeated elements, and orientation of the axis with respect to the membrane plane. Moreover, using MemSTATS, we compare the performance of four widely used symmetry detection algorithms and highlight specific challenges and areas for improvement in the future.
Affiliation(s)
- Antoniya A Aleksandrova
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Edoardo Sarti
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Lucy R Forrest
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
8
Bliven SE, Lafita A, Rose PW, Capitani G, Prlić A, Bourne PE. Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm. PLoS Comput Biol 2019; 15:e1006842. [PMID: 31009453 PMCID: PMC6504099 DOI: 10.1371/journal.pcbi.1006842] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/18/2018] [Revised: 05/07/2019] [Accepted: 01/29/2019] [Indexed: 01/04/2023]
Abstract
Many proteins fold into highly regular and repetitive three-dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution.
Many protein structures show a great deal of regularity. Even within single polypeptide chains, about 25% of proteins contain self-similar repeating structures, which can be organized in ring-like symmetric arrangements or linear open repeats. The repeats are often related, and thus comparing the sequence and structure of repeats can give an idea as to the early evolutionary history of a protein family. Additionally, the conservation and divergence of repeats can lead to insights about the function of the proteins. This work describes CE-Symm 2.0, a tool for the analysis of protein symmetry. The method automatically detects internal symmetry in protein structures and produces a multiple alignment of structural repeats. The algorithm is able to detect the geometric relationships between the repeats, including cyclic, dihedral, and polyhedral symmetries, translational repeats, and cases where multiple symmetry operators are applicable in a hierarchical manner. These complex relationships can then be visualized in a graphical interface as a complete structure, as a superposition of repeats, or as a multiple alignment of the protein sequence. CE-Symm 2.0 can be systematically used for the automatic detection of internal symmetry in protein structures, or as an interactive tool for the analysis of structural repeats.
Affiliation(s)
- Spencer E. Bliven
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
  - Institute of Applied Simulation, Zurich University of Applied Science, Wädenswil, Switzerland
  - * E-mail: (SEB), (AL)
- Aleix Lafita
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - * E-mail: (SEB), (AL)
- Peter W. Rose
  - RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
  - Structural Bioinformatics Laboratory, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Guido Capitani
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biology, ETH Zurich, Zurich, Switzerland
- Andreas Prlić
  - RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Philip E. Bourne
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
9
Turjanski P, Ferreiro DU. On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences. J Phys Chem B 2018; 122:11295-11301. [PMID: 30239207 DOI: 10.1021/acs.jpcb.8b07206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/10/2023]
Abstract
All known terrestrial proteins are coded as continuous strings of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describe the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition for "repetition", an efficient algorithmic implementation and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well-separated into disjoint classes according to their recurrence in nested structures. The statistics of the occurrences of patterns indicate that short repetitions are sufficient to account for the differences between natural families and randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to account for a robust "familiarity" definition between arbitrary sets of sequences.
Affiliation(s)
- Pablo Turjanski
  - KAPOW, Departamento de Computación, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-ICC, Buenos Aires, Argentina
- Diego U Ferreiro
  - Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
10
Pugacheva V, Korotkov A, Korotkov E. Search of latent periodicity in amino acid sequences by means of genetic algorithm and dynamic programming. Stat Appl Genet Mol Biol 2017; 15:381-400. [PMID: 27337743 DOI: 10.1515/sagmb-2015-0079] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/15/2022]
Abstract
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids in unknown positions of the analyzed sequence. A genetic algorithm, dynamic programming and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. A multiple alignment of periods was calculated with the help of direct optimization of the position-weight matrix, without using pairwise alignments. The developed algorithm was applied to analyze amino acid sequences of a small number of proteins. This study showed the presence of latent periodicity with insertions and deletions in the amino acid sequences of proteins for which the presence of latent periodicity was not previously known. The origin of latent periodicity with insertions and deletions is discussed.
11
Inferring repeat-protein energetics from evolutionary information. PLoS Comput Biol 2017; 13:e1005584. [PMID: 28617812 PMCID: PMC5491312 DOI: 10.1371/journal.pcbi.1005584] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/07/2017] [Revised: 06/29/2017] [Accepted: 05/21/2017] [Indexed: 11/19/2022]
Abstract
Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold to specific structures, and it has been shown possible to infer the main native ensemble by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description for the energetic variation given by sequence modifications in repeat proteins, systems for which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pair-wise interactions and treat higher order correlations with a single term. We show that the resulting evolutionary field can be interpreted with structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting energetic evolutionary field allows the prediction of the folding free energy change for several mutants, and can be used to generate synthetic sequences that are statistically indistinguishable from the natural counterparts.
12
Paladin L, Hirsh L, Piovesan D, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures. Nucleic Acids Res 2016; 45:D308-D312. [PMID: 27899671 PMCID: PMC5210593 DOI: 10.1093/nar/gkw1136] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/15/2016] [Revised: 10/20/2016] [Accepted: 10/31/2016] [Indexed: 12/19/2022]
Abstract
RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by an extensive manual validation for >60% of the entries. The updated web interface includes a new search engine for complex queries and a fully re-designed entry page for a better overview of structural data. It is now possible to compare unit positions, together with secondary structure, fold information and Pfam domains. Moreover, a new classification level has been introduced on top of the existing scheme as an independent layer for sequence similarity relationships at 40%, 60% and 90% identity.
Affiliation(s)
- Lisanna Paladin
  - Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Layla Hirsh
  - Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
  - Departamento de Ingeniería, Pontificia Universidad Católica del Perú, 32 Lima, Perú
- Damiano Piovesan
  - Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Miguel A Andrade-Navarro
  - Institute of Molecular Biology, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Andrey V Kajava
  - Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier, 34293 Montpellier, France
  - Institut de Biologie Computationnelle (IBC), 34293 Montpellier, France
  - Institute of Bioengineering, University ITMO, 197101 St. Petersburg, Russia
- Silvio C E Tosatto
  - Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
  - CNR Institute of Neuroscience, 35121 Padova, Italy
13
Hrabe T, Jaroszewski L, Godzik A. Revealing aperiodic aspects of solenoid proteins from sequence information. Bioinformatics 2016; 32:2776-82. [PMID: 27334472 DOI: 10.1093/bioinformatics/btw319] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/27/2015] [Accepted: 05/13/2016] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods focusing on the analysis of 3D structures of such proteins identified many subtle effects in the length distribution of individual motifs that are important for their functions. However, similar analysis has not yet been applied to the vast majority of repeat proteins with unknown 3D structures, mostly because of the extreme diversity of the underlying motifs and the resulting difficulty in detecting them.
RESULTS: We developed FAIT, a sequence-based algorithm for the precise assignment of individual repeats in repeat proteins, and introduced a framework to classify and compare aperiodicity patterns for large protein families. FAIT extracts repeat positions by post-processing FFAS alignment matrices with image processing methods. On examples of proteins with Leucine Rich Repeat (LRR) domains and other solenoid-like proteins, we show that the automated analysis with FAIT correctly identifies exact lengths of individual repeats based entirely on sequence information.
AVAILABILITY AND IMPLEMENTATION: https://github.com/GodzikLab/FAIT
CONTACT: adam@godziklab.org
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thomas Hrabe
  - Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Lukasz Jaroszewski
  - Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Adam Godzik
  - Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
14
Abstract
Structural domains are believed to be modules within proteins that can fold and function independently. Some proteins show tandem repetitions of apparent modular structure that do not fold independently, but rather co-operate in stabilizing structural forms that comprise several repeat-units. For many natural repeat-proteins, it has been shown that weak energetic links between repeats lead to the breakdown of co-operativity and the appearance of folding sub-domains within an apparently regular repeat array. The quasi-1D architecture of repeat-proteins is crucial in detailing how the local energetic balances can modulate the folding dynamics of these proteins, which can be related to the physiological behaviour of these ubiquitous biological systems.
15
Turjanski P, Parra RG, Espada R, Becher V, Ferreiro DU. Protein Repeats from First Principles. Sci Rep 2016; 6:23959. [PMID: 27044676 PMCID: PMC4820709 DOI: 10.1038/srep23959] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/26/2015] [Accepted: 03/16/2016] [Indexed: 01/09/2023]
Abstract
Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repeating patterns within a single repeat protein can be extremely variable at the sequence level. We use a mathematical definition of a repetition and investigate the occurrences of these in sequences of different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those which are known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness. We show that this form of repetitiveness is not exclusive to repeat proteins, but also occurs in globular domains. A by-product of this work is a fast quantification of the likelihood of a protein to belong to a family.
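The idea of "stretches of 6 amino acids or longer that are perfect repetitions in the reference family" can be illustrated with a minimal sketch. This is not the authors' algorithm or scoring system: it assumes a naive exact k-mer lookup, and the function names (`shared_kmers`, `family_coverage`), the default `k = 6` cutoff, and the toy sequences are all hypothetical choices made for illustration.

```python
from __future__ import annotations


def shared_kmers(query: str, family: list[str], k: int = 6) -> list[tuple[int, str]]:
    """Return (start, k-mer) pairs from `query` that occur exactly in any family sequence.

    Naive exact matching only; an illustrative stand-in, not the published method.
    """
    # Collect every length-k substring present in the reference family.
    family_kmers = {seq[i:i + k] for seq in family for i in range(len(seq) - k + 1)}
    # Keep the query windows that are found verbatim in the family.
    return [
        (i, query[i:i + k])
        for i in range(len(query) - k + 1)
        if query[i:i + k] in family_kmers
    ]


def family_coverage(query: str, family: list[str], k: int = 6) -> float:
    """Fraction of query residues covered by at least one shared k-mer."""
    if not query:
        return 0.0
    covered = [False] * len(query)
    for start, _ in shared_kmers(query, family, k):
        for j in range(start, start + k):
            covered[j] = True
    return sum(covered) / len(query)


# Toy sequences, invented for this example.
family = ["AAGKTPLHLAAR", "GHLEIVKLLLEA"]
query = "MMGKTPLHLAMM"
print(shared_kmers(query, family))
print(family_coverage(query, family))
```

A coverage near 1 would mean that most of the query is built from stretches seen verbatim in the family, while a coverage near 0 would mean that few or none are shared; any threshold for calling a sequence a family member would be a further assumption.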
Affiliation(s)
- Pablo Turjanski
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- R Gonzalo Parra
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Rocío Espada
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Verónica Becher
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Diego U Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, Argentina
|
16
|
Parra RG, Espada R, Verstraete N, Ferreiro DU. Structural and Energetic Characterization of the Ankyrin Repeat Protein Family. PLoS Comput Biol 2015; 11:e1004659. [PMID: 26691182 PMCID: PMC4687027 DOI: 10.1371/journal.pcbi.1004659] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 08/11/2015] [Accepted: 11/10/2015] [Indexed: 11/21/2022]
Abstract
Ankyrin repeat containing proteins are one of the most abundant solenoid folds. Usually implicated in specific protein-protein interactions, these proteins are readily amenable for design, with promising biotechnological and biomedical applications. Studying repeat protein families presents technical challenges due to the high sequence divergence among the repeating units. We developed and applied a systematic method to consistently identify and annotate the structural repetitions over the members of the complete Ankyrin Repeat Protein Family, with increased sensitivity over previous studies. We statistically characterized the number of repeats, the folding of the repeat-arrays, their structural variations, insertions and deletions. An energetic analysis of the local frustration patterns reveals the basic features underlying fold stability and its relation to the functional binding regions. We found a strong linear correlation between the conservation of the energetic features in the repeat arrays and their sequence variations, and discuss new insights into the organization and function of these ubiquitous proteins. Some natural proteins are formed with repetitions of similar amino acid stretches. Ankyrin-repeat proteins constitute one of the most abundant families of this class of proteins that serve as model systems to analyze how variations in sequences exert effects in structures and biological functions. We present an in-depth analysis of the ankyrin repeat protein family, characterizing the variations in the repeating arrays both at the structural and energetic level. We introduce a consistent annotation for the repeat characteristics and describe how the structural differences are related to the sequences by their underlying energetic signatures.
Affiliation(s)
- R. Gonzalo Parra
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Rocío Espada
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Nina Verstraete
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Diego U. Ferreiro
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
|
17
|
Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022]
Abstract
Tandem repetition in protein sequence and structure is a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to the inherently low (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR.
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
|
18
|
Do Viet P, Roche DB, Kajava AV. TAPO: A combined method for the identification of tandem repeats in protein structures. FEBS Lett 2015; 589:2611-9. [PMID: 26320412 DOI: 10.1016/j.febslet.2015.08.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/29/2015] [Revised: 08/10/2015] [Accepted: 08/13/2015] [Indexed: 10/23/2022]
Abstract
In recent years, there has been an emergence of new 3D structures of proteins containing tandem repeats (TRs), as a result of improved expression and crystallization strategies. Databases focused on structure classifications (PDB, SCOP, CATH) do not provide an easy solution for selection of these structures from PDB. Several approaches have been developed, but no best approach exists to identify the whole range of 3D TRs. Here we describe the TAndem PrOtein detector (TAPO) that uses periodicities of atomic coordinates and other types of structural representation, including strings generated by conformational alphabets, residue contact maps, and arrangements of vectors of secondary structure elements. The benchmarking shows the superior performance of TAPO over the existing programs. In accordance with our analysis of PDB using TAPO, 19% of proteins contain 3D TRs. This analysis allowed us to identify new families of 3D TRs, suggesting that TAPO can be used to regularly update the collection and classification of existing repetitive structures.
Affiliation(s)
- Phuong Do Viet
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Daniel B Roche
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Andrey V Kajava
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
|
19
|
Espada R, Parra RG, Mora T, Walczak AM, Ferreiro DU. Capturing coevolutionary signals in repeat proteins. BMC Bioinformatics 2015; 16:207. [PMID: 26134293 PMCID: PMC4489039 DOI: 10.1186/s12859-015-0648-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/17/2015] [Accepted: 06/16/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts - portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins - natural systems for which the identification of folding domains remains challenging.
RESULTS We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families.
CONCLUSIONS The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.
Affiliation(s)
- Rocío Espada
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina; Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- R Gonzalo Parra
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Thierry Mora
- Laboratoire de physique statistique, CNRS, UPMC and École normale supérieure, 24 rue Lhomond, Paris, 75005, France
- Diego U Ferreiro
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
|
20
|
Kurzbach D, Kontaxis G, Coudevylle N, Konrat R. NMR Spectroscopic Studies of the Conformational Ensembles of Intrinsically Disordered Proteins. Advances in Experimental Medicine and Biology 2015; 870:149-85. [PMID: 26387102 DOI: 10.1007/978-3-319-20164-1_5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/13/2022]
Abstract
Intrinsically disordered proteins (IDPs) are characterized by substantial conformational flexibility and are thus not amenable to conventional structural biology techniques. Given their inherent structural flexibility, NMR spectroscopy offers unique opportunities for structural and dynamic studies of IDPs. The past two decades have witnessed significant development of NMR spectroscopy that couples advances in spin physics and chemistry with a broad range of applications. This chapter summarizes key advances in NMR methodology. Although efficient (multi-dimensional) NMR experiments are available for signal assignment of IDPs, NMR of larger and more complex IDPs demands spectral simplification strategies that capitalize on specific isotope labeling. Prototypical applications of isotope-labeling strategies are described. Since IDP-ligand association and dissociation processes frequently occur on time scales that are amenable to NMR spectroscopy, we describe in detail the application of CPMG relaxation dispersion techniques to studies of IDP protein binding. Finally, we demonstrate that the complementary use of NMR and EPR data provides a more comprehensive picture of the conformational states of IDPs and can be employed to analyze the conformational ensembles of IDPs.
Affiliation(s)
- Dennis Kurzbach
- Department of Computational and Structural Biology, Max F. Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, 1030, Vienna, Austria
- Georg Kontaxis
- Department of Computational and Structural Biology, Max F. Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, 1030, Vienna, Austria
- Nicolas Coudevylle
- Department of Computational and Structural Biology, Max F. Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, 1030, Vienna, Austria
- Robert Konrat
- Department of Computational and Structural Biology, Max F. Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, 1030, Vienna, Austria
|
21
|
Chakrabarty B, Parekh N. Identifying tandem Ankyrin repeats in protein structures. BMC Bioinformatics 2014; 15:6599. [PMID: 25547411 PMCID: PMC4307672 DOI: 10.1186/s12859-014-0440-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/09/2014] [Accepted: 12/18/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, and structure-based approaches are desired. The proposed graph spectral approach is based on protein structure represented as a graph for detecting one of the most frequently observed structural repeats, the Ankyrin repeat.
RESULTS It has been shown in a large number of studies that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive profile for contiguous repeating units, enabling the detection of the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and the remaining non-solenoid proteins, and the predictions are compared with UniProt annotation, a sequence-based approach, RADAR, and a structure-based approach, ConSole. To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred, freely available at 'bioinf.iiit.ac.in/AnkPred'.
CONCLUSIONS AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
22
|
Abstract
Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their own structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with finite interaction-codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of frustration in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and especially how biomolecular structure connects to function by means of localized frustration. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We discuss then how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding mechanisms. We review here how the biological functions of proteins are related to subtle local physical frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. 
In this review, we also emphasize that frustration, far from being always a bad thing, is an essential feature of biomolecules that allows dynamics to be harnessed for function. In this way, we hope to illustrate how frustration is a fundamental concept in molecular biology.
|
23
|
Neira JL. Structural dissection of the C-terminal sterile alpha motif (SAM) of human p73. Arch Biochem Biophys 2014; 558:133-42. [DOI: 10.1016/j.abb.2014.07.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/28/2014] [Revised: 07/01/2014] [Accepted: 07/06/2014] [Indexed: 10/25/2022]
|
24
|
Hrabe T, Godzik A. ConSole: using modularity of contact maps to locate solenoid domains in protein structures. BMC Bioinformatics 2014; 15:119. [PMID: 24766872 PMCID: PMC4021314 DOI: 10.1186/1471-2105-15-119] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/03/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022]
Abstract
Background Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably on the structural level. Yet, few algorithms have been developed for the detection and analysis of structures of periodic proteins.
Results ConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy as compared to Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families.
Conclusions The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as a web-based, interactive server and for download at http://console.sanfordburnham.org.
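To make the link between contact maps and repeat periodicity concrete, the following minimal sketch builds a binary contact map from Cα-like coordinates and counts contacts at each sequence separation. It is not the ConSole algorithm; the functions `contact_map` and `separation_profile`, the 7 Å cutoff, and the idealized coil coordinates are all assumptions chosen for illustration.

```python
import math


def contact_map(coords, cutoff=7.0):
    """Binary contact map: True where two points lie within `cutoff` of each other."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
            for i in range(n)]


def separation_profile(cmap, min_sep=3):
    """Count contacts per sequence separation, skipping near-diagonal neighbours."""
    n = len(cmap)
    profile = {}
    for i in range(n):
        for j in range(i + min_sep, n):
            if cmap[i][j]:
                profile[j - i] = profile.get(j - i, 0) + 1
    return profile


# Idealized coil (assumed geometry): 10 points per turn, radius 10, rise 5 per turn.
period, radius, rise = 10, 10.0, 5.0
coords = [(radius * math.cos(2 * math.pi * i / period),
           radius * math.sin(2 * math.pi * i / period),
           rise * i / period)
          for i in range(40)]

profile = separation_profile(contact_map(coords))
best = max(profile, key=profile.get)  # separation with the most contacts
print(best, profile[best])
```

In this toy coil, each point contacts the point one full turn later, so the contacts concentrate in a band parallel to the diagonal at an offset equal to the repeat length. Real structures are far noisier, which is why dedicated methods exist.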
Affiliation(s)
- Adam Godzik
- Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, 92037 La Jolla, CA, USA
|
25
|
Konrat R. NMR contributions to structural dynamics studies of intrinsically disordered proteins. Journal of Magnetic Resonance 2014; 241:74-85. [PMID: 24656082 PMCID: PMC3985426 DOI: 10.1016/j.jmr.2013.11.011] [Citation(s) in RCA: 130] [Impact Index Per Article: 13.0] [Received: 09/20/2013] [Revised: 11/13/2013] [Accepted: 11/18/2013] [Indexed: 05/04/2023]
Abstract
Intrinsically disordered proteins (IDPs) are characterized by substantial conformational plasticity. Given their inherent structural flexibility, X-ray crystallography is not applicable to the study of these proteins. In contrast, NMR spectroscopy offers unique opportunities for structural and dynamic studies of IDPs. The past two decades have witnessed significant development of NMR spectroscopy that couples advances in spin physics and chemistry with a broad range of applications. This article will summarize key advances in basic physical chemistry and NMR methodology, outline their limitations and envision future R&D directions.
Affiliation(s)
- Robert Konrat
- Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Campus Vienna Biocenter 5, A-1030 Vienna, Austria
|
26
|
Di Domenico T, Potenza E, Walsh I, Parra RG, Giollo M, Minervini G, Piovesan D, Ihsan A, Ferrari C, Kajava AV, Tosatto SCE. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res 2013; 42:D352-7. [PMID: 24311564 PMCID: PMC3964956 DOI: 10.1093/nar/gkt1175] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/24/2023]
Abstract
RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.
Affiliation(s)
- Tomás Di Domenico
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy, Department of Biological Chemistry, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina, Department of Information Engineering, University of Padua, 35121 Padova, Italy, Department of Biosciences, COMSATS Institute of Information Technology, Sahiwal, Pakistan, Centre de Recherches de Biochimie Macromoléculaire, CNRS, 34293 Montpellier Cedex 5, France and Institut de Biologie Computationnelle, 34293 Montpellier Cedex 5, France
|