1
|
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of its salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on Tandem Repeat Proteins (TRPs) by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Collapse
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
| | - Paula Nazarena Arrías
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Arne Elofsson
- Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
| | - Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Damiano Clementel
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
| | - Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | - Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
| | - Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy.
| |
Collapse
|
2
|
Abstract
Repeat proteins are made with tandem copies of similar amino acid stretches that fold into elongated architectures. These proteins constitute excellent model systems to investigate how evolution relates to structure, folding, and function. Here, we propose a scheme to map evolutionary information at the sequence level to a coarse-grained model for repeat-protein folding and use it to investigate the folding of thousands of repeat proteins. We model the energetics by a combination of an inverse Potts-model scheme with an explicit mechanistic model of duplications and deletions of repeats to calculate the evolutionary parameters of the system at the single-residue level. These parameters are used to inform an Ising-like model that allows for the generation of folding curves, apparent domain emergence, and occupation of intermediate states that are highly compatible with experimental data in specific case studies. We analyzed the folding of thousands of natural Ankyrin repeat proteins and found that a multiplicity of folding mechanisms are possible. Fully cooperative all-or-none transitions are obtained for arrays with enough sequence-similar elements and strong interactions between them, while noncooperative element-by-element intermittent folding arose if the elements are dissimilar and the interactions between them are energetically weak. Additionally, we characterized nucleation-propagation and multidomain folding mechanisms. We show that the global stability and cooperativity of the repeating arrays can be predicted from simple sequence scores.
Collapse
|
3
|
Merski M, Macedo-Ribeiro S, Wieczorek RM, Górna MW. The Repeating, Modular Architecture of the HtrA Proteases. Biomolecules 2022; 12:biom12060793. [PMID: 35740918 PMCID: PMC9221053 DOI: 10.3390/biom12060793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Revised: 06/02/2022] [Accepted: 06/04/2022] [Indexed: 02/04/2023] Open
Abstract
A conserved, 26-residue sequence [AA(X2)[A/G][G/L](X2)GDV[I/L](X2)[V/L]NGE(X1)V(X6)] and corresponding structure repeating module were identified within the HtrA protease family using a non-redundant set (N = 20) of publicly available structures. While the repeats themselves were far from sequence perfect, they had notable conservation to a statistically significant level. Three or more repetitions were identified within each protein despite being statistically expected to randomly occur only once per 1031 residues. This sequence repeat was associated with a six stranded antiparallel β-barrel module, two of which are present in the core of the structures of the PA clan of serine proteases, while a modified version of this module could be identified in the PDZ-like domains. Automated structural alignment methods had difficulties in superimposing these β-barrels, but the use of a target human HtrA2 structure showed that these modules had an average RMSD across the set of structures of less than 2 Å (mean and median). Our findings support Dayhoff’s hypothesis that complex proteins arose through duplication of simpler peptide motifs and domains.
Collapse
Affiliation(s)
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warsaw, Poland
- Correspondence: (M.M.); (M.W.G.); Tel.: +48-225-526-642 (M.M.)
| | - Sandra Macedo-Ribeiro
- Instituto de Investigação e Inovação em Saúde and Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, 4200-135 Porto, Portugal;
| | - Rafal M. Wieczorek
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland;
| | - Maria W. Górna
- Structural Biology Group, Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warsaw, Poland
- Correspondence: (M.M.); (M.W.G.); Tel.: +48-225-526-642 (M.M.)
| |
Collapse
|
4
|
Abstract
Abstract
Ankyrin (ANK) repeat proteins are coded by tandem occurrences of patterns with around 33 amino acids. They often mediate protein–protein interactions in a diversity of biological systems. These proteins have an elongated non-globular shape and often display complex folding mechanisms. This work investigates the energy landscape of representative proteins of this class made up of 3, 4 and 6 ANK repeats using the energy-landscape visualisation method (ELViM). By combining biased and unbiased coarse-grained molecular dynamics AWSEM simulations that sample conformations along the folding trajectories with the ELViM structure-based phase space, one finds a three-dimensional representation of the globally funnelled energy surface. In this representation, it is possible to delineate distinct folding pathways. We show that ELViMs can project, in a natural way, the intricacies of the highly dimensional energy landscapes encoded by the highly symmetric ankyrin repeat proteins into useful low-dimensional representations. These projections can discriminate between multiplicities of specific parallel folding mechanisms that otherwise can be hidden in oversimplified depictions.
Collapse
|
5
|
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. FRONTIERS IN BIOINFORMATICS 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022] Open
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by the competing facts that 1) the repeats are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often because there are more residues that are heavily restricted due to the need of the sequence to repeat and 2) the symmetry of the repeats in allows the distribution of functional contributions over a number of residues so that sometimes no specific site is singularly responsible for function (unlike enzymatic active site catalytic residues). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide and Ankyrin repeat families. We find that mutations are context dependent. Some mutations are indeed highly disruptive to the function of the protein repeats while mutations in identical positions in other repeats in the same protein have little to no effect on structure or function.
Collapse
Affiliation(s)
| | | | | | - Matthew Merski
- *Correspondence: Maria Wiktoria Górna, ; Matthew Merski,
| |
Collapse
|
6
|
Folding and Stability of Ankyrin Repeats Control Biological Protein Function. Biomolecules 2021; 11:biom11060840. [PMID: 34198779 PMCID: PMC8229355 DOI: 10.3390/biom11060840] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 05/25/2021] [Accepted: 06/01/2021] [Indexed: 01/04/2023] Open
Abstract
Ankyrin repeat proteins are found in all three kingdoms of life. Fundamentally, these proteins are involved in protein-protein interaction in order to activate or suppress biological processes. The basic architecture of these proteins comprises repeating modules forming elongated structures. Due to the lack of long-range interactions, a graded stability among the repeats is the generic properties of this protein family determining both protein folding and biological function. Protein folding intermediates were frequently found to be key for the biological functions of repeat proteins. In this review, we discuss most recent findings addressing this close relation for ankyrin repeat proteins including DARPins, Notch receptor ankyrin repeat domain, IκBα inhibitor of NFκB, and CDK inhibitor p19INK4d. The role of local folding and unfolding and gradual stability of individual repeats will be discussed during protein folding, protein-protein interactions, and post-translational modifications. The conformational changes of these repeats function as molecular switches for biological regulation, a versatile property for modern drug discovery.
Collapse
|
7
|
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio, HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRPs detection and classification.
Collapse
Affiliation(s)
| | - Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
| | | | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
| | | | | |
Collapse
|
8
|
Galpern EA, Freiberger MI, Ferreiro DU. Large Ankyrin repeat proteins are formed with similar and energetically favorable units. PLoS One 2020; 15:e0233865. [PMID: 32579546 PMCID: PMC7314423 DOI: 10.1371/journal.pone.0233865] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Accepted: 05/13/2020] [Indexed: 11/19/2022] Open
Abstract
Ankyrin containing proteins are one of the most abundant repeat protein families present in all extant organisms. They are made with tandem copies of similar amino acid stretches that fold into elongated architectures. Here, we built and curated a dataset of 200 thousand proteins that contain 1.2 million Ankyrin regions and characterize the abundance, structure and energetics of the repetitive regions in natural proteins. We found that there is a continuous roughly exponential variety of array lengths with an exceptional frequency at 24 repeats. We described that individual repeats are seldom interrupted with long insertions and accept few deletions, in line with the known tertiary structures. We found that longer arrays are made up of repeats that are more similar to each other than shorter arrays, and display more favourable folding energy, hinting at their evolutionary origin. The array distributions show that there is a physical upper limit to the size of an array of repeats of about 120 copies, consistent with the limit found in nature. The identity patterns within the arrays suggest that they may have originated by sequential copies of more than one Ankyrin unit.
Collapse
Affiliation(s)
- Ezequiel A. Galpern
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICE), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - María I. Freiberger
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICE), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICE), Universidad de Buenos Aires, Buenos Aires, Argentina
- * E-mail:
| |
Collapse
|
9
|
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences lead to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. RESULTS Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a significant amount (82.9%) of clusters containing between 5 and 200 members were of a single functional type. CONCLUSIONS Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.
Collapse
Affiliation(s)
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Krzysztof Młynarczyk
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
| | - Jakub Skrzeczkowski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Stanisław Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
| | - Maria W. Górna
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| |
Collapse
|
10
|
Turjanski P, Ferreiro DU. On the Natural Structure of Amino Acid Patterns in Families of Protein Sequences. J Phys Chem B 2018; 122:11295-11301. [PMID: 30239207 DOI: 10.1021/acs.jpcb.8b07206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
All known terrestrial proteins are coded as continuous strings of ≈20 amino acids. The patterns formed by the repetitions of elements in groups of finite sequences describes the natural architectures of protein families. We present a method to search for patterns and groupings of patterns in protein sequences using a mathematically precise definition for "repetition", an efficient algorithmic implementation and a robust scoring system with no adjustable parameters. We show that the sequence patterns can be well-separated into disjoint classes according to their recurrence in nested structures. The statistics of the occurrences of patterns indicate that short repetitions are sufficient to account for the differences between natural families and randomized groups of sequences by more than 10 standard deviations, while contiguous sequence patterns shorter than 5 residues are effectively random in their occurrences. A small subset of patterns is sufficient to account for a robust "familiarity" definition between arbitrary sets of sequences.
Collapse
Affiliation(s)
- Pablo Turjanski
- KAPOW, Departamento de Computación , Facultad de Ciencias Exactas y Naturales, UBA-CONICET-ICC , Buenos Aires , Argentina
| | - Diego U Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica , Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN , Buenos Aires , Argentina
| |
Collapse
|
11
|
Pugacheva V, Korotkov A, Korotkov E. Search of latent periodicity in amino acid sequences by means of genetic algorithm and dynamic programming. Stat Appl Genet Mol Biol 2017; 15:381-400. [PMID: 27337743 DOI: 10.1515/sagmb-2015-0079] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids in unknown positions of the analyzed sequence. Genetic algorithm, dynamic programming and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. A multiple alignment of periods was calculated with help of the direct optimization of the position-weight matrix without using pairwise alignments. The developed algorithm was applied to analyze amino acid sequences of a small number of proteins. This study showed the presence of latent periodicity with insertions and deletions in the amino acid sequences of such proteins, for which the presence of latent periodicity was not previously known. The origin of latent periodicity with insertions and deletions is discussed.
Collapse
|
12
|
Inferring repeat-protein energetics from evolutionary information. PLoS Comput Biol 2017; 13:e1005584. [PMID: 28617812 PMCID: PMC5491312 DOI: 10.1371/journal.pcbi.1005584] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2017] [Revised: 06/29/2017] [Accepted: 05/21/2017] [Indexed: 11/19/2022] Open
Abstract
Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold to specific structures, and it has been shown possible to infer the main native ensemble by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description for the energetic variation given by sequence modifications in repeat proteins, systems for which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pair-wise interactions and treat higher order correlations with a single term. We show that the resulting evolutionary field can be interpreted with structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting energetic evolutionary field allows the prediction of the folding free energy change for several mutants, and can be used to generate synthetic sequences that are statistically indistinguishable from the natural counterparts.
Collapse
|
13
|
Detailing Protein Landscapes under Pressure. Biophys J 2016; 111:2339-2341. [PMID: 27926834 DOI: 10.1016/j.bpj.2016.10.038] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2016] [Revised: 10/17/2016] [Accepted: 10/27/2016] [Indexed: 11/23/2022] Open
|
14
|
Persi E, Wolf YI, Koonin EV. Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins. Nat Commun 2016; 7:13570. [PMID: 27857066 PMCID: PMC5120217 DOI: 10.1038/ncomms13570] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2016] [Accepted: 10/17/2016] [Indexed: 01/21/2023] Open
Abstract
Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
Collapse
Affiliation(s)
- Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| |
Collapse
|
15
|
Turjanski P, Parra RG, Espada R, Becher V, Ferreiro DU. Protein Repeats from First Principles. Sci Rep 2016; 6:23959. [PMID: 27044676 PMCID: PMC4820709 DOI: 10.1038/srep23959] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2015] [Accepted: 03/16/2016] [Indexed: 01/09/2023] Open
Abstract
Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repeating patterns within a single repeat protein can be extremely variable at the sequence level. We use a mathematical definition of a repetition and investigate the occurrences of these in sequences of different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those which are known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness. We show that this form of repetitiveness is not exclusive of repeat proteins, but also occurs in globular domains. A by-product of this work is a fast quantification of the likelihood of a protein to belong to a family.
Collapse
Affiliation(s)
- Pablo Turjanski
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - R Gonzalo Parra
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, Argentina
| | - Rocío Espada
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, Argentina
| | - Verónica Becher
- Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Diego U Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, Buenos Aires, Argentina
| |
Collapse
|
16
|
Parra RG, Espada R, Verstraete N, Ferreiro DU. Structural and Energetic Characterization of the Ankyrin Repeat Protein Family. PLoS Comput Biol 2015; 11:e1004659. [PMID: 26691182 PMCID: PMC4687027 DOI: 10.1371/journal.pcbi.1004659] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2015] [Accepted: 11/10/2015] [Indexed: 11/21/2022] Open
Abstract
Ankyrin repeat containing proteins are one of the most abundant solenoid folds. Usually implicated in specific protein-protein interactions, these proteins are readily amenable for design, with promising biotechnological and biomedical applications. Studying repeat protein families presents technical challenges due to the high sequence divergence among the repeating units. We developed and applied a systematic method to consistently identify and annotate the structural repetitions over the members of the complete Ankyrin Repeat Protein Family, with increased sensitivity over previous studies. We statistically characterized the number of repeats, the folding of the repeat-arrays, their structural variations, insertions and deletions. An energetic analysis of the local frustration patterns reveal the basic features underlying fold stability and its relation to the functional binding regions. We found a strong linear correlation between the conservation of the energetic features in the repeat arrays and their sequence variations, and discuss new insights into the organization and function of these ubiquitous proteins. Some natural proteins are formed with repetitions of similar amino acid stretches. Ankyrin-repeat proteins constitute one of the most abundant families of this class of proteins that serve as model systems to analyze how variations in sequences exert effects in structures and biological functions. We present an in-depth analysis of the ankyrin repeat protein family, characterizing the variations in the repeating arrays both at the structural and energetic level. We introduce a consistent annotation for the repeat characteristics and describe how the structural differences are related to the sequences by their underlying energetic signatures.
Collapse
Affiliation(s)
- R. Gonzalo Parra
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
| | - Rocío Espada
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
| | - Nina Verstraete
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
| | - Diego U. Ferreiro
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- * E-mail:
| |
Collapse
|