1
|
Liu J, Maxwell M, Cuddihy T, Crawford T, Bassetti M, Hyde C, Peigneur S, Tytgat J, Undheim EAB, Mobli M. ScrepYard: An online resource for disulfide-stabilized tandem repeat peptides. Protein Sci 2023; 32:e4566. [PMID: 36644825 PMCID: PMC9885460 DOI: 10.1002/pro.4566] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Revised: 01/05/2023] [Accepted: 01/12/2023] [Indexed: 01/17/2023]
Abstract
Receptor avidity through multivalency is a highly sought-after property of ligands. While readily available in nature in the form of bivalent antibodies, this property remains challenging to engineer in synthetic molecules. The discovery of several bivalent venom peptides containing two homologous and independently folded domains (in a tandem repeat arrangement) has provided a unique opportunity to better understand the underpinning design of multivalency in multimeric biomolecules, as well as how naturally occurring multivalent ligands can be identified. In previous work, we classified these molecules as a larger class termed secreted cysteine-rich repeat-proteins (SCREPs). Here, we present an online resource; ScrepYard, designed to assist researchers in identification of SCREP sequences of interest and to aid in characterizing this emerging class of biomolecules. Analysis of sequences within the ScrepYard reveals that two-domain tandem repeats constitute the most abundant SCREP domain architecture, while the interdomain "linker" regions connecting the functional domains are found to be abundant in amino acids with short or polar sidechains and contain an unusually high abundance of proline residues. Finally, we demonstrate the utility of ScrepYard as a virtual screening tool for discovery of putatively multivalent peptides, by using it as a resource to identify a previously uncharacterized serine protease inhibitor and confirm its predicted activity using an enzyme assay.
Collapse
Affiliation(s)
- Junyu Liu
- Centre for Advanced ImagingThe University of QueenslandSt. LuciaQueenslandAustralia
| | - Michael Maxwell
- Centre for Advanced ImagingThe University of QueenslandSt. LuciaQueenslandAustralia
| | - Thom Cuddihy
- Queensland Cyber Infrastructure Foundation Ltd.The University of QueenslandSt. LuciaQueenslandAustralia,Centre for Clinical ResearchThe University of QueenslandSt. LuciaQueenslandAustralia
| | - Theo Crawford
- Centre for Advanced ImagingThe University of QueenslandSt. LuciaQueenslandAustralia
| | - Madeline Bassetti
- Queensland Cyber Infrastructure Foundation Ltd.The University of QueenslandSt. LuciaQueenslandAustralia
| | - Cameron Hyde
- Queensland Cyber Infrastructure Foundation Ltd.The University of QueenslandSt. LuciaQueenslandAustralia,University of the Sunshine CoastMaroochydoreQueenslandAustralia
| | - Steve Peigneur
- Toxicology and PharmacologyUniversity of Leuven (KU Leuven)LeuvenBelgium
| | - Jan Tytgat
- Toxicology and PharmacologyUniversity of Leuven (KU Leuven)LeuvenBelgium
| | - Eivind A. B. Undheim
- Centre for Advanced ImagingThe University of QueenslandSt. LuciaQueenslandAustralia,Centre for Ecological and Evolutionary Synthesis, Department of BiosciencesUniversity of OsloOsloNorway
| | - Mehdi Mobli
- Centre for Advanced ImagingThe University of QueenslandSt. LuciaQueenslandAustralia
| |
Collapse
|
2
|
Yu H, Kalutantirige FC, Yao L, Schroeder CM, Chen Q, Moore JS. Self-Assembly of Repetitive Segment and Random Segment Polymer Architectures. ACS Macro Lett 2022; 11:1366-1372. [PMID: 36413761 DOI: 10.1021/acsmacrolett.2c00495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Recent advances in chemical synthesis have created new methodologies for synthesizing sequence-controlled synthetic polymers, but rational design of monomer sequence for desired properties remains challenging. In this work, we synthesize periodic polymers with repetitive segments using a sequence-controlled ring-opening metathesis polymerization (ROMP) method, which draws inspiration from proteins containing repetitive sequence motifs. The repetitive segment architecture is shown to dramatically affect the self-assembly behavior of these materials. Our results show that polymers with identical repetitive sequences assemble into uniform spherical nanoparticles after thermal annealing, whereas copolymers with random placement of segments with different sequences exhibit disordered assemblies without a well-defined morphology. Overall, these results bring a new understanding to the role of periodic repetitive sequences in polymer assembly.
Collapse
Affiliation(s)
- Hao Yu
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Falon C Kalutantirige
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Lehan Yao
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Charles M Schroeder
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Qian Chen
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Jeffrey S Moore
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.,Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| |
Collapse
|
3
|
Kamel M, Kastano K, Mier P, Andrade-Navarro MA. REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences. J Mol Biol 2021; 433:166895. [PMID: 33972020 DOI: 10.1016/j.jmb.2021.166895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Revised: 02/01/2021] [Accepted: 02/21/2021] [Indexed: 12/13/2022]
Abstract
Ensembles of tandem repeats (TRs) in protein sequences expand rapidly to form domains well suited for interactions with proteins. For this reason, they are relatively frequent. Some TRs have known structures and therefore it is advantageous to predict their presence in a protein sequence. However, since most TRs diverge quickly, their detection by classical sequence comparison algorithms is not very accurate. Previously, we developed a method and a web server that used curated profiles and thresholds for the detection of 11 common TRs. Here we present a new web server (REP2) that allows the analysis of TRs in both individual and aligned sequences. We provide currently precomputed analyses for a selection of 78 UniProt reference proteomes. We illustrate how these data can be used to study the evolution of TRs using comparative genomics. REP2 can be accessed at http://cbdm-01.zdv.uni-mainz.de/~munoz/rep/.
Collapse
Affiliation(s)
- Mohamed Kamel
- Department of Computer Science, Faculty of Mathematics and Informatics, University of M'sila, 28000 M'sila, Algeria; Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | - Kristina Kastano
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | | |
Collapse
|
4
|
Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. Pfam: The protein families database in 2021. Nucleic Acids Res 2021; 49:D412-D419. [PMID: 33125078 PMCID: PMC7779014 DOI: 10.1093/nar/gkaa913] [Citation(s) in RCA: 2616] [Impact Index Per Article: 872.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Revised: 10/01/2020] [Accepted: 10/06/2020] [Indexed: 12/19/2022] Open
Abstract
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
Collapse
Affiliation(s)
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lowri Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lorna J Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| |
Collapse
|
5
|
Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. Pfam: The protein families database in 2021. Nucleic Acids Res 2021. [PMID: 33125078 DOI: 10.6019/tol.pfam_fams-t.2018.00001.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/14/2023] Open
Abstract
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
Collapse
Affiliation(s)
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lowri Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lorna J Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| |
Collapse
|
6
|
Mylemans B, Voet AR, Tame JR. The Taming of the Screw: the natural and artificial development of β-propeller proteins. Curr Opin Struct Biol 2020; 68:48-54. [PMID: 33373773 DOI: 10.1016/j.sbi.2020.11.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2020] [Revised: 11/09/2020] [Accepted: 11/27/2020] [Indexed: 12/17/2022]
Abstract
Many proteins are found to possess repeated structural elements, which hint at ancient evolutionary origins and ongoing evolutionary processes. β-propeller proteins are a large family of such proteins, and a popular focus of structural analysis. This review highlights recent work to understand how they arose, and how they have developed into one of the most successful of all protein folds.
Collapse
Affiliation(s)
- Bram Mylemans
- Laboraotry for biomolecular modelling and design, KU Leuven, Celestijnenlaan 200G, 3001 Leuven, Belgium
| | - Arnout Rd Voet
- Protein Design Laboratory, Graduate School of Medical Life Science, Yokohama City University, Suehiro 1-7-29, Tsurumi, Yokohama 230-0045, Japan
| | - Jeremy Rh Tame
- Protein Design Laboratory, Graduate School of Medical Life Science, Yokohama City University, Suehiro 1-7-29, Tsurumi, Yokohama 230-0045, Japan.
| |
Collapse
|
7
|
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio, HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRPs detection and classification.
Collapse
Affiliation(s)
| | - Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
| | | | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
| | | | | |
Collapse
|
8
|
Chavali S, Singh AK, Santhanam B, Babu MM. Amino acid homorepeats in proteins. Nat Rev Chem 2020; 4:420-434. [PMID: 37127972 DOI: 10.1038/s41570-020-0204-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/04/2020] [Indexed: 12/16/2022]
Abstract
Amino acid homorepeats, or homorepeats, are polypeptide segments found in proteins that contain stretches of identical amino acid residues. Although abnormal homorepeat expansions are linked to pathologies such as neurodegenerative diseases, homorepeats are prevalent in eukaryotic proteomes, suggesting that they are important for normal physiology. In this Review, we discuss recent advances in our understanding of the biological functions of homorepeats, which range from facilitating subcellular protein localization to mediating interactions between proteins across diverse cellular pathways. We explore how the functional diversity of homorepeat-containing proteins could be linked to the ability of homorepeats to adopt different structural conformations, an ability influenced by repeat composition, repeat length and the nature of flanking sequences. We conclude by highlighting how an understanding of homorepeats will help us better characterize and develop therapeutics against the human diseases to which they contribute.
Collapse
Affiliation(s)
- Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India.
| | - Anjali K Singh
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India
| | - Balaji Santhanam
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA.
| |
Collapse
|
9
|
|
10
|
Galpern EA, Freiberger MI, Ferreiro DU. Large Ankyrin repeat proteins are formed with similar and energetically favorable units. PLoS One 2020; 15:e0233865. [PMID: 32579546 PMCID: PMC7314423 DOI: 10.1371/journal.pone.0233865] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Accepted: 05/13/2020] [Indexed: 11/19/2022] Open
Abstract
Ankyrin containing proteins are one of the most abundant repeat protein families present in all extant organisms. They are made with tandem copies of similar amino acid stretches that fold into elongated architectures. Here, we built and curated a dataset of 200 thousand proteins that contain 1.2 million Ankyrin regions and characterize the abundance, structure and energetics of the repetitive regions in natural proteins. We found that there is a continuous roughly exponential variety of array lengths with an exceptional frequency at 24 repeats. We described that individual repeats are seldom interrupted with long insertions and accept few deletions, in line with the known tertiary structures. We found that longer arrays are made up of repeats that are more similar to each other than shorter arrays, and display more favourable folding energy, hinting at their evolutionary origin. The array distributions show that there is a physical upper limit to the size of an array of repeats of about 120 copies, consistent with the limit found in nature. The identity patterns within the arrays suggest that they may have originated by sequential copies of more than one Ankyrin unit.
Collapse
Affiliation(s)
- Ezequiel A. Galpern
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICE), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - María I. Freiberger
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICE), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICE), Universidad de Buenos Aires, Buenos Aires, Argentina
- * E-mail:
| |
Collapse
|
11
|
Piovesan D, Tosatto SCE. INGA 2.0: improving protein function prediction for the dark proteome. Nucleic Acids Res 2020; 47:W373-W378. [PMID: 31073595 PMCID: PMC6602455 DOI: 10.1093/nar/gkz375] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2019] [Revised: 04/29/2019] [Accepted: 04/30/2019] [Indexed: 12/21/2022] Open
Abstract
Our current knowledge of complex biological systems is stored in a computable form through the Gene Ontology (GO) which provides a comprehensive description of genes function. Prediction of GO terms from the sequence remains, however, a challenging task, which is particularly critical for novel genomes. Here we present INGA 2.0, a new version of the INGA software for protein function prediction. INGA exploits homology, domain architecture, interaction networks and information from the ‘dark proteome’, like transmembrane and intrinsically disordered regions, to generate a consensus prediction. INGA was ranked in the top ten methods on both CAFA2 and CAFA3 blind tests. The new algorithm can process entire genomes in a few hours or even less when additional input files are provided. The new interface provides a better user experience by integrating filters and widgets to explore the graph structure of the predicted terms. The INGA web server, databases and benchmarking are available from URL: https://inga.bio.unipd.it/.
Collapse
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padua, Italy.,CNR Institute of Neuroscience, Padua, Italy
| |
Collapse
|
12
|
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences lead to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. RESULTS Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a significant amount (82.9%) of clusters containing between 5 and 200 members were of a single functional type. CONCLUSIONS Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.
Collapse
Affiliation(s)
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Krzysztof Młynarczyk
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
| | - Jakub Skrzeczkowski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Stanisław Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
| | - Maria W. Górna
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| |
Collapse
|
13
|
A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel) 2020; 11:genes11040407. [PMID: 32283633 PMCID: PMC7230257 DOI: 10.3390/genes11040407] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Revised: 03/29/2020] [Accepted: 04/01/2020] [Indexed: 12/31/2022] Open
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since that last census of protein TRs in 1999, the number of curated proteins increased more than seven-fold and new TR prediction methods were published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. Positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.
Collapse
|
14
|
Alvarez-Carreño C, Coello G, Arciniega M. FiRES: A computational method for the de novo identification of internal structure similarity in proteins. Proteins 2020; 88:1169-1179. [PMID: 32112578 DOI: 10.1002/prot.25886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Revised: 11/12/2019] [Accepted: 02/24/2020] [Indexed: 11/08/2022]
Abstract
Internal structure similarity in proteins can be observed at the domain and subdomain levels. From an evolutionary perspective, structurally similar elements may arise divergently by gene duplication and fusion events but may also be the product of convergent evolution under physicochemical constraints. The characterization of proteins that contain repeated structural elements has implications for many fields of protein science including protein domain evolution, structure classification, structure prediction, and protein engineering. FiRES (Find Repeated Elements in Structure) is an algorithm that relies on a topology-independent structure alignment method to identify repeating elements in protein structure. FiRES was tested against two hand curated databases of protein repeats: MALIDUP, for very divergent duplicated domains; and RepeatsDB for short tandem repeats. The performance of FiRES was compared to that of lalign, RADAR, HHrepID, CE-symm, ReUPred, and Swelfe. FiRES was the method that most accurately detected proteins either with duplicated domains (accuracy = 0.86) or with multiple repeated units (accuracy = 0.92). FiRES is a new methodology for the discovery of proteins containing structurally similar elements. The FiRES web server is publicly available at http://fires.ifc.unam.mx. The scripts, results, and benchmarks from this study can be downloaded from https://github.com/Claualvarez/fires.
Collapse
Affiliation(s)
- Claudia Alvarez-Carreño
- Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico.,School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Gerardo Coello
- Unidad de Cómputo, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Marcelino Arciniega
- Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
| |
Collapse
|
15
|
Pagès G, Grudinin S. DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures. Bioinformatics 2019; 35:5113-5120. [PMID: 31161198 DOI: 10.1093/bioinformatics/btz454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2018] [Revised: 04/16/2019] [Accepted: 05/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Thanks to the recent advances in structural biology, nowadays 3D structures of various proteins are solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions. RESULTS We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB. AVAILABILITY AND IMPLEMENTATION The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Guillaume Pagès
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| | - Sergei Grudinin
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| |
Collapse
|
16
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Collapse
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton. CB10 1SD, UK
| | - Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Universite Montpellier 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
| | - Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| |
Collapse
|
17
|
Aleksandrova AA, Sarti E, Forrest LR. MemSTATS: A Benchmark Set of Membrane Protein Symmetries and Pseudosymmetries. J Mol Biol 2019; 432:597-604. [PMID: 31628944 DOI: 10.1016/j.jmb.2019.09.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Revised: 08/30/2019] [Accepted: 09/23/2019] [Indexed: 02/06/2023]
Abstract
In membrane proteins, symmetry and pseudosymmetry often have functional or evolutionary implications. However, available symmetry detection methods have not been tested systematically on this class of proteins because of the lack of an appropriate benchmark set. Here we present MemSTATS, a publicly available benchmark set of both quaternary- and internal-symmetries in membrane protein structures. The symmetries are described in terms of order, repeated elements, and orientation of the axis with respect to the membrane plane. Moreover, using MemSTATS, we compare the performance of four widely used symmetry detection algorithms and highlight specific challenges and areas for improvement in the future.
Collapse
Affiliation(s)
- Antoniya A Aleksandrova
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Edoardo Sarti
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Lucy R Forrest
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA.
| |
Collapse
|
18
|
Hirsh L, Paladin L, Piovesan D, Tosatto SCE. RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins. Nucleic Acids Res 2019; 46:W402-W407. [PMID: 29746699 PMCID: PMC6031040 DOI: 10.1093/nar/gky360] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2018] [Accepted: 04/24/2018] [Indexed: 11/15/2022] Open
Abstract
RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and manually refine the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.
Collapse
Affiliation(s)
- Layla Hirsh
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy.,Dept. of Engineering, Pontificia Universidad Católica del Perú, Lima, Perú
| | - Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy
| | | | - Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy.,CNR Institute of Neurosciences, Padua, Italy
| |
Collapse
|
19
|
Bliven SE, Lafita A, Rose PW, Capitani G, Prlić A, Bourne PE. Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm. PLoS Comput Biol 2019; 15:e1006842. [PMID: 31009453 PMCID: PMC6504099 DOI: 10.1371/journal.pcbi.1006842] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2018] [Revised: 05/07/2019] [Accepted: 01/29/2019] [Indexed: 01/04/2023] Open
Abstract
Many proteins fold into highly regular and repetitive three dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understand protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution. Many protein structures show a great deal of regularity. Even within single polypeptide chains, about 25% of proteins contain self-similar repeating structures, which can be organized in ring-like symmetric arrangements or linear open repeats. The repeats are often related, and thus comparing the sequence and structure of repeats can give an idea as to the early evolutionary history of a protein family. Additionally, the conservation and divergence of repeats can lead to insights about the function of the proteins. This work describes CE-Symm 2.0, a tool for the analysis of protein symmetry. The method automatically detects internal symmetry in protein structures and produces a multiple alignment of structural repeats. The algorithm is able to detect the geometric relationships between the repeats, including cyclic, dihedral, and polyhedral symmetries, translational repeats, and cases where multiple symmetry operators are applicable in a hierarchical manner. These complex relationships can then be visualized in a graphical interface as a complete structure, as a superposition of repeats, or as a multiple alignment of the protein sequence. CE-Symm 2.0 can be systematically used for the automatic detection of internal symmetry in protein structures, or as an interactive tool for the analysis of structural repeats.
Collapse
Affiliation(s)
- Spencer E. Bliven
- Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Institute of Applied Simulation, Zurich University of Applied Science, Wädenswil, Switzerland
- * E-mail: (SEB), (AL)
| | - Aleix Lafita
- Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- * E-mail: (SEB), (AL)
| | - Peter W. Rose
- RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Structural Bioinformatics Laboratory, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
| | - Guido Capitani
- Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
- Department of Biology, ETH Zurich, Zurich, Switzerland
| | - Andreas Prlić
- RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
| | - Philip E. Bourne
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
| |
Collapse
|
20
|
Systematic mapping of free energy landscapes of a growing filamin domain during biosynthesis. Proc Natl Acad Sci U S A 2018; 115:9744-9749. [PMID: 30201720 PMCID: PMC6166796 DOI: 10.1073/pnas.1716252115] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
Efficient synthesis and folding of proteins, avoiding misfolded states, are central to cell function. As folding may be initiated in parallel with translation, key experimental challenges are to map changes that occur in folding free energy landscapes as translation proceeds and to understand how these landscapes might be modulated by the ribosome and auxiliary factors. Here, we study the length-dependent folding of a domain from a tandem repeat protein and solve the structure of a stable folding intermediate. Although destabilized by the ribosome at equilibrium, modeling of the nonequilibrium folding pathway nevertheless indicates a significant role for proline isomerization during translation. We develop a simple model to explore the impact of cotranslational folding kinetics on misfolding hazards. Cotranslational folding (CTF) is a fundamental molecular process that ensures efficient protein biosynthesis and minimizes the formation of misfolded states. However, the complexity of this process makes it extremely challenging to obtain structural characterizations of CTF pathways. Here, we correlate observations of translationally arrested nascent chains with those of a systematic C-terminal truncation strategy. We create a detailed description of chain length-dependent free energy landscapes associated with folding of the FLN5 filamin domain, in isolation and on the ribosome, and thus, quantify a substantial destabilization of the native structure on the ribosome. We identify and characterize two folding intermediates formed in isolation, including a partially folded intermediate associated with the isomerization of a conserved cis proline residue. The slow folding associated with this process raises the prospect that neighboring unfolded domains might accumulate and misfold during biosynthesis. We develop a simple model to quantify the risk of misfolding in this situation and show that catalysis of folding by peptidyl-prolyl isomerases is sufficient to eliminate this hazard.
Collapse
|
21
|
Elfin: An algorithm for the computational design of custom three-dimensional structures from modular repeat protein building blocks. J Struct Biol 2018; 201:100-107. [DOI: 10.1016/j.jsb.2017.09.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2017] [Revised: 08/11/2017] [Accepted: 09/02/2017] [Indexed: 11/17/2022]
|
22
|
Roche DB, Viet PD, Bakulina A, Hirsh L, Tosatto SCE, Kajava AV. Classification of β-hairpin repeat proteins. J Struct Biol 2017; 201:130-138. [PMID: 29017817 DOI: 10.1016/j.jsb.2017.10.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2017] [Revised: 10/02/2017] [Accepted: 10/04/2017] [Indexed: 12/11/2022]
Abstract
In recent years, a number of new protein structures that possess tandem repeats have emerged. Many of these proteins are comprised of tandem arrays of β-hairpins. Today, the amount and variety of the data on these β-hairpin repeat (BHR) structures have reached a level that requires detailed analysis and further classification. In this paper, we classified the BHR proteins, compared structures, sequences of repeat motifs, functions and distribution across the major taxonomic kingdoms of life and within organisms. As a result, we identified six different BHR folds in tandem repeat proteins of Class III (elongated structures) and one BHR fold (up-and-down β-barrel) in Class IV ("closed" structures). Our survey reveals the high incidence of the BHR proteins among bacteria and viruses and their possible relationship to the structures of amyloid fibrils. It indicates that BHR folds will be an attractive target for future structural studies, especially in the context of age-related amyloidosis and emerging infectious diseases. This work allowed us to update the RepeatsDB database, which contains annotated tandem repeat protein structures and to construct sequence profiles based on BHR structural alignments.
Collapse
Affiliation(s)
- Daniel B Roche
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, Montpellier 34293, France; Institut de Biologie Computationnelle, Montpellier, France
| | - Phuong Do Viet
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, Montpellier 34293, France; Institut de Biologie Computationnelle, Montpellier, France
| | - Anastasia Bakulina
- Novosibirsk State University, Pirogova str. 1, Novosibirsk 630090, Russia; State Research Center of Virology and Biotechnology VECTOR, Koltsovo, Russia
| | - Layla Hirsh
- Department of Biomedical Sciences, University of Padova, I-35121 Padova, Italy; Engineering Department, Pontifical Catholic University of Peru, Lima 32, Peru
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, I-35121 Padova, Italy
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, Montpellier 34293, France; Institut de Biologie Computationnelle, Montpellier, France.
| |
Collapse
|