1. Liu J, Maxwell M, Cuddihy T, Crawford T, Bassetti M, Hyde C, Peigneur S, Tytgat J, Undheim EAB, Mobli M. ScrepYard: An online resource for disulfide-stabilized tandem repeat peptides. Protein Sci 2023; 32:e4566. [PMID: 36644825 PMCID: PMC9885460 DOI: 10.1002/pro.4566] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/08/2022] [Revised: 01/05/2023] [Accepted: 01/12/2023] [Indexed: 01/17/2023]
Abstract
Receptor avidity through multivalency is a highly sought-after property of ligands. While readily available in nature in the form of bivalent antibodies, this property remains challenging to engineer in synthetic molecules. The discovery of several bivalent venom peptides containing two homologous and independently folded domains (in a tandem repeat arrangement) has provided a unique opportunity to better understand the design principles underpinning multivalency in multimeric biomolecules, as well as how naturally occurring multivalent ligands can be identified. In previous work, we classified these molecules as part of a larger class termed secreted cysteine-rich repeat-proteins (SCREPs). Here, we present an online resource, ScrepYard, designed to assist researchers in identifying SCREP sequences of interest and to aid in characterizing this emerging class of biomolecules. Analysis of sequences within ScrepYard reveals that two-domain tandem repeats constitute the most abundant SCREP domain architecture, while the interdomain "linker" regions connecting the functional domains are rich in amino acids with short or polar sidechains and contain an unusually high abundance of proline residues. Finally, we demonstrate the utility of ScrepYard as a virtual screening tool for the discovery of putatively multivalent peptides by using it to identify a previously uncharacterized serine protease inhibitor and confirming its predicted activity with an enzyme assay.
Affiliation(s)
- Junyu Liu
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
- Michael Maxwell
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
- Thom Cuddihy
  - Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia; Centre for Clinical Research, The University of Queensland, St. Lucia, Queensland, Australia
- Theo Crawford
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
- Madeline Bassetti
  - Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia
- Cameron Hyde
  - Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia; University of the Sunshine Coast, Maroochydore, Queensland, Australia
- Steve Peigneur
  - Toxicology and Pharmacology, University of Leuven (KU Leuven), Leuven, Belgium
- Jan Tytgat
  - Toxicology and Pharmacology, University of Leuven (KU Leuven), Leuven, Belgium
- Eivind A. B. Undheim
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia; Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Mehdi Mobli
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
2. Chakrabarty B, Parekh N. DbStRiPs: Database of structural repeats in proteins. Protein Sci 2022; 31:23-36. [PMID: 33641184 PMCID: PMC8740836 DOI: 10.1002/pro.4052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/25/2020] [Revised: 02/11/2021] [Accepted: 02/15/2021] [Indexed: 01/03/2023]
Abstract
Recent interest in repeat proteins has arisen due to the stable structural folds, high evolutionary conservation and repertoire of functions provided by these proteins. However, repeat proteins are poorly characterized because of high sequence variation between repeating units, so structure-based identification and classification of repeats is desirable. Using a robust network-based pipeline, manual curation and Kajava's structure-based classification schema, we have developed a database of tandem structural repeats, Database of Structural Repeats in Proteins (DbStRiPs). A unique feature of this database is that available knowledge on sequence repeat families is incorporated by mapping the Pfam classification scheme onto the structural classification. Integration of sequence and structure-based classifications helps in identifying different functional groups within the same structural subclass, leading to refinement in the annotation of repeat proteins. Analysis of the complete Protein Data Bank revealed 16,472 repeat annotations in 15,141 protein chains, one previously uncharacterized novel protein repeat family (PRF), named left-handed beta helix, and 33 protein repeat clusters (PRCs). Based on their unique structural motif, ~79% of these repeat proteins are classified in one of the 14 PRFs or 33 PRCs, and the remaining are grouped as unclassified repeat proteins. Each repeat protein is provided with a detailed annotation in DbStRiPs that includes start and end boundaries of repeating units, copy number, secondary and tertiary structure view, repeat class/subclass, disease association, MSA of repeating units and cross-references to various protein pattern databases, human protein atlas and interaction resources. DbStRiPs provides easy search and download options for high-quality annotations of structural repeat proteins (URL: http://bioinf.iiit.ac.in/dbstrips/).
Affiliation(s)
- Broto Chakrabarty
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
3. Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
4. Mary Rajathei D, Parthasarathy S, Selvaraj S. HPREP: a comprehensive database for human proteome repeats. J Integr Bioinform 2020; 0:/j/jib.ahead-of-print/jib-2020-0024/jib-2020-0024.xml. [PMID: 33136065 DOI: 10.1515/jib-2020-0024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2020] [Accepted: 09/17/2020] [Indexed: 11/15/2022]
Abstract
Amino acid repeats play important roles in both the structures and functions of proteins. They are commonly found in all kingdoms of life, especially in eukaryotes, and a large fraction of human proteins is composed of repeats. Further, abnormal expansions of shorter repeats cause various diseases in humans. Therefore, analysis of the repeats of the entire human proteome, along with functional, mutational and disease information, would help to better understand their roles in proteins. To fulfill this need, we developed a web database, HPREP (http://bioinfo.bdu.ac.in/hprep), for human proteome repeats using Perl and HTML programming. We identified different categories of well-characterized repeats and domain repeats that are present in the human proteome of UniProtKB/Swiss-Prot by using in-house Perl programming, and novel repeats by using the repeat detection tool T-REKS as well as the XSTREAM web server. Further, these proteins are annotated with functional, mutational and disease information and grouped according to specific repeat types. The database enables users to search by specific repeat type in order to understand their involvement in proteins. Thus, the HPREP database is expected to be a useful resource for gaining better insight into the different repeats in the human proteome and their biological roles.
Affiliation(s)
- David Mary Rajathei
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, India
- Subbiah Parthasarathy
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, India
- Samuel Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, India
5. Banach M, Konieczny L, Roterman I. Why do antifreeze proteins require a solenoid? Biochimie 2017; 144:74-84. [PMID: 29054801 DOI: 10.1016/j.biochi.2017.10.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 08/18/2017] [Accepted: 10/12/2017] [Indexed: 12/21/2022]
Abstract
Proteins whose presence prevents water from freezing in living organisms at temperatures below 0 °C are referred to as antifreeze proteins. This group includes molecules of varying size (from 30 to over 300 aa) and variable secondary/supersecondary conformation. Some of these proteins also contain peculiar structural motifs called solenoids. We have applied the fuzzy oil drop model in the analysis of four categories of antifreeze proteins: 1 - very small proteins, i.e. helical peptides (below 40 aa); 2 - small globular proteins (40-100 aa); 3 - large globular proteins (>100 aa) and 4 - proteins containing solenoids. The FOD model suggests a mechanism by which antifreeze proteins prevent freezing. In accordance with this theory, the presence of the protein itself produces an ordering of water molecules which counteracts the formation of ice crystals. This conclusion is supported by analysis of the ordering of hydrophobic and hydrophilic residues in antifreeze proteins, revealing significant variability - from perfect adherence to the fuzzy oil drop model through structures which lack a clearly defined hydrophobic core, all the way to linear arrangement of alternating local minima and maxima propagating along the principal axis of the solenoid (much like in amyloids). The presented model - alternative with respect to the ice docking model - explains the antifreeze properties of compounds such as saccharides and fatty acids. The fuzzy oil drop model also enables differentiation between amyloids and antifreeze proteins.
Affiliation(s)
- M Banach
  - Department of Bioinformatics and Telemedicine, Jagiellonian University, Medical College, Lazarza 16, 31-530, Krakow, Poland
- L Konieczny
  - Chair of Medical Biochemistry, Jagiellonian University, Medical College, Kopernika 7, 31-034, Krakow, Poland
- I Roterman
  - Department of Bioinformatics and Telemedicine, Jagiellonian University, Medical College, Lazarza 16, 31-530, Krakow, Poland
6.
Abstract
Repeats are ubiquitous elements of proteins and play important roles in cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulty linking the genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over one hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential, such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions, where they might foster fast but subtle adaptation in biological arms races. KEY WORDS: protein evolution, domain rearrangements, protein repeats, concerted evolution.
Affiliation(s)
- Andreas Schüler
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
7. Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022]
Abstract
Tandem repetition in protein sequence and structure is a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR from either protein sequence or structure data is challenging due to the inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.
Affiliation(s)
- Marco Pellegrini
  - Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
8. Parmeggiani F, Huang PS, Vorobiev S, Xiao R, Park K, Caprari S, Su M, Seetharaman J, Mao L, Janjua H, Montelione GT, Hunt J, Baker D. A general computational approach for repeat protein design. J Mol Biol 2014; 427:563-75. [PMID: 25451037 PMCID: PMC4303030 DOI: 10.1016/j.jmb.2014.11.005] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 07/23/2014] [Revised: 10/08/2014] [Accepted: 11/07/2014] [Indexed: 01/12/2023]
Abstract
Repeat proteins have considerable potential for use as modular binding reagents or biomaterials in biomedical and nanotechnology applications. Here we describe a general computational method for building idealized repeats that integrates available family sequences and structural information with Rosetta de novo protein design calculations. Idealized designs from six different repeat families were generated and experimentally characterized; 80% of the proteins were expressed and soluble, and more than 40% were folded and monomeric with high thermal stability. Crystal structures determined for members of three families are within 1 Å root-mean-square deviation of the design models. The method provides a general approach for fast and reliable generation of stable modular repeat protein scaffolds.
Affiliation(s)
- Fabio Parmeggiani
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Po-Ssu Huang
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Sergey Vorobiev
  - Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, NY 10027, USA
- Rong Xiao
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry and Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Keunwan Park
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Silvia Caprari
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Min Su
  - Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, NY 10027, USA
- Jayaraman Seetharaman
  - Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, NY 10027, USA
- Lei Mao
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry and Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Haleema Janjua
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry and Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Gaetano T Montelione
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry and Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Northeast Structural Genomics Consortium, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- John Hunt
  - Department of Biological Sciences, Northeast Structural Genomics Consortium, Columbia University, New York, NY 10027, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
9. Di Domenico T, Potenza E, Walsh I, Parra RG, Giollo M, Minervini G, Piovesan D, Ihsan A, Ferrari C, Kajava AV, Tosatto SCE. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res 2013; 42:D352-7. [PMID: 24311564 PMCID: PMC3964956 DOI: 10.1093/nar/gkt1175] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/24/2023]
Abstract
RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed on a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.
Affiliation(s)
- Tomás Di Domenico
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy, Department of Biological Chemistry, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina, Department of Information Engineering, University of Padua, 35121 Padova, Italy, Department of Biosciences, COMSATS Institute of Information Technology, Sahiwal, Pakistan, Centre de Recherches de Biochimie Macromoléculaire, CNRS, 34293 Montpellier Cedex 5, France and Institut de Biologie Computationnelle, 34293 Montpellier Cedex 5, France
10. Kubrycht J, Sigler K, Souček P, Hudeček J. Structures composing protein domains. Biochimie 2013; 95:1511-24. [DOI: 10.1016/j.biochi.2013.04.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/22/2013] [Accepted: 04/02/2013] [Indexed: 12/21/2022]
11. Bianco AM, Marcuzzi A, Zanin V, Girardelli M, Vuch J, Crovella S. Database tools in genetic diseases research. Genomics 2013; 101:75-85. [DOI: 10.1016/j.ygeno.2012.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/19/2012] [Revised: 10/26/2012] [Accepted: 11/01/2012] [Indexed: 01/22/2023]