51
Woolfson DN. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J Mol Biol 2021; 433:167160. [PMID: 34298061 DOI: 10.1016/j.jmb.2021.167160] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 06/06/2021] [Revised: 07/07/2021] [Accepted: 07/12/2021] [Indexed: 12/26/2022]
Abstract
Protein design has come of age, but how will it mature? In the 1980s and the 1990s, the primary motivation for de novo protein design was to test our understanding of the informational aspect of the protein-folding problem; i.e., how does protein sequence determine protein structure and function? This necessitated minimal and rational design approaches whereby the placement of each residue in a design was reasoned using chemical principles and/or biochemical knowledge. At that time, though with some notable exceptions, the use of computers to aid design was not widespread. Over the past two decades, the tables have turned and computational protein design is firmly established. Here, I illustrate this progress through a timeline of de novo protein structures that have been solved to atomic resolution and deposited in the Protein Data Bank. From this, it is clear that the impact of rational and computational design has been considerable: More-complex and more-sophisticated designs are being targeted with many being resolved to atomic resolution. Furthermore, our ability to generate and manipulate synthetic proteins has advanced to a point where they are providing realistic alternatives to natural protein functions for applications both in vitro and in cells. Also, and increasingly, computational protein design is becoming accessible to non-specialists. This all begs the questions: Is there still a place for minimal and rational design approaches? And, what challenges lie ahead for the burgeoning field of de novo protein design as a whole?
Affiliation(s)
- Derek N Woolfson
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Biomedical Sciences Building, University Walk, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK.
52
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. Frontiers in Bioinformatics 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022] Open
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by the competing facts that 1) the repeats are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often because more residues are heavily restricted by the need of the sequence to repeat, and 2) the symmetry of the repeats allows the distribution of functional contributions over a number of residues, so that sometimes no specific site is singularly responsible for function (unlike enzymatic active site catalytic residues). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide and Ankyrin repeat families. We find that mutations are context dependent. Some mutations are indeed highly disruptive to the function of the protein repeats, while mutations in identical positions in other repeats in the same protein have little to no effect on structure or function.
Affiliation(s)
- Matthew Merski
- *Correspondence: Maria Wiktoria Górna; Matthew Merski
53
Rudenko V, Korotkov E. Search for Highly Divergent Tandem Repeats in Amino Acid Sequences. Int J Mol Sci 2021; 22:ijms22137096. [PMID: 34281150 PMCID: PMC8269118 DOI: 10.3390/ijms22137096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Revised: 06/25/2021] [Accepted: 06/28/2021] [Indexed: 11/29/2022] Open
Abstract
We report a Method to Search for Highly Divergent Tandem Repeats (MSHDTR) in protein sequences which considers pairwise correlations between adjacent residues. MSHDTR was compared with some previously developed methods for searching for tandem repeats (TRs) in amino acid sequences, such as T-REKS and XSTREAM, which focus on the identification of TRs with significant sequence similarity, whereas MSHDTR detects repeats that significantly diverged during evolution, accumulating deletions, insertions, and substitutions. The application of MSHDTR to a search of the Swiss-Prot databank revealed over 15 thousand TR-containing amino acid sequences that were difficult to find using the other methods. Among the detected TRs, the most representative were those with consensus lengths of two and seven residues; these TRs were subjected to cluster analysis and the classes of patterns were identified. All TRs detected in this study have been combined into a databank accessible over the WWW.
Affiliation(s)
- Valentina Rudenko
- Center of Bioengineering Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Correspondence: Tel.: +7-926-7248271
- Eugene Korotkov
- Center of Bioengineering Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Moscow Engineering Physics Institute, National Research Nuclear University MEPhI, 115409 Moscow, Russia
54
Delucchi M, Näf P, Bliven S, Anisimova M. TRAL 2.0: Tandem Repeat Detection With Circular Profile Hidden Markov Models and Evolutionary Aligner. Frontiers in Bioinformatics 2021; 1:691865. [PMID: 36303789 PMCID: PMC9581039 DOI: 10.3389/fbinf.2021.691865] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/07/2021] [Accepted: 06/11/2021] [Indexed: 11/13/2022] Open
Abstract
The Tandem Repeat Annotation Library (TRAL) focuses on analyzing tandem repeat units in genomic sequences. TRAL can integrate and harmonize tandem repeat annotations from a large number of external tools, and provides a statistical model for evaluating and filtering the detected repeats. TRAL version 2.0 includes new features such as a module for identifying repeats from circular profile hidden Markov models, a new repeat alignment method based on the progressive Poisson Indel Process, an improved installation procedure and a Docker container. TRAL is an open-source Python 3 library and is available, together with documentation and tutorials, via vital-it.ch/software/tral.
Affiliation(s)
- Matteo Delucchi
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Paulina Näf
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Spencer Bliven
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Laboratory for Scientific Computing and Modelling, Paul Scherrer Institute, Villigen PSI, Villigen, Switzerland
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- *Correspondence: Maria Anisimova
55
Barik S. An Analytical Review of the Structural Features of Pentatricopeptide Repeats: Strategic Amino Acids, Repeat Arrangements and Superhelical Architecture. Int J Mol Sci 2021; 22:ijms22105407. [PMID: 34065603 PMCID: PMC8160929 DOI: 10.3390/ijms22105407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2021] [Revised: 05/18/2021] [Accepted: 05/19/2021] [Indexed: 12/27/2022] Open
Abstract
Tricopeptide repeats are common in natural proteins, and are exemplified by 34- and 35-residue repeats, known respectively as tetratricopeptide repeats (TPRs) and pentatricopeptide repeats (PPRs). In both classes, each repeat unit forms an antiparallel bihelical structure, so that multiple such units in a polypeptide are arranged in a parallel fashion. The primary structures of the motifs are nonidentical, but amino acids of similar properties occur in strategic positions. The focus of the present work was on PPR, but TPR, its better-studied cousin, is often included for comparison. The analyses revealed that critical amino acids, namely Gly, Pro, Ala and Trp, were placed at distinct locations in the higher order structure of PPR domains. While most TPRs occur in repeats of three, the PPRs exhibited a much greater diversity in repeat numbers, from 1 to 30 or more, separated by spacers of various sequences and lengths. Studies of PPR strings in proteins showed that the majority of PPR units are single, and that the longer tandems (i.e., without space in between) occurred in decreasing order. The multi-PPR domains also formed superhelical vortices, likely governed by interhelical angles rather than the spacers. These findings should be useful in designing and understanding the PPR domains.
Affiliation(s)
- Sailen Barik
- EonBio, 3780 Pelham Drive, Mobile, AL 36619, USA
56
Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families. PLoS Comput Biol 2021; 17:e1008798. [PMID: 33857128 PMCID: PMC8078820 DOI: 10.1371/journal.pcbi.1008798] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/03/2020] [Revised: 04/27/2021] [Accepted: 02/15/2021] [Indexed: 12/18/2022] Open
Abstract
Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryotic-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein’s structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy. Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repetitions that give rise to structures with repeated folds/domains. Although the repeated units can often be recognised from the sequence alone, structural information is often missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all known repeat structures. We evaluate the contact predictions and the obtained models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific for repeat proteins. Finally, we used the prediction pipeline for all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy.
57
A Leucine-Rich Repeat Protein Provides a SHOC2 the RAS Circuit: a Structure-Function Perspective. Mol Cell Biol 2021; 41:MCB.00627-20. [PMID: 33526449 DOI: 10.1128/mcb.00627-20] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/15/2022] Open
Abstract
SHOC2 is a prototypical leucine-rich repeat protein that promotes downstream receptor tyrosine kinase (RTK)/RAS signaling and plays important roles in several cellular and developmental processes. Gain-of-function germ line mutations of SHOC2 drive the RASopathy Noonan-like syndrome, and SHOC2 mediates adaptive resistance to mitogen-activated protein kinase (MAPK) inhibitors. Similar to many scaffolding proteins, SHOC2 facilitates signal transduction by enabling proximal protein interactions and regulating the subcellular localization of its binding partners. Here, we review the structural features of SHOC2 that mediate its known functions, discuss these elements in the context of various binding partners and signaling pathways, and highlight areas of SHOC2 biology where a consensus view has not yet emerged.
58
Gidley F, Parmeggiani F. Repeat proteins: designing new shapes and functions for solenoid folds. Curr Opin Struct Biol 2021; 68:208-214. [PMID: 33721772 DOI: 10.1016/j.sbi.2021.02.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2020] [Revised: 01/31/2021] [Accepted: 02/01/2021] [Indexed: 10/21/2022]
Abstract
The modular nature of repeat proteins has inspired the design of regular and completely novel sequences and structures. Research in the past years has provided a broad set of design approaches and new repeat proteins that have found applications in molecular recognition, taking advantage of the natural ability of some of these families to bind proteins, peptides and nucleic acids. Here, we provide an overview on the recent trends in design of repeat proteins, particularly solenoid folds, and their applications. By exploiting the intrinsic modularity of repeats, new architectures have been designed that combine different types of repeat, are easily scalable by changing the number of repeats and can be quickly generated by using existing modular building blocks.
Affiliation(s)
- Frances Gidley
- School of Chemistry, School of Biochemistry, Bristol Biodesign Institute, University of Bristol, United Kingdom
- Fabio Parmeggiani
- School of Chemistry, School of Biochemistry, Bristol Biodesign Institute, University of Bristol, United Kingdom
59
Wen Y, He MQ, Yu YL, Wang JH. Biomolecule-mediated chiral nanostructures: a review of chiral mechanism and application. Adv Colloid Interface Sci 2021; 289:102376. [PMID: 33561566 DOI: 10.1016/j.cis.2021.102376] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/22/2020] [Revised: 01/18/2021] [Accepted: 01/27/2021] [Indexed: 12/30/2022]
Abstract
The chirality of biomolecules is of vital importance in biosensing and biomedicine. However, most biomolecules only have a chiral response in the ultraviolet region, and the corresponding chiral signal is weak. In recent years, inorganic nanomaterials have been used to shift chiral light signals to the visible and near-infrared regions and to enhance optical signals, owing to their high polarizability and adjustable morphology-dependent optical properties. Nonetheless, inorganic nanomaterials usually lack the specificity to identify targets, and have strong toxicity when applied in organisms. The combination of chiral biomolecules and inorganic nanomaterials offers a way to solve these problems, because chiral biomolecules, such as DNA, amino acids, and peptides, offer programmability, specific recognition, excellent biocompatibility, and strong binding to inorganic nanomaterials. Biomolecule-mediated chiral nanostructures show specific recognition of targets, extremely low biological toxicity and adjustable optical activity by regulating, assembling and inducing inorganic nanomaterials. Therefore, biomolecule-mediated chiral nanostructures have received widespread attention in applications including chiral biosensing, enantiomer recognition and separation, biological diagnosis and treatment, chiral catalysis, and circular polarization of chiral metamaterials. This review introduces the three chiral mechanisms of biomolecule-mediated chiral nanostructures, lists some important current applications, and discusses the development prospects of biomolecule-mediated chiral nanostructures.
60
Bürgi J, Ekal L, Wilmanns M. Versatile allosteric properties in Pex5-like tetratricopeptide repeat proteins to induce diverse downstream function. Traffic 2021; 22:140-152. [PMID: 33580581 DOI: 10.1111/tra.12785] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/01/2020] [Revised: 01/30/2021] [Accepted: 02/10/2021] [Indexed: 01/11/2023]
Abstract
Proteins composed of tetratricopeptide repeat (TPR) arrays belong to the α-solenoid tandem-repeat family that have unique properties in terms of their overall conformational flexibility and ability to bind to multiple protein ligands. The peroxisomal matrix protein import receptor Pex5 comprises two TPR triplets that recognize protein cargos with a specific C-terminal Peroxisomal Targeting Signal (PTS) 1 motif. Import of PTS1-containing protein cargos into peroxisomes through a transient pore is mainly driven by allosteric binding, coupling and release mechanisms, without a need for external energy. A very similar TPR architecture is found in the functionally unrelated TRIP8b, a regulator of the hyperpolarization-activated cyclic nucleotide-gated (HCN) ion channel. TRIP8b binds to the HCN ion channel via a C-terminal sequence motif that is nearly identical to the PTS1 motif of Pex5 receptor cargos. Pex5, Pex5-related Pex9, and TRIP8b also share a less conserved N-terminal domain. This domain provides a second protein cargo-binding site and plays a distinct role in allosteric coupling of initial cargo loading by PTS1 motif-mediated interactions and different downstream functional readouts. The data reviewed here highlight the overarching role of molecular allostery in driving the diverse functions of TPR array proteins, which could form a model for other α-solenoid tandem-repeat proteins involved in translocation processes across membranes.
Affiliation(s)
- Jérôme Bürgi
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg, Germany
- Lakhan Ekal
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg, Germany
- Matthias Wilmanns
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg, Germany; University Hamburg Clinical Center Hamburg-Eppendorf, Hamburg, Germany
61
Kamel M, Kastano K, Mier P, Andrade-Navarro MA. REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences. J Mol Biol 2021; 433:166895. [PMID: 33972020 DOI: 10.1016/j.jmb.2021.166895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/29/2020] [Revised: 02/01/2021] [Accepted: 02/21/2021] [Indexed: 12/13/2022]
Abstract
Ensembles of tandem repeats (TRs) in protein sequences expand rapidly to form domains well suited for interactions with proteins. For this reason, they are relatively frequent. Some TRs have known structures and therefore it is advantageous to predict their presence in a protein sequence. However, since most TRs diverge quickly, their detection by classical sequence comparison algorithms is not very accurate. Previously, we developed a method and a web server that used curated profiles and thresholds for the detection of 11 common TRs. Here we present a new web server (REP2) that allows the analysis of TRs in both individual and aligned sequences. We currently provide precomputed analyses for a selection of 78 UniProt reference proteomes. We illustrate how these data can be used to study the evolution of TRs using comparative genomics. REP2 can be accessed at http://cbdm-01.zdv.uni-mainz.de/~munoz/rep/.
Affiliation(s)
- Mohamed Kamel
- Department of Computer Science, Faculty of Mathematics and Informatics, University of M'sila, 28000 M'sila, Algeria; Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Kristina Kastano
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
62
Abstract
Cooperativity is a hallmark of protein folding, but the thermodynamic origins of cooperativity are difficult to quantify. Tandem repeat proteins provide a unique experimental system to quantify cooperativity due to their internal symmetry and their tolerance of deletion, extension, and in some cases fragmentation into single repeats. Analysis of repeat proteins of different lengths with nearest-neighbor Ising models provides values for repeat folding (ΔGi) and inter-repeat coupling (ΔGi-1,i). In this article, we review the architecture of repeat proteins and classify them in terms of ΔGi and ΔGi-1,i; this classification scheme groups repeat proteins according to their degree of cooperativity. We then present various statistical thermodynamic models, based on the 1D-Ising model, for analysis of different classes of repeat proteins. We use these models to analyze data for highly and moderately cooperative and noncooperative repeat proteins and relate their fitted parameters to overall structural features.
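As an illustration of the nearest-neighbor model named in this abstract, the following is a minimal sketch, assuming a homopolymer of n identical repeats in which each folded repeat contributes a weight κ = exp(-ΔGi/RT) and each interface between two adjacent folded repeats contributes τ = exp(-ΔGi-1,i/RT). The transfer-matrix form, the function names, and the numerical free energies are assumptions chosen for illustration; they are not taken from the article.

```python
import math
from itertools import product

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def boltzmann_weights(dg_intrinsic, dg_interface, temp_k=298.15):
    """Convert free energies (kJ/mol) into statistical weights kappa and tau."""
    rt = R_KJ * temp_k
    return math.exp(-dg_intrinsic / rt), math.exp(-dg_interface / rt)


def partition_function(n, kappa, tau):
    """Partition function of n repeats via the 2x2 transfer matrix
    W = [[kappa*tau, 1], [kappa, 1]], evaluated as [0, 1] * W^n * [1, 1]^T."""
    folded_end, unfolded_end = 0.0, 1.0  # virtual repeat 0 is unfolded
    for _ in range(n):
        folded_end, unfolded_end = (
            folded_end * kappa * tau + unfolded_end * kappa,
            folded_end + unfolded_end,
        )
    return folded_end + unfolded_end


def fraction_folded(n, kappa, tau):
    """Average fraction of folded repeats, by brute-force enumeration
    of all 2^n folded/unfolded configurations (fine for small n)."""
    total = 0.0
    folded = 0.0
    for state in product((0, 1), repeat=n):
        n_folded = sum(state)
        n_interfaces = sum(a and b for a, b in zip(state, state[1:]))
        weight = kappa ** n_folded * tau ** n_interfaces
        total += weight
        folded += weight * n_folded / n
    return folded / total


if __name__ == "__main__":
    # Hypothetical values: unfavorable intrinsic folding, favorable coupling.
    kappa, tau = boltzmann_weights(dg_intrinsic=15.0, dg_interface=-25.0)
    for n in (2, 4, 6, 8):
        print(n, round(fraction_folded(n, kappa, tau), 3))
```

With an unfavorable intrinsic term and a favorable coupling term, as in the hypothetical values above, the folded fraction rises with repeat number, which is the qualitative behavior that comparing constructs of different lengths is meant to resolve.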
Affiliation(s)
- Mark Petersen
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA; T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Doug Barrick
- T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
63
Kim CS, Brown AM, Grove TZ, Etzkorn FA. Designed leucine-rich repeat proteins bind two muramyl dipeptide ligands. Protein Sci 2021; 30:804-817. [PMID: 33512005 DOI: 10.1002/pro.4031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/10/2020] [Revised: 01/22/2021] [Accepted: 01/22/2021] [Indexed: 12/15/2022]
Abstract
Designed protein receptors hold diagnostic and therapeutic promise. We now report the design of five consensus leucine-rich repeat proteins (CLRR4-8) based on the LRR domain of nucleotide-binding oligomerization domain (NOD)-like receptors involved in the innate immune system. The CLRRs bind muramyl dipeptide (MDP), a bacterial cell wall component, with micromolar affinity. The overall Kd app values ranged from 1.0 to 57 μM as measured by fluorescence quenching experiments. Biphasic fluorescence quenching curves were observed in all CLRRs, with higher affinity Kd1 values ranging from 0.04 to 4.5 μM, and lower affinity Kd2 values ranging from 3.1 to 227 μM. These biphasic binding curves, along with the docking studies of MDP binding to CLRR4, suggest that at least two MDPs bind to each protein. Previously, only single MDP binding was reported. This high-capacity binding of MDP promises small, soluble, stable CLRR scaffolds as candidates for the future design of pathogen biosensors.
Affiliation(s)
- Christina S Kim
- Department of Chemistry, Virginia Tech, Blacksburg, Virginia, USA
- Anne M Brown
- University Libraries, Virginia Tech, Blacksburg, Virginia, USA
- Tijana Z Grove
- Department of Chemistry, Virginia Tech, Blacksburg, Virginia, USA
64
Paladin L, Bevilacqua M, Errigo S, Piovesan D, Mičetić I, Necci M, Monzon AM, Fabre ML, Lopez JL, Nilsson JF, Rios J, Menna PL, Cabrera M, Buitron MG, Kulik MG, Fernandez-Alberti S, Fornasari MS, Parisi G, Lagares A, Hirsh L, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Res 2021; 49:D452-D457. [PMID: 33237313 PMCID: PMC7778985 DOI: 10.1093/nar/gkaa1097] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/15/2020] [Revised: 10/17/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022] Open
Abstract
The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
Affiliation(s)
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Sara Errigo
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Ivan Mičetić
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Marco Necci
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Maria Laura Fabre
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Jose Luis Lopez
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Juliet F Nilsson
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Javier Rios
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Pablo Lorenzano Menna
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Maia Cabrera
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Martin Gonzalez Buitron
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Mariane Gonçalves Kulik
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Sebastian Fernandez-Alberti
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Maria Silvina Fornasari
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
- Antonio Lagares
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
- Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Univ. Montpellier, Montpellier, France
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
65
Wang F, Gnewou O, Modlin C, Beltran LC, Xu C, Su Z, Juneja P, Grigoryan G, Egelman EH, Conticello VP. Structural analysis of cross α-helical nanotubes provides insight into the designability of filamentous peptide nanomaterials. Nat Commun 2021; 12:407. [PMID: 33462223 PMCID: PMC7814010 DOI: 10.1038/s41467-020-20689-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/08/2020] [Accepted: 12/02/2020] [Indexed: 12/12/2022] Open
Abstract
The exquisite structure-function correlations observed in filamentous protein assemblies provide a paradigm for the design of synthetic peptide-based nanomaterials. However, the plasticity of quaternary structure in sequence-space and the lability of helical symmetry present significant challenges to the de novo design and structural analysis of such filaments. Here, we describe a rational approach to design self-assembling peptide nanotubes based on controlling lateral interactions between protofilaments having an unusual cross-α supramolecular architecture. Near-atomic resolution cryo-EM structural analysis of seven designed nanotubes provides insight into the designability of interfaces within these synthetic peptide assemblies and identifies a non-native structural interaction based on a pair of arginine residues. This arginine clasp motif can robustly mediate cohesive interactions between protofilaments within the cross-α nanotubes. The structure of the resultant assemblies can be controlled through the sequence and length of the peptide subunits, which generates synthetic peptide filaments of similar dimensions to flagella and pili.
Affiliation(s)
- Fengbin Wang
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Ordy Gnewou
- Department of Chemistry, Emory University, Atlanta, GA, 30322, USA
| | - Charles Modlin
- Department of Chemistry, Emory University, Atlanta, GA, 30322, USA
| | - Leticia C Beltran
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Chunfu Xu
- Department of Chemistry, Emory University, Atlanta, GA, 30322, USA
| | - Zhangli Su
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Puneet Juneja
- The Robert P. Apkarian Integrated Electron Microscopy Core (IEMC), Emory University, Atlanta, GA, 30322, USA
| | - Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH, 03755, USA.,Department of Biological Sciences, Dartmouth College, Hanover, NH, 03755, USA
| | - Edward H Egelman
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
| | - Vincent P Conticello
- Department of Chemistry, Emory University, Atlanta, GA, 30322, USA.,The Robert P. Apkarian Integrated Electron Microscopy Core (IEMC), Emory University, Atlanta, GA, 30322, USA.
| |
|
66
|
Mier P, Andrade-Navarro MA. Assessing the low complexity of protein sequences via the low complexity triangle. PLoS One 2020; 15:e0239154. [PMID: 33378336 PMCID: PMC7773278 DOI: 10.1371/journal.pone.0239154] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/24/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open
Abstract
Background: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called ‘low complexity triangle’ as a proof-of-concept to represent the low complexity measured values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions: The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
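One of the two measures named in this abstract, the fraction of the most frequent amino acid, is simple enough to sketch. The snippet below is a minimal illustration only, not the authors' LCT pipeline: the function names and the window size of 20 are assumptions made here, and the RES repeatability measure is not reproduced.

```python
from collections import Counter


def max_residue_fraction(seq: str) -> float:
    """Fraction of the sequence occupied by its single most frequent residue."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq.upper())
    # most_common(1) returns [(residue, count)] for the top residue
    return counts.most_common(1)[0][1] / len(seq)


def windowed_fractions(seq: str, window: int = 20) -> list[float]:
    """Most-frequent-residue fraction for every full-length sliding window.

    The window size is a hypothetical choice for illustration; sequences
    shorter than the window yield an empty list.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        max_residue_fraction(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]
```

A value near 1.0 in a window points to a homorepeat-like stretch, while values closer to the background composition are what a compositionally unbiased region would give.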
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
| | - Miguel A. Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
| |
|
67
|
Evolution of Protein Structure and Stability in Global Warming. Int J Mol Sci 2020; 21:ijms21249662. [PMID: 33352933 PMCID: PMC7767258 DOI: 10.3390/ijms21249662] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 12/12/2022] Open
Abstract
This review focuses on the molecular signatures of protein structures in relation to evolution and survival in global warming. It is based on the premise that the power of evolutionary selection may lead to thermotolerant organisms that will repopulate the planet and continue life in general, but perhaps with different kinds of flora and fauna. Our focus is on molecular mechanisms, whereby known examples of thermoresistance and their physicochemical characteristics were noted. A comparison of interactions of diverse residues in proteins from thermophilic and mesophilic organisms, as well as reverse genetic studies, revealed a set of imprecise molecular signatures that pointed to major roles of hydrophobicity, solvent accessibility, disulfide bonds, hydrogen bonds, ionic and π-electron interactions, and an overall condensed packing of the higher-order structure, especially in the hydrophobic regions. Regardless of mutations, specialized protein chaperones may play a cardinal role. In evolutionary terms, thermoresistance to global warming will likely occur in stepwise mutational changes, conforming to the molecular signatures, such that each "intermediate" fits a temporary niche through punctuated equilibrium, while maintaining protein functionality. Finally, the population response of different species to global warming may vary substantially, and, as such, some may evolve while others will undergo catastrophic mass extinction.
|
68
|
Sajko S, Grishkovskaya I, Kostan J, Graewert M, Setiawan K, Trübestein L, Niedermüller K, Gehin C, Sponga A, Puchinger M, Gavin AC, Leonard TA, Svergun DI, Smith TK, Morriswood B, Djinovic-Carugo K. Structures of three MORN repeat proteins and a re-evaluation of the proposed lipid-binding properties of MORN repeats. PLoS One 2020; 15:e0242677. [PMID: 33296386 PMCID: PMC7725318 DOI: 10.1371/journal.pone.0242677] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/30/2020] [Accepted: 11/08/2020] [Indexed: 11/19/2022] Open
Abstract
MORN (Membrane Occupation and Recognition Nexus) repeat proteins have a wide taxonomic distribution, being found in both prokaryotes and eukaryotes. Despite this ubiquity, they remain poorly characterised at both a structural and a functional level compared to other common repeats. In functional terms, they are often assumed to be lipid-binding modules that mediate membrane targeting. We addressed this putative activity by focusing on a protein composed solely of MORN repeats: Trypanosoma brucei MORN1. Surprisingly, no evidence for binding to membranes or lipid vesicles by TbMORN1 could be obtained either in vivo or in vitro. Conversely, TbMORN1 did interact with individual phospholipids. High- and low-resolution structures of the MORN1 protein from Trypanosoma brucei and homologous proteins from the parasites Toxoplasma gondii and Plasmodium falciparum were obtained using a combination of macromolecular crystallography, small-angle X-ray scattering, and electron microscopy. This enabled a first structure-based definition of the MORN repeat itself. Furthermore, all three structures dimerised via their C-termini in an antiparallel configuration. The dimers could form extended or V-shaped quaternary structures depending on the presence of specific interface residues. This work provides a new perspective on MORN repeats, showing that they are protein-protein interaction modules capable of mediating both dimerisation and oligomerisation.
Affiliation(s)
- Sara Sajko
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Irina Grishkovskaya
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Julius Kostan
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Melissa Graewert
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg, Germany
| | - Kim Setiawan
- Department of Cell and Developmental Biology, Biocenter, University of Würzburg, Würzburg, Germany
| | - Linda Trübestein
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Korbinian Niedermüller
- Department of Cell and Developmental Biology, Biocenter, University of Würzburg, Würzburg, Germany
| | - Charlotte Gehin
- European Molecular Biology Laboratory, Heidelberg Unit, Heidelberg, Germany
- Institute of Bioengineering, Laboratory of Lipid Cell Biology, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Antonio Sponga
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Martin Puchinger
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | - Anne-Claude Gavin
- European Molecular Biology Laboratory, Heidelberg Unit, Heidelberg, Germany
- Department for Cell Physiology and Metabolism, University of Geneva, Centre Medical Universitaire, Geneva, Switzerland
| | - Thomas A. Leonard
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
| | | | - Terry K. Smith
- School of Biology, BSRC, University of St. Andrews, St. Andrews, United Kingdom
| | - Brooke Morriswood
- Department of Cell and Developmental Biology, Biocenter, University of Würzburg, Würzburg, Germany
| | - Kristina Djinovic-Carugo
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, Austria
- Department of Biochemistry, Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
| |
|
69
|
Perovic V, Leclercq JY, Sumonja N, Richard FD, Veljkovic N, Kajava AV. Tally-2.0: upgraded validator of tandem repeat detection in protein sequences. Bioinformatics 2020; 36:3260-3262. [PMID: 32096820 DOI: 10.1093/bioinformatics/btaa121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/28/2019] [Revised: 02/02/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Proteins containing tandem repeats (TRs) are abundant, frequently fold in elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. A blurred boundary between imperfect TR motifs and non-repetitive sequences gave rise to the need to validate the detected TRs. RESULTS: Tally-2.0 is a scoring tool based on a machine learning (ML) approach, which allows validation of the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 performs at a level of 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%. AVAILABILITY AND IMPLEMENTATION: Tally-2.0 is available, as a web tool and as a standalone application published under Apache License 2.0, on the URL https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vladimir Perovic
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia
| | - Jeremy Y Leclercq
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France
| | - Neven Sumonja
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia
| | - Francois D Richard
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France.,Laboratory for Translational Breast Cancer Research, Department of Oncology, KU Leuven, Leuven 3000, Belgium
| | - Nevena Veljkovic
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France
| |
|
70
|
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio, HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRPs detection and classification.
Affiliation(s)
| | - Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
| | | | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
| | | | | |
|
71
|
|
72
|
Chavali S, Singh AK, Santhanam B, Babu MM. Amino acid homorepeats in proteins. Nat Rev Chem 2020; 4:420-434. [PMID: 37127972 DOI: 10.1038/s41570-020-0204-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Accepted: 06/04/2020] [Indexed: 12/16/2022]
Abstract
Amino acid homorepeats, or homorepeats, are polypeptide segments found in proteins that contain stretches of identical amino acid residues. Although abnormal homorepeat expansions are linked to pathologies such as neurodegenerative diseases, homorepeats are prevalent in eukaryotic proteomes, suggesting that they are important for normal physiology. In this Review, we discuss recent advances in our understanding of the biological functions of homorepeats, which range from facilitating subcellular protein localization to mediating interactions between proteins across diverse cellular pathways. We explore how the functional diversity of homorepeat-containing proteins could be linked to the ability of homorepeats to adopt different structural conformations, an ability influenced by repeat composition, repeat length and the nature of flanking sequences. We conclude by highlighting how an understanding of homorepeats will help us better characterize and develop therapeutics against the human diseases to which they contribute.
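The definition given here, stretches of identical amino acid residues, maps directly onto a regular-expression backreference. The sketch below is illustrative only: the function name and the minimum run length of 5 are assumptions made for this example, since the abstract does not state a length threshold.

```python
import re


def find_homorepeats(seq: str, min_len: int = 5) -> list[tuple[str, int, int]]:
    """Return (residue, start, length) for each run of identical residues.

    min_len is a hypothetical cutoff chosen for illustration. Start
    positions are 0-based.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One letter followed by at least (min_len - 1) copies of the same letter
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


print(find_homorepeats("MAQQQQQQLKAAAG"))
```

With the default cutoff, the six-residue glutamine run in the toy sequence is reported and the three-residue alanine run is not.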
Affiliation(s)
- Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India.
| | - Anjali K Singh
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India
| | - Balaji Santhanam
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA.
| |
|
73
|
|
74
|
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND: Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences leads to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. RESULTS: Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a significant amount (82.9%) of clusters containing between 5 and 200 members were of a single functional type. CONCLUSIONS: Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.
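The idea of comparing self-similarity dot plots with a Jaccard metric can be sketched in a few lines. This is a toy version under stated assumptions, not the authors' method: a dot is placed wherever two positions hold the identical residue (no substitution matrix, window or threshold), only cells above the diagonal are kept, and the two dot plots are compared cell by cell without any alignment. All function names are hypothetical.

```python
def self_dot_plot(seq: str) -> set[tuple[int, int]]:
    """Cells (i, j) with i < j where residues i and j of seq are identical."""
    return {
        (i, j)
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] == seq[j]
    }


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Same repeat pattern, no shared residues: the dot plots still coincide.
print(jaccard(self_dot_plot("ABAB"), self_dot_plot("CDCD")))
# One substitution removes one of the two off-diagonal dots.
print(jaccard(self_dot_plot("ABAB"), self_dot_plot("ABAC")))
```

The first comparison shows why a pattern-based score is independent of direct sequence identity: the two toy strings share no residues yet have the same internal repeat structure.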
Affiliation(s)
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Krzysztof Młynarczyk
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
| | - Jakub Skrzeczkowski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| | - Stanisław Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
| | - Maria W. Górna
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
| |
|
75
|
A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel) 2020; 11:genes11040407. [PMID: 32283633 PMCID: PMC7230257 DOI: 10.3390/genes11040407] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 03/09/2020] [Revised: 03/29/2020] [Accepted: 04/01/2020] [Indexed: 12/31/2022] Open
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins increased more than seven-fold and new TR prediction methods were published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. Positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.
|
76
|
Barik S. The Nature and Arrangement of Pentatricopeptide Domains and the Linker Sequences Between Them. Bioinform Biol Insights 2020; 14:1177932220906434. [PMID: 32180683 PMCID: PMC7059232 DOI: 10.1177/1177932220906434] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/16/2020] [Accepted: 01/23/2020] [Indexed: 12/31/2022] Open
Abstract
The tricopeptide (amino acid number in the 30s) repeats constitute some of the most common amino acid repeats in proteins of diverse organisms. The most important representatives of this class are the 34-residue and 35-residue repeats, eponymously known as tetratricopeptide repeat (TPR) and pentatricopeptide repeat (PPR), respectively. The unit motif of both consists of a pair of alpha helices. As members of the large, all-helical repeat classes, TPR and PPR share structural similarities, but also play specific roles in protein function. In this study, a comprehensive bioinformatic analysis of the PPR units and the linkers that connect them was conducted. The results suggested the existence of PPR repeats of various formats, as well as smaller, PPR-unrelated repeats. Besides their length, these repeats differed in amino acid arrangements and location of key amino acids. These findings provide a broader and unified perspective of the pentatricopeptide family while raising provocative questions about the assembly and evolution of these domains.
|
77
|
Lopez-Ortiz C, Peña-Garcia Y, Natarajan P, Bhandari M, Abburi V, Dutta SK, Yadav L, Stommel J, Nimmakayala P, Reddy UK. The ankyrin repeat gene family in Capsicum spp: Genome-wide survey, characterization and gene expression profile. Sci Rep 2020; 10:4044. [PMID: 32132613 PMCID: PMC7055287 DOI: 10.1038/s41598-020-61057-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/22/2019] [Accepted: 02/20/2020] [Indexed: 11/09/2022] Open
Abstract
The ankyrin (ANK) repeat protein family is widely distributed across plants and has been found to participate in multiple processes such as plant growth and development, hormone response, and response to biotic and abiotic stresses. It is considered one of the major markers of capsaicin content in pepper fruits. In this study, we performed a genome-wide identification and expression analysis of genes encoding ANK proteins in three Capsicum species: Capsicum baccatum, Capsicum annuum and Capsicum chinense. We identified a total of 87, 85 and 96 ANK genes in C. baccatum, C. annuum and C. chinense genomes, respectively. Next, we performed a comprehensive bioinformatics analysis of the Capsicum ANK gene family including gene chromosomal localization, Cis-elements, conserved motif identification, intron/exon structural patterns and gene ontology classification as well as profile expression. Phylogenetic and domain organization analysis grouped the Capsicum ANK gene family into ten subfamilies distributed across all 12 pepper chromosomes at different densities. Analysis of the expression of ANK genes in leaf and pepper fruits suggested that the ANKs have specific expression patterns at various developmental stages in placenta tissue. Our results provide valuable information for further studies of the evolution, classification and putative functions of ANK genes in pepper.
Affiliation(s)
- Carlos Lopez-Ortiz
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America
| | - Yadira Peña-Garcia
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America
| | - Purushothaman Natarajan
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America.,Department of Genetic Engineering, School of Bioengineering, SRM Institute of Science and Technology, Kattankulathur, 603203, India
| | - Menuka Bhandari
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America
| | - Venkata Abburi
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America
| | - Sudip Kumar Dutta
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America.,ICAR RC NEH Region, Mizoram Centre, Kolasib, Mizoram, India
| | - Lav Yadav
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America
| | - John Stommel
- Genetic Improvement of Fruits and Vegetables Laboratory (USDA, ARS), Beltsville, MD, 20705, USA
| | - Padma Nimmakayala
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America.
| | - Umesh K Reddy
- Department of Biology, Gus R. Douglass Institute, West Virginia State University, Institute, West Virginia, United States of America.
| |
|
78
|
Regulation of FKBP51 and FKBP52 functions by post-translational modifications. Biochem Soc Trans 2020; 47:1815-1831. [PMID: 31754722 DOI: 10.1042/bst20190334] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/20/2019] [Revised: 10/22/2019] [Accepted: 10/28/2019] [Indexed: 12/17/2022]
Abstract
FKBP51 and FKBP52 are two iconic members of the family of peptidyl-prolyl-(cis/trans)-isomerases (EC: 5.2.1.8), which comprises proteins that catalyze the cis/trans isomerization of peptidyl-prolyl peptide bonds in unfolded and partially folded polypeptide chains and native state proteins. Originally, both proteins were studied as molecular chaperones belonging to the steroid receptor heterocomplex, where they were first discovered. In addition to their expected role in receptor folding and chaperoning, FKBP51 and FKBP52 are also involved in many biological processes, such as signal transduction, transcriptional regulation, protein transport, cancer development, and cell differentiation, to mention just a few examples. Recent studies have revealed that both proteins are subject to post-translational modifications such as phosphorylation, SUMOylation, and acetylation. In this work, we summarize recent advances in the study of these immunophilins, portraying them as scaffolding proteins capable of organizing protein heterocomplexes, and describing some of their antagonistic properties in the physiology of the cell and the putative regulation of their properties by those post-translational modifications.
|
79
|
Abstract
Proteins are molecular machines whose function depends on their ability to achieve complex folds with precisely defined structural and dynamic properties. The rational design of proteins from first-principles, or de novo, was once considered to be impossible, but today proteins with a variety of folds and functions have been realized. We review the evolution of the field from its earliest days, placing particular emphasis on how this endeavor has illuminated our understanding of the principles underlying the folding and function of natural proteins, and is informing the design of macromolecules with unprecedented structures and properties. An initial set of milestones in de novo protein design focused on the construction of sequences that folded in water and membranes to adopt folded conformations. The first proteins were designed from first-principles using very simple physical models. As computers became more powerful, the use of the rotamer approximation allowed one to discover amino acid sequences that stabilize the desired fold. As the crystallographic database of protein structures expanded in subsequent years, it became possible to construct proteins by assembling short backbone fragments that frequently recur in Nature. The second set of milestones in de novo design involves the discovery of complex functions. Proteins have been designed to bind a variety of metals, porphyrins, and other cofactors. The design of proteins that catalyze hydrolysis and oxygen-dependent reactions has progressed significantly. However, de novo design of catalysts for energetically demanding reactions, or even proteins that bind with high affinity and specificity to highly functionalized complex polar molecules, remains an important challenge that is now being achieved. Finally, protein design has contributed significantly to our understanding of membrane protein folding and transport of ions across membranes. The area of membrane protein design, or more generally of biomimetic polymers that function in mixed or non-aqueous environments, is now becoming increasingly possible.
|
80
|
McCord JP, Grove TZ. Engineering repeat proteins of the immune system. Biopolymers 2020; 111:e23348. [PMID: 32031681 DOI: 10.1002/bip.23348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2019] [Revised: 08/30/2019] [Accepted: 09/06/2019] [Indexed: 11/06/2022]
Abstract
Limitations associated with immunoglobulins have motivated the search for novel binding scaffolds. Repeat proteins have emerged as one promising class of scaffolds, but often are limited to binding protein and peptide targets. An exception is the repeat proteins of the immune system, which have in recent years served as an inspiration for binding scaffolds which can bind glycans and other classes of biomolecule. Like other repeat proteins, these proteins can be very stable and have a monomeric mode of binding, with elongated and highly variable binding surfaces. The ability to target glycans and glycoproteins fill an important gap in current tools for research and biomedical applications.
Affiliation(s)
- Jennifer P McCord
- Department of Chemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, U.S.A
| | - Tijana Z Grove
- Department of Chemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, U.S.A.; Zarkovic Grove Consulting, LLC, Blacksburg, VA, U.S.A
| |
|
81
|
Llabrés S, Tsenkov MI, MacGowan SA, Barton GJ, Zachariae U. Disease related single point mutations alter the global dynamics of a tetratricopeptide (TPR) α-solenoid domain. J Struct Biol 2020; 209:107405. [PMID: 31628985 PMCID: PMC6961204 DOI: 10.1016/j.jsb.2019.107405] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2019] [Revised: 10/04/2019] [Accepted: 10/15/2019] [Indexed: 01/18/2023]
Abstract
Tetratricopeptide repeat (TPR) proteins belong to the class of α-solenoid proteins, in which repetitive units of α-helical hairpin motifs stack to form superhelical, often highly flexible structures. TPR domains occur in a wide variety of proteins, and perform key functional roles including protein folding, protein trafficking, cell cycle control and post-translational modification. Here, we look at the TPR domain of the enzyme O-linked GlcNAc-transferase (OGT), which catalyses O-GlcNAcylation of a broad range of substrate proteins. A number of single-point mutations in the TPR domain of human OGT have been associated with the disease Intellectual Disability (ID). By extended steered and equilibrium atomistic simulations, we show that the OGT-TPR domain acts as an elastic nanospring, and that each of the ID-related local mutations substantially affects the global dynamics of the TPR domain. Since the nanospring character of the OGT-TPR domain is key to its function in binding and releasing OGT substrates, these changes of its biomechanics likely lead to defective substrate interaction. We find that neutral mutations in the human population, selected by analysis of the gnomAD database, do not incur these changes. Our findings may not only help to explain the ID phenotype of the mutants, but also aid the design of TPR proteins with tailored biomechanical properties.
Affiliation(s)
- Salomé Llabrés
- Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.
| | - Maxim I Tsenkov
- Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Stuart A MacGowan
- Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Geoffrey J Barton
- Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
| | - Ulrich Zachariae
- Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK; Physics, School of Science and Engineering, University of Dundee, Dundee, UK.
| |
|
82
|
Pagès G, Grudinin S. DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures. Bioinformatics 2019; 35:5113-5120. [PMID: 31161198 DOI: 10.1093/bioinformatics/btz454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/26/2018] [Revised: 04/16/2019] [Accepted: 05/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: Thanks to the recent advances in structural biology, nowadays 3D structures of various proteins are solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions. RESULTS: We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB. AVAILABILITY AND IMPLEMENTATION: The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
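As a rough illustration of why a 6D Veronese mapping suits symmetry axes, the sketch below embeds a 3D vector as its six quadratic terms, so that v and -v (which describe the same undirected axis) receive the same image. The function names and the square-root-of-two scaling are assumptions chosen for illustration; the abstract does not give the exact form used in DeepSymmetry, and this is not the authors' code.

```python
import math

def veronese6(v):
    """Degree-2 Veronese embedding of a 3D vector (illustrative scaling)."""
    x, y, z = v
    s = math.sqrt(2.0)  # chosen so that <V(u), V(v)> equals (u . v) ** 2
    return (x * x, y * y, z * z, s * x * y, s * x * z, s * y * z)

def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))

def axis_angle_deg(u, v):
    """Angle in degrees between two unit axes, ignoring their direction."""
    # For unit vectors the embedded dot product is cos(angle) squared.
    c = math.sqrt(max(0.0, min(1.0, dot(veronese6(u), veronese6(v)))))
    return math.degrees(math.acos(c))

# Hypothetical example axes: a reference axis, its flipped copy,
# and an axis tilted by one degree.
axis = (0.0, 0.0, 1.0)
flipped = (0.0, 0.0, -1.0)
tilted = (math.sin(math.radians(1.0)), 0.0, math.cos(math.radians(1.0)))
```

With these definitions, `axis_angle_deg(axis, flipped)` is zero while `axis_angle_deg(axis, tilted)` is one degree, which is the kind of angular error the abstract reports on.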
Affiliation(s)
- Guillaume Pagès
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| | - Sergei Grudinin
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| |
|
83
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 155] [Impact Index Per Article: 31.0] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
| | - Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| |
|
84
|
Mainieri A, Haig D. Retrotransposon gag-like 1 (RTL1) and the molecular evolution of self-targeting imprinted microRNAs. Biol Direct 2019; 14:18. [PMID: 31640745 PMCID: PMC6805670 DOI: 10.1186/s13062-019-0250-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/30/2019] [Accepted: 09/26/2019] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: Transcription of the antisense strand of RTL1 produces a sense mRNA that is targeted for degradation by antisense microRNAs transcribed from the sense strand. Translation of the mRNA produces a retrotransposon-derived protein that is implicated in placental development. The sense and antisense transcripts are oppositely imprinted: sense mRNAs are expressed from the paternally-derived chromosome, antisense microRNAs from the maternally-derived chromosome. RESULTS: Two microRNAs at the RTL1 locus, miR-431 and the rodent-specific miR-434, are derived from within tandem repeats. We present an evolutionary model for the establishment of a new self-targeting microRNA derived from within a tandem repeat that inhibits production of RTL1 protein when maternally-derived in heterozygotes but not when paternally-derived. CONCLUSIONS: The interaction of sense and antisense transcripts can be interpreted as a form of communication between maternally-derived and paternally-derived RTL1 alleles that possesses many of the features of a greenbeard effect. This interaction is evolutionarily stable, unlike a typical greenbeard effect, because of the necessary complementarity between microRNAs and mRNA transcribed from opposite strands of the same double helix. We conjecture that microRNAs and mRNA cooperate to reduce demands on mothers when an allele is paired with itself in homozygous offspring. REVIEWERS: This article was reviewed by Eugene Berezikov and Bernard Crespi.
Affiliation(s)
- Avantika Mainieri
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - David Haig
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
| |
|
85
|
Aleksandrova AA, Sarti E, Forrest LR. MemSTATS: A Benchmark Set of Membrane Protein Symmetries and Pseudosymmetries. J Mol Biol 2019; 432:597-604. [PMID: 31628944 DOI: 10.1016/j.jmb.2019.09.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Revised: 08/30/2019] [Accepted: 09/23/2019] [Indexed: 02/06/2023]
Abstract
In membrane proteins, symmetry and pseudosymmetry often have functional or evolutionary implications. However, available symmetry detection methods have not been tested systematically on this class of proteins because of the lack of an appropriate benchmark set. Here we present MemSTATS, a publicly available benchmark set of both quaternary- and internal-symmetries in membrane protein structures. The symmetries are described in terms of order, repeated elements, and orientation of the axis with respect to the membrane plane. Moreover, using MemSTATS, we compare the performance of four widely used symmetry detection algorithms and highlight specific challenges and areas for improvement in the future.
Affiliation(s)
- Antoniya A Aleksandrova
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Edoardo Sarti
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Lucy R Forrest
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA.
| |
|
86
|
Rajathei DM, Parthasarathy S, Selvaraj S. Identification and Analysis of Long Repeats of Proteins at the Domain Level. Front Bioeng Biotechnol 2019; 7:250. [PMID: 31649924 PMCID: PMC6795024 DOI: 10.3389/fbioe.2019.00250] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/18/2019] [Accepted: 09/16/2019] [Indexed: 12/27/2022] Open
Abstract
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences enables one to understand their abundance, structure and function in the protein universe. In the present study, amino acid repeats of length >50 (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The underlying structures and functions of these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and enzyme and non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats seem to occur predominantly in certain architectures such as sandwich, bundle, barrel, and roll and, within these architectures, to be abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats are in the domain regions of the family and are involved in the function of the proteins. After grouping based on enzyme and non-enzyme classes, we observed the abundant occurrence of long repeats in specific catalytic and binding functions of the proteins. In this study, we have analyzed the occurrence of long repeats in the protein sequence universe, as distinct from well-characterized short tandem repeats, together with their structures and functions at the domain level. The present study suggests that long repeats may play an important role in the structure and function of domains of the proteins.
Affiliation(s)
- David Mary Rajathei
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
| | - Subbiah Parthasarathy
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
| | - Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
| |
|
87
|
Hirsh L, Paladin L, Piovesan D, Tosatto SCE. RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins. Nucleic Acids Res 2019; 46:W402-W407. [PMID: 29746699 PMCID: PMC6031040 DOI: 10.1093/nar/gky360] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/06/2018] [Accepted: 04/24/2018] [Indexed: 11/15/2022] Open
Abstract
RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and for manually refining the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.
Affiliation(s)
- Layla Hirsh
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy; Dept. of Engineering, Pontificia Universidad Católica del Perú, Lima, Perú
| | - Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy
| | | | - Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy; CNR Institute of Neurosciences, Padua, Italy
| |
|
88
|
Hughes SA, Wang F, Wang S, Kreutzberger MAB, Osinski T, Orlova A, Wall JS, Zuo X, Egelman EH, Conticello VP. Ambidextrous helical nanotubes from self-assembly of designed helical hairpin motifs. Proc Natl Acad Sci U S A 2019; 116:14456-14464. [PMID: 31262809 PMCID: PMC6642399 DOI: 10.1073/pnas.1903910116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/18/2022] Open
Abstract
Tandem repeat proteins exhibit native designability and represent potentially useful scaffolds for the construction of synthetic biomimetic assemblies. We have designed 2 synthetic peptides, HEAT_R1 and LRV_M3Δ1, based on the consensus sequences of single repeats of thermophilic HEAT (PBS_HEAT) and Leucine-Rich Variant (LRV) structural motifs, respectively. Self-assembly of the peptides afforded high-aspect ratio helical nanotubes. Cryo-electron microscopy with direct electron detection was employed to analyze the structures of the solvated filaments. The 3D reconstructions from the cryo-EM maps led to atomic models for the HEAT_R1 and LRV_M3Δ1 filaments at resolutions of 6.0 and 4.4 Å, respectively. Surprisingly, despite sequence similarity at the lateral packing interface, HEAT_R1 and LRV_M3Δ1 filaments adopt the opposite helical hand and differ significantly in helical geometry, while retaining a local conformation similar to previously characterized repeat proteins of the same class. The differences in the 2 filaments could be rationalized on the basis of differences in cohesive interactions at the lateral and axial interfaces. These structural data reinforce previous observations regarding the structural plasticity of helical protein assemblies and the need for high-resolution structural analysis. Despite these observations, the native designability of tandem repeat proteins offers the opportunity to engineer novel helical nanotubes. Moreover, the resultant nanotubes have independently addressable and chemically distinguishable interior and exterior surfaces that would facilitate applications in selective recognition, transport, and release.
Affiliation(s)
| | - Fengbin Wang
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908
| | - Shengyuan Wang
- Department of Chemistry, Emory University, Atlanta, GA 30322
| | - Mark A B Kreutzberger
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908
| | - Tomasz Osinski
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908
| | - Albina Orlova
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908
| | - Joseph S Wall
- Department of Biology, Brookhaven National Laboratory, Upton, NY 11973
| | - Xiaobing Zuo
- X-Ray Science Division, Argonne National Laboratory, Argonne, IL 60439
| | - Edward H Egelman
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22908
| | | |
|
89
|
Li J, Liu H, Raval MH, Wan J, Yengo CM, Liu W, Zhang M. Structure of the MORN4/Myo3a Tail Complex Reveals MORN Repeats as Protein Binding Modules. Structure 2019; 27:1366-1374.e3. [PMID: 31279628 DOI: 10.1016/j.str.2019.06.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/05/2019] [Revised: 05/25/2019] [Accepted: 06/17/2019] [Indexed: 10/26/2022]
Abstract
Tandem repeats are basic building blocks for constructing proteins with diverse structures and functions. Compared with extensively studied α-helix-based tandem repeats such as ankyrin, tetratricopeptide, armadillo, and HEAT repeat proteins, relatively little is known about tandem repeat proteins formed by β hairpins. In this study, we discovered that the MORN repeats from MORN4 function as a protein binding module specifically recognizing a tail cargo binding region from Myo3a. The structure of the MORN4/Myo3a complex shows that MORN4 forms an extended single-layered β-sheet structure and uses a U-shaped groove to bind to the Myo3a tail with high affinity and specificity. Sequence and structural analyses further elucidated the unique sequence features for folding and target binding of MORN repeats. Our work establishes that the β-hairpin-based MORN repeats are protein-protein interaction modules.
Affiliation(s)
- Jianchao Li
- Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Division of Cell, Developmental and Integrative Biology, School of Medicine, South China University of Technology, Guangzhou 510006, China
| | - Haiyang Liu
- Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China
| | - Manmeet H Raval
- Department of Cellular and Molecular Physiology, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
| | - Jun Wan
- Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China
| | - Christopher M Yengo
- Department of Cellular and Molecular Physiology, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
| | - Wei Liu
- Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China.
| | - Mingjie Zhang
- Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China; Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China; Center of Systems Biology and Human Health, School of Science and Institute for Advanced Study, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China.
| |
|
90
|
Perez-Riba A, Lowe AR, Main ERG, Itzhaki LS. Context-Dependent Energetics of Loop Extensions in a Family of Tandem-Repeat Proteins. Biophys J 2019; 114:2552-2562. [PMID: 29874606 PMCID: PMC6129472 DOI: 10.1016/j.bpj.2018.03.038] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/10/2017] [Revised: 02/28/2018] [Accepted: 03/29/2018] [Indexed: 11/16/2022] Open
Abstract
Consensus-designed tetratricopeptide repeat proteins are highly stable, modular proteins that are strikingly amenable to rational engineering. They therefore have tremendous potential as building blocks for biomaterials and biomedicine. Here, we explore the possibility of extending the loops between repeats to enable further diversification, and we investigate how this modification affects stability and folding cooperativity. We find that extending a single loop by up to 25 residues does not disrupt the overall protein structure, but, strikingly, the effect on stability is highly context-dependent: in a two-repeat array, destabilization is relatively small and can be accounted for purely in entropic terms, whereas extending a loop in the middle of a large array is much more costly because of weakening of the interaction between the repeats. Our findings provide important and, to our knowledge, new insights that increase our understanding of the structure, folding, and function of natural repeat proteins and the design of artificial repeat proteins in biotechnology.
Affiliation(s)
- Albert Perez-Riba
- Department of Pharmacology, University of Cambridge, Cambridge, United Kingdom
| | - Alan R Lowe
- London Centre for Nanotechnology, London, United Kingdom; Structural & Molecular Biology, University College London, London, United Kingdom; Department of Biological Sciences, Birkbeck College, University of London, London, United Kingdom
| | - Ewan R G Main
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom.
| | - Laura S Itzhaki
- Department of Pharmacology, University of Cambridge, Cambridge, United Kingdom.
| |
|
91
|
A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families. Methods Mol Biol 2019. [PMID: 30298401 DOI: 10.1007/978-1-4939-8736-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is complicated further by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve in cases where insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Combined, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, due to the dependence of such methods on accurate sequence alignment. We present here a practical solution to this problem, making use of graph clustering combined with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. Carrying out multiple rounds of homology searches via alignment of profile hidden Markov models, large sets of related proteins are generated. By representing the relationships between proteins in these sets as graphs, subsequent clustering with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
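The clustering step named in this abstract can be sketched in a few lines. The code below is a minimal, dependency-free rendering of the Markov cluster (MCL) idea: alternate expansion (squaring a column-stochastic matrix) with inflation (element-wise powering and renormalizing) until the matrix stops changing, then read clusters from the non-zero rows. It is not the authors' pipeline and does not call HH-suite; the function names, the default inflation of 2.0, the thresholds, and the toy similarity graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / col_sum if col_sum else 0.0
    return out

def matmul(a, b):
    """Naive product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster an undirected weighted graph with a basic MCL loop."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(iterations):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(  # inflation: strong flow is favoured
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy graph: two groups of three mutually similar proteins,
# with no detected similarity between the groups.
two_groups = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
subfamilies = markov_cluster(two_groups)
```

In a real analysis the edge weights would come from the homology search scores, and the inflation value would be tuned, since higher inflation tends to give smaller, tighter clusters.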
|
92
|
Bliven SE, Lafita A, Rose PW, Capitani G, Prlić A, Bourne PE. Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm. PLoS Comput Biol 2019; 15:e1006842. [PMID: 31009453 PMCID: PMC6504099 DOI: 10.1371/journal.pcbi.1006842] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/18/2018] [Revised: 05/07/2019] [Accepted: 01/29/2019] [Indexed: 01/04/2023] Open
Abstract
Many proteins fold into highly regular and repetitive three-dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution. Many protein structures show a great deal of regularity. Even within single polypeptide chains, about 25% of proteins contain self-similar repeating structures, which can be organized in ring-like symmetric arrangements or linear open repeats. The repeats are often related, and thus comparing the sequence and structure of repeats can give an idea as to the early evolutionary history of a protein family. Additionally, the conservation and divergence of repeats can lead to insights about the function of the proteins. This work describes CE-Symm 2.0, a tool for the analysis of protein symmetry. The method automatically detects internal symmetry in protein structures and produces a multiple alignment of structural repeats. The algorithm is able to detect the geometric relationships between the repeats, including cyclic, dihedral, and polyhedral symmetries, translational repeats, and cases where multiple symmetry operators are applicable in a hierarchical manner. These complex relationships can then be visualized in a graphical interface as a complete structure, as a superposition of repeats, or as a multiple alignment of the protein sequence. CE-Symm 2.0 can be systematically used for the automatic detection of internal symmetry in protein structures, or as an interactive tool for the analysis of structural repeats.
Affiliation(s)
- Spencer E. Bliven
- Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Institute of Applied Simulation, Zurich University of Applied Science, Wädenswil, Switzerland
- * E-mail: (SEB), (AL)
| | - Aleix Lafita
- Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- * E-mail: (SEB), (AL)
| | - Peter W. Rose
- RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Structural Bioinformatics Laboratory, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
| | - Guido Capitani
- Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
- Department of Biology, ETH Zurich, Zurich, Switzerland
| | - Andreas Prlić
- RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
| | - Philip E. Bourne
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
| |
Collapse
|
93
|
Perez-Riba A, Synakewicz M, Itzhaki LS. Folding cooperativity and allosteric function in the tandem-repeat protein class. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0188. [PMID: 29735741 DOI: 10.1098/rstb.2017.0188] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Accepted: 01/17/2018] [Indexed: 01/08/2023] Open
Abstract
The term allostery was originally developed to describe structural changes in one binding site induced by the interaction of a partner molecule with a distant binding site, and it has been studied in depth in the field of enzymology. Here, we discuss the concept of action at a distance in relation to the folding and function of the solenoid class of tandem-repeat proteins such as tetratricopeptide repeats (TPRs) and ankyrin repeats. Distantly located repeats fold cooperatively, even though only nearest-neighbour interactions exist in these proteins. A number of repeat-protein scaffolds have been reported to display allosteric effects, transferred through the repeat array, that enable them to direct the activity of the multi-subunit enzymes within which they reside. We also highlight a recently identified group of tandem-repeat proteins, the RRPNN subclass of TPRs, recent crystal structures of which indicate that they function as allosteric switches to modulate multiple bacterial quorum-sensing mechanisms. We believe that the folding cooperativity of tandem-repeat proteins and the biophysical mechanisms that transform them into allosteric switches are intimately intertwined. This opinion piece aims to combine our understanding of the two areas and develop ideas on their common underlying principles. This article is part of a discussion meeting issue 'Allostery and molecular machines'.
Affiliation(s)
- Albert Perez-Riba
  - Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1PD, UK
- Marie Synakewicz
  - Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1PD, UK
- Laura S Itzhaki
  - Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1PD, UK
|
94
|
Perez-Riba A, Itzhaki LS. The tetratricopeptide-repeat motif is a versatile platform that enables diverse modes of molecular recognition. Curr Opin Struct Biol 2019; 54:43-49. [PMID: 30708253 DOI: 10.1016/j.sbi.2018.12.004] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 11/07/2018] [Revised: 12/09/2018] [Accepted: 12/12/2018] [Indexed: 01/05/2023]
Abstract
Tetratricopeptide repeat (TPR) domains and TPR-like domains are widespread across nature. They are involved in varied cellular processes and have been traditionally associated with binding to short linear peptide motifs. However, examples of a much more diverse range of molecular recognition modes are increasing year by year. The Protein Data Bank has an ever-expanding collection of TPR proteins in complex with a myriad of different partners, ranging from short linear peptide motifs to large globular protein domains. In this review, we explore these varied binding modes. Additionally, we hope to highlight an emerging property of this simple, malleable fold: the potential for programmable complexity that can be achieved by acting as a scaffold for multiple binding partners.
Affiliation(s)
- Albert Perez-Riba
  - Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada
- Laura S Itzhaki
  - Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1PD, UK
|
95
|
Villain E, Nikekhin AA, Kajava AV. Porins and Amyloids are Coded by Similar Sequence Motifs. Proteomics 2018; 19:e1800075. [DOI: 10.1002/pmic.201800075] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/07/2018] [Revised: 09/27/2018] [Indexed: 01/25/2023]
Affiliation(s)
- Etienne Villain
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Andrey V. Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
  - Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
|
96
|
Shen C, Du Y, Qiao F, Kong T, Yuan L, Zhang D, Wu X, Li D, Wu YD. Biophysical and structural characterization of the thermostable WD40 domain of a prokaryotic protein, Thermomonospora curvata PkwA. Sci Rep 2018; 8:12965. [PMID: 30154510 PMCID: PMC6113231 DOI: 10.1038/s41598-018-31140-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/26/2017] [Accepted: 08/10/2018] [Indexed: 01/25/2023] Open
Abstract
WD40 proteins belong to a large protein family with members identified in every eukaryotic proteome. However, WD40 proteins have been reported in only a few prokaryotic proteomes. Using WDSP (http://wu.scbb.pkusz.edu.cn/wdsp/), a prediction tool, we identified thousands of prokaryotic WD40 proteins, few of which have been biochemically characterized. As shown in our previous bioinformatics study, a large proportion of prokaryotic WD40 proteins have higher intramolecular sequence identity among repeats and more hydrogen networks, which may indicate better stability than eukaryotic WD40s. Here we report our biophysical and structural study on the WD40 domain of PkwA from Thermomonospora curvata (referred to as tPkwA-C). We demonstrated that the stability of thermophilic tPkwA-C correlated with ionic strength and that tPkwA-C exhibited fully reversible unfolding under different denaturing conditions. Therefore, the folding kinetics were also studied through stopped-flow circular dichroism spectra. The crystal structure of tPkwA-C was further resolved and sheds light on the key factors that stabilize its beta-propeller structure. As in other WD40 proteins, the DHSW tetrad has a significant impact on the stability of tPkwA-C. Considering its unique features, we propose that tPkwA-C should be a great structural template for protein engineering to study key residues involved in the protein-protein interactions of a WD40 protein.
Affiliation(s)
- Chen Shen
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Ye Du
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - Medical Research Center, The People's Hospital of Longhua, Shenzhen, 518109, China
- Fangfang Qiao
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Tian Kong
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Lirong Yuan
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Delin Zhang
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Xianhui Wu
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
- Dongyang Li
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - SUSTech Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, 518055, China
- Yun-Dong Wu
  - Lab of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - College of Chemistry, Peking University, Beijing, 100871, China
|
97
|
Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiol Rev 2018; 41:923-940. [PMID: 29077880 DOI: 10.1093/femsre/fux046] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 04/19/2017] [Accepted: 09/13/2017] [Indexed: 12/13/2022] Open
Abstract
Five species of parasite cause malaria in humans with the most severe disease caused by Plasmodium falciparum. Many of the proteins encoded in the P. falciparum genome are unusually enriched in repetitive low-complexity sequences containing a limited repertoire of amino acids. These repetitive sequences expand and contract dynamically and are among the most rapidly changing sequences in the genome. The simplest repetitive sequences consist of single amino acid repeats such as poly-asparagine tracts that are found in approximately 25% of P. falciparum proteins. More complex repeats of two or more amino acids are also common in diverse parasite protein families. There is no universal explanation for the occurrence of repetitive sequences and it is possible that many confer no function to the encoded protein and no selective advantage or disadvantage to the parasite. However, there are increasing numbers of examples where repetitive sequences are important for parasite protein function. We discuss the diverse roles of low-complexity repetitive sequences throughout the parasite life cycle, from mediating protein-protein interactions to enabling the parasite to evade the host immune system.
Affiliation(s)
- Heledd M Davies
  - The Francis Crick Institute, London, NW1 1AT, United Kingdom
- Stephanie D Nofal
  - London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom
- Emilia J McLaughlin
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Andrew R Osborne
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, United Kingdom
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, Malet Street, London, WC1E 7HX, United Kingdom
|
98
|
Intrinsic Disorder in Proteins with Pathogenic Repeat Expansions. Molecules 2017; 22:molecules22122027. [PMID: 29186753 PMCID: PMC6149999 DOI: 10.3390/molecules22122027] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 11/08/2017] [Revised: 11/18/2017] [Accepted: 11/21/2017] [Indexed: 11/18/2022] Open
Abstract
Intrinsically disordered proteins and proteins with intrinsically disordered regions have been shown to be highly prevalent in disease. Furthermore, disease-causing expansions of the regions containing tandem amino acid repeats often push repetitive proteins towards formation of irreversible aggregates. In fact, in disease-relevant proteins, increased repeat length often correlates positively with increased aggregation efficiency, disease severity and penetrance, and negatively with the age of disease onset. The major categories of repeat extensions involved in disease include poly-glutamine and poly-alanine homorepeats, which are often located in the intrinsically disordered regions, as well as repeats in non-coding regions of genes typically encoding proteins with ordered structures. Repeats in such non-coding regions of genes can be expressed at the mRNA level. Although they can affect the expression levels of encoded proteins, they are not translated as parts of an affected protein and have no effect on its structure. However, in some cases, the repetitive mRNAs can be translated in a non-canonical manner, generating highly repetitive peptides of different length and amino acid composition. The repeat extension-caused aggregation of a repetitive protein may represent a pivotal step for its transformation into a proteotoxic entity that can lead to pathology. The goals of this article are to systematically analyze molecular mechanisms of the proteinopathies caused by the poly-glutamine and poly-alanine homorepeat expansion, as well as by the polypeptides generated as a result of the microsatellite expansions in non-coding gene regions, and to examine the related proteins. We also present results of the analysis of the prevalence and functional roles of intrinsic disorder in proteins associated with pathological repeat expansions.
|
99
|
Roche DB, Viet PD, Bakulina A, Hirsh L, Tosatto SCE, Kajava AV. Classification of β-hairpin repeat proteins. J Struct Biol 2017; 201:130-138. [PMID: 29017817 DOI: 10.1016/j.jsb.2017.10.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/03/2017] [Revised: 10/02/2017] [Accepted: 10/04/2017] [Indexed: 12/11/2022]
Abstract
In recent years, a number of new protein structures that possess tandem repeats have emerged. Many of these proteins are composed of tandem arrays of β-hairpins. Today, the amount and variety of the data on these β-hairpin repeat (BHR) structures have reached a level that requires detailed analysis and further classification. In this paper, we classified the BHR proteins and compared their structures, repeat-motif sequences, functions, and distribution across the major taxonomic kingdoms of life and within organisms. As a result, we identified six different BHR folds in tandem repeat proteins of Class III (elongated structures) and one BHR fold (up-and-down β-barrel) in Class IV ("closed" structures). Our survey reveals the high incidence of the BHR proteins among bacteria and viruses and their possible relationship to the structures of amyloid fibrils. It indicates that BHR folds will be an attractive target for future structural studies, especially in the context of age-related amyloidosis and emerging infectious diseases. This work allowed us to update the RepeatsDB database, which contains annotated tandem repeat protein structures, and to construct sequence profiles based on BHR structural alignments.
Affiliation(s)
- Daniel B Roche
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, Montpellier 34293, France
  - Institut de Biologie Computationnelle, Montpellier, France
- Phuong Do Viet
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, Montpellier 34293, France
  - Institut de Biologie Computationnelle, Montpellier, France
- Anastasia Bakulina
  - Novosibirsk State University, Pirogova str. 1, Novosibirsk 630090, Russia
  - State Research Center of Virology and Biotechnology VECTOR, Koltsovo, Russia
- Layla Hirsh
  - Department of Biomedical Sciences, University of Padova, I-35121 Padova, Italy
  - Engineering Department, Pontifical Catholic University of Peru, Lima 32, Peru
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, I-35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, Montpellier 34293, France
  - Institut de Biologie Computationnelle, Montpellier, France
|
100
|
Wood CW, Woolfson DN. CCBuilder 2.0: Powerful and accessible coiled-coil modeling. Protein Sci 2017; 27:103-111. [PMID: 28836317 PMCID: PMC5734305 DOI: 10.1002/pro.3279] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 06/30/2017] [Accepted: 08/22/2017] [Indexed: 01/06/2023]
Abstract
The increased availability of user-friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α-helical coiled coil provides one such example, which represents ≈ 3-5% of all known protein-encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy-to-use web application, called CCBuilder 2.0, for modeling and optimizing both α-helical coiled coils and polyproline-based collagen triple helices. This has many applications, from providing models to aid molecular replacement for X-ray crystallography, through in silico model building and engineering of natural and designed protein assemblies, to the creation of completely de novo "dark matter" protein structures. CCBuilder 2.0 is available as a web-based application, the code for which is open-source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2.
Lay Summary
We have created CCBuilder 2.0, an easy-to-use web-based application that can model structures for a whole class of proteins, the α-helical coiled coil, which is estimated to account for 3-5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more-applied research including designing and engineering novel proteins that have potential applications in biotechnology.
Affiliation(s)
- Christopher W Wood
  - School of Chemistry, University of Bristol, Cantock's Close, Bristol, BS8 1TS, United Kingdom
- Derek N Woolfson
  - School of Chemistry, University of Bristol, Cantock's Close, Bristol, BS8 1TS, United Kingdom
  - School of Biochemistry, University of Bristol, Medical Sciences Building, University Walk, Bristol, BS8 1TD, United Kingdom
  - BrisSynBio, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol, BS8 1TQ, United Kingdom
|