1
Naudé M, Faller P, Lebrun V. A Closer Look at Type I Left-Handed β-Helices Provides a Better Understanding in Their Sequence-Structure Relationship: Toward Their Rational Design. Proteins 2024. [PMID: 38980225 DOI: 10.1002/prot.26726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/17/2024] [Accepted: 06/20/2024] [Indexed: 07/10/2024]
Abstract
Understanding the sequence-structure relationship in proteins is of fundamental interest and also has practical applications, such as the rational design of peptides and proteins. This study revisits and updates that relationship for proteins containing the Type I left-handed β-helix. By analyzing the available experimental structures in the Protein Data Bank, we describe in further detail the structural features that are important for the stability of this fold, as well as for its nucleation and termination. This study is meant to complete previous work, as it provides a separate analysis of the N-terminal and C-terminal rungs of the helix. Particular sequence motifs of these rungs are described along with the structural elements they form.
Affiliation(s)
- Maxime Naudé
  - Institute of Chemistry of Strasbourg (UMR 7177), University of Strasbourg-CNRS, Strasbourg, France
- Peter Faller
  - Institute of Chemistry of Strasbourg (UMR 7177), University of Strasbourg-CNRS, Strasbourg, France
- Vincent Lebrun
  - Institute of Chemistry of Strasbourg (UMR 7177), University of Strasbourg-CNRS, Strasbourg, France
2
Chakrabarty B, Parekh N. DbStRiPs: Database of structural repeats in proteins. Protein Sci 2022; 31:23-36. [PMID: 33641184 PMCID: PMC8740836 DOI: 10.1002/pro.4052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/25/2020] [Revised: 02/11/2021] [Accepted: 02/15/2021] [Indexed: 01/03/2023]
Abstract
Recent interest in repeat proteins has arisen due to the stable structural folds, high evolutionary conservation and repertoire of functions provided by these proteins. However, repeat proteins are poorly characterized because of high sequence variation between repeating units, so structure-based identification and classification of repeats is desirable. Using a robust network-based pipeline, manual curation and Kajava's structure-based classification schema, we have developed a database of tandem structural repeats, the Database of Structural Repeats in Proteins (DbStRiPs). A unique feature of this database is that available knowledge on sequence repeat families is incorporated by mapping the Pfam classification scheme onto the structural classification. Integration of sequence-based and structure-based classifications helps in identifying different functional groups within the same structural subclass, leading to refinement in the annotation of repeat proteins. Analysis of the complete Protein Data Bank revealed 16,472 repeat annotations in 15,141 protein chains, one previously uncharacterized novel protein repeat family (PRF), named left-handed beta helix, and 33 protein repeat clusters (PRCs). Based on their unique structural motif, ~79% of these repeat proteins are classified in one of the 14 PRFs or 33 PRCs, and the remaining are grouped as unclassified repeat proteins. Each repeat protein is provided with a detailed annotation in DbStRiPs that includes start and end boundaries of repeating units, copy number, secondary and tertiary structure view, repeat class/subclass, disease association, MSA of repeating units and cross-references to various protein pattern databases, the Human Protein Atlas and interaction resources. DbStRiPs provides easy search and download options for high-quality annotations of structural repeat proteins (URL: http://bioinf.iiit.ac.in/dbstrips/).
Affiliation(s)
- Broto Chakrabarty
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
3
Lindenburg LH, Pantelejevs T, Gielen F, Zuazua-Villar P, Butz M, Rees E, Kaminski CF, Downs JA, Hyvönen M, Hollfelder F. Improved RAD51 binders through motif shuffling based on the modularity of BRC repeats. Proc Natl Acad Sci U S A 2021; 118:e2017708118. [PMID: 34772801 PMCID: PMC8727024 DOI: 10.1073/pnas.2017708118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 08/10/2021] [Indexed: 01/20/2023]
Abstract
Exchanges of protein sequence modules support leaps in function unavailable through point mutations during evolution. Here we study the role of the two RAD51-interacting modules within the eight binding BRC repeats of BRCA2. We created 64 chimeric repeats by shuffling these modules and measured their binding to RAD51. We found that certain shuffled module combinations were stronger binders than any of the module combinations in the natural repeats. Surprisingly, the contribution from the two modules was poorly correlated with the affinities of the natural repeats, with the weak BRC8 repeat containing the most effective N-terminal module. The binding of the strongest chimera, BRC8-2, to RAD51 was improved by -2.4 kcal/mol compared to the strongest natural repeat, BRC4. A crystal structure of the RAD51:BRC8-2 complex shows an improved interface fit and an extended β-hairpin in this repeat. BRC8-2 was shown to function in human cells, preventing the formation of nuclear RAD51 foci after ionizing radiation.
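The module shuffling described in this abstract can be sketched as a simple enumeration. This is only a minimal illustration, not the authors' pipeline. It assumes that each of the eight BRC repeats contributes one N-terminal and one C-terminal module, and that a chimera named BRCx-y pairs the N-terminal module of BRCx with the C-terminal module of BRCy. That naming convention is inferred from the abstract rather than stated in it, and the function name `chimera_name` is hypothetical.

```python
from itertools import product

# Assumption: the eight BRC repeats of BRCA2 are labelled 1..8.
REPEATS = range(1, 9)


def chimera_name(n_source: int, c_source: int) -> str:
    """Name a chimera by the repeats that donate its N- and C-terminal modules.

    Hypothetical convention: BRC8-2 = N-terminal module of BRC8
    joined to the C-terminal module of BRC2.
    """
    return f"BRC{n_source}-{c_source}"


# Every N-terminal module paired with every C-terminal module: 8 x 8 pairs.
pairs = list(product(REPEATS, REPEATS))
chimeras = [chimera_name(n, c) for n, c in pairs]

# Pairs whose two modules come from the same repeat reproduce a natural repeat.
natural = [chimera_name(n, c) for n, c in pairs if n == c]

print(len(chimeras), "module combinations,", len(natural), "of them natural pairings")
```

Under these assumptions the enumeration gives 8 × 8 = 64 combinations, which matches the count in the abstract if the eight natural pairings are included in that figure.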
Affiliation(s)
- Laurens H Lindenburg
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Teodors Pantelejevs
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Fabrice Gielen
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
  - Living Systems Institute, University of Exeter, Exeter EX4 4QD, United Kingdom
- Pedro Zuazua-Villar
  - Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Maren Butz
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Eric Rees
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Clemens F Kaminski
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom
- Jessica A Downs
  - Division of Cancer Biology, The Institute of Cancer Research, London SW3 6JB, United Kingdom
- Marko Hyvönen
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Florian Hollfelder
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
4
Tassia MG, David KT, Townsend JP, Halanych KM. TIAMMAt: Leveraging biodiversity to revise protein domain models, evidence from innate immunity. Mol Biol Evol 2021; 38:5806-5818. [PMID: 34459919 PMCID: PMC8662601 DOI: 10.1093/molbev/msab258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with nonmodel species. Given the rapidly increasing number of species receiving high-quality genome sequencing, accurate domain modeling that is representative of species diversity is crucial for understanding protein family sequence evolution and inferred function(s). Here, we describe a bioinformatic tool called Taxon-Informed Adjustment of Markov Model Attributes (TIAMMAt), which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and nonmodel species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (i.e., NLR’s NACHT domain), whereas the impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt’s flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing on) nonmodel species.
Affiliation(s)
- Michael G Tassia
  - Department of Biological Sciences, Auburn University, Auburn, Alabama
- Kyle T David
  - Department of Biological Sciences, Auburn University, Auburn, Alabama
- James P Townsend
  - Whitman Center, Marine Biological Laboratory, Woods Hole, Massachusetts
  - Department of Biology, Providence College, Providence, Rhode Island
5
Structural Insights into Ankyrin Repeat-Containing Proteins and Their Influence in Ubiquitylation. Int J Mol Sci 2021; 22:ijms22020609. [PMID: 33435370 PMCID: PMC7826745 DOI: 10.3390/ijms22020609] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/21/2020] [Revised: 01/05/2021] [Accepted: 01/07/2021] [Indexed: 12/12/2022]
Abstract
Ankyrin repeat (AR) domains are considered the most abundant repeat motif found in eukaryotic proteins. AR domains are predominantly known to mediate specific protein-protein interactions (PPIs) without necessarily recognizing specific primary sequences or requiring strict conformity within their own primary sequence. This promiscuity allows one AR domain to recognize and bind to a variety of intracellular substrates, suggesting that AR-containing proteins may be involved in a wide array of functions. Many AR-containing proteins serve a critical role in biological processes, including the ubiquitylation signaling pathway (USP). There is also strong evidence that malfunction of AR-containing proteins is associated with several neurological diseases and disorders. In this review, the structure and mechanism of key AR-containing proteins are discussed to suggest and/or identify how each protein utilizes its AR domains to support ubiquitylation and the cascading pathways that follow upon substrate modification.
6
Perovic V, Leclercq JY, Sumonja N, Richard FD, Veljkovic N, Kajava AV. Tally-2.0: upgraded validator of tandem repeat detection in protein sequences. Bioinformatics 2020; 36:3260-3262. [PMID: 32096820 DOI: 10.1093/bioinformatics/btaa121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/28/2019] [Revised: 02/02/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. The blurred boundary between imperfect TR motifs and non-repetitive sequences has created the need to validate the detected TRs. RESULTS: Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 performs at a level of 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%. AVAILABILITY AND IMPLEMENTATION: Tally-2.0 is available as a web tool and as a standalone application published under the Apache License 2.0 at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vladimir Perovic
  - Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia
- Jeremy Y Leclercq
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France
- Neven Sumonja
  - Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia
- Francois D Richard
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France
  - Laboratory for Translational Breast Cancer Research, Department of Oncology, KU Leuven, Leuven 3000, Belgium
- Nevena Veljkovic
  - Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade 11001, Serbia
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université de Montpellier, Montpellier 34293, France
7
Alvarez-Carreño C, Coello G, Arciniega M. FiRES: A computational method for the de novo identification of internal structure similarity in proteins. Proteins 2020; 88:1169-1179. [PMID: 32112578 DOI: 10.1002/prot.25886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2019] [Revised: 11/12/2019] [Accepted: 02/24/2020] [Indexed: 11/08/2022]
Abstract
Internal structure similarity in proteins can be observed at the domain and subdomain levels. From an evolutionary perspective, structurally similar elements may arise divergently by gene duplication and fusion events but may also be the product of convergent evolution under physicochemical constraints. The characterization of proteins that contain repeated structural elements has implications for many fields of protein science, including protein domain evolution, structure classification, structure prediction, and protein engineering. FiRES (Find Repeated Elements in Structure) is an algorithm that relies on a topology-independent structure alignment method to identify repeating elements in protein structure. FiRES was tested against two hand-curated databases of protein repeats: MALIDUP, for very divergent duplicated domains, and RepeatsDB, for short tandem repeats. The performance of FiRES was compared to that of lalign, RADAR, HHrepID, CE-symm, ReUPred, and Swelfe. FiRES was the method that most accurately detected proteins either with duplicated domains (accuracy = 0.86) or with multiple repeated units (accuracy = 0.92). FiRES is a new methodology for the discovery of proteins containing structurally similar elements. The FiRES web server is publicly available at http://fires.ifc.unam.mx. The scripts, results, and benchmarks from this study can be downloaded from https://github.com/Claualvarez/fires.
Affiliation(s)
- Claudia Alvarez-Carreño
  - Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Gerardo Coello
  - Unidad de Cómputo, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Marcelino Arciniega
  - Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
8
Pagès G, Grudinin S. DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures. Bioinformatics 2019; 35:5113-5120. [PMID: 31161198 DOI: 10.1093/bioinformatics/btz454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/26/2018] [Revised: 04/16/2019] [Accepted: 05/29/2019] [Indexed: 01/31/2023]
Abstract
MOTIVATION: Thanks to recent advances in structural biology, 3D structures of various proteins are now solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolutionary mechanisms of these proteins and how structural repetitions affect protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to dealing with spatially organized data, we applied it to the detection of proteins with structural repetitions. RESULTS: We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB. AVAILABILITY AND IMPLEMENTATION: The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guillaume Pagès
  - Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Sergei Grudinin
  - Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
9
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022]
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
  - Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
10
Hwang HJ, Han JW, Jeon H, Han JW. Induction of Recombinant Lectin Expression by an Artificially Constructed Tandem Repeat Structure: A Case Study Using Bryopsis plumosa Mannose-Binding Lectin. Biomolecules 2018; 8:E146. [PMID: 30441842 PMCID: PMC6316659 DOI: 10.3390/biom8040146] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/31/2018] [Revised: 11/12/2018] [Accepted: 11/12/2018] [Indexed: 11/16/2022]
Abstract
Lectin is an important protein in medical and pharmacological applications. Impurities in lectin derived from natural sources and the generation of inactive proteins by recombinant technology are major obstacles to the use of lectins. Expressing recombinant lectin with a tandem repeat structure can potentially overcome these problems, but few studies have systematically examined this possibility. This was investigated in the present study using three distinct forms of recombinant mannose-binding lectin from Bryopsis plumosa (BPL2), i.e., the monomer (rD1BPL2), the dimer (rD2BPL2), and the tetramer (rD4BPL2), arranged as tandem repeats. The concentration of the inducer molecule isopropyl β-D-1-thiogalactopyranoside and the induction time had no effect on the efficiency of the expression of each construct. Of the tested constructs, only rD4BPL2 showed hemagglutination activity towards horse erythrocytes; the activity of rD4BPL2 was 64 times higher than that of native BPL2. Recombinant and native BPL2 showed differences in carbohydrate specificity; the activity of rD4BPL2 was inhibited by the glycoprotein fetuin, whereas that of native BPL2 was also inhibited by D-mannose. Our results indicate that expression as tandem repeat sequences can increase the efficiency of lectin production on a large scale using a bacterial expression system.
Affiliation(s)
- Hyun-Ju Hwang
  - Department of Genetic Resources, National Marine Biodiversity Institute of Korea, Seocheon 33662, Korea
- Jin-Woo Han
  - Department of Genetic Resources, National Marine Biodiversity Institute of Korea, Seocheon 33662, Korea
- Hancheol Jeon
  - Department of Genetic Resources, National Marine Biodiversity Institute of Korea, Seocheon 33662, Korea
- Jong Won Han
  - Department of Genetic Resources, National Marine Biodiversity Institute of Korea, Seocheon 33662, Korea
11
Abstract
Accumulating evidence suggests that many classes of DNA repeats exhibit attributes that distinguish them from other genetic variants, including the fact that they are more liable to mutation; this enables them to mediate genetic plasticity. The expansion of tandem repeats, particularly of short tandem repeats, can cause a range of disorders (including Huntington disease, various ataxias, motor neuron disease, frontotemporal dementia, fragile X syndrome and other neurological disorders), and emerging data suggest that tandem repeat polymorphisms (TRPs) can also regulate gene expression in healthy individuals. TRPs in human genomes may also contribute to the missing heritability of polygenic disorders. A better understanding of tandem repeats and their associated repeatome, as well as their capacity for genetic plasticity via both germline and somatic mutations, is needed to transform our understanding of the role of TRPs in health and disease.
Affiliation(s)
- Anthony J Hannan
  - Florey Institute of Neuroscience and Mental Health, University of Melbourne
  - Department of Anatomy and Neuroscience, University of Melbourne, Parkville, Victoria, Australia
12
Zacharoff LA, Morrone DJ, Bond DR. Geobacter sulfurreducens Extracellular Multiheme Cytochrome PgcA Facilitates Respiration to Fe(III) Oxides But Not Electrodes. Front Microbiol 2017; 8:2481. [PMID: 29312190 PMCID: PMC5732950 DOI: 10.3389/fmicb.2017.02481] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 08/04/2017] [Accepted: 11/29/2017] [Indexed: 11/13/2022]
Abstract
Extracellular cytochromes are hypothesized to facilitate the final steps of electron transfer between the outer membrane of the metal-reducing bacterium Geobacter sulfurreducens and solid-phase electron acceptors such as metal oxides and electrode surfaces during the course of respiration. The triheme c-type cytochrome PgcA exists in the extracellular space of G. sulfurreducens, and is one of many multiheme c-type cytochromes known to be loosely bound to the bacterial outer surface. Deletion of pgcA using a markerless method resulted in mutants unable to transfer electrons to Fe(III) and Mn(IV) oxides, yet the same mutants maintained the ability to respire to electrode surfaces and soluble Fe(III) citrate. When expressed and purified from Shewanella oneidensis, PgcA demonstrated a primarily alpha-helical structure and three bound hemes, and was processed into a shorter 41 kDa form lacking the lipodomain. Purified PgcA bound Fe(III) oxides, but not magnetite, and when PgcA was added to cell suspensions of G. sulfurreducens, it accelerated Fe(III) reduction similarly to the addition of FMN. Addition of soluble PgcA to ΔpgcA mutants also restored Fe(III) reduction. This report highlights a distinction between proteins involved in extracellular electron transfer to metal oxides and to poised electrodes, and suggests a specific role for PgcA in facilitating electron transfer at mineral surfaces.
Affiliation(s)
- Lori A Zacharoff
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, MN, United States
- Dana J Morrone
  - St. Louis College of Pharmacy, St. Louis, MO, United States
- Daniel R Bond
  - Department of Plant and Microbial Biology, University of Minnesota, Minneapolis, MN, United States
  - BioTechnology Institute, University of Minnesota, Minneapolis, MN, United States
13
Wang Y, Geng H, Dang X, Xiang H, Li T, Pan G, Zhou Z. Comparative Analysis of the Proteins with Tandem Repeats from 8 Microsporidia and Characterization of a Novel Endospore Wall Protein Colocalizing with Polar Tube from Nosema bombycis. J Eukaryot Microbiol 2017; 64:707-715. [PMID: 28321967 DOI: 10.1111/jeu.12412] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/22/2016] [Revised: 03/09/2017] [Accepted: 03/09/2017] [Indexed: 11/27/2022]
Abstract
As a common feature of eukaryotic proteins, tandem amino acid repeats have been studied extensively in both animal and plant proteins. Here, a comparative analysis focusing on proteins with tandem repeats was conducted in eight microsporidia, including four mammal-infecting microsporidia (Encephalitozoon cuniculi, Encephalitozoon intestinalis, Encephalitozoon hellem and Enterocytozoon bieneusi) and four insect-infecting microsporidia (Nosema apis, Nosema ceranae, Vavraia culicis and Nosema bombycis). We found that proteins with tandem repeats were abundant in these species. The number of these proteins was larger in insect-infecting microsporidia than in mammal-infecting microsporidia. Additionally, hydrophilic residues were overrepresented in the tandem repeats of these eight microsporidian proteomes, and the amino acid residues in these tandem repeat sequences tended to be encoded by GC-rich codons. The tandem repeat position within proteins of insect-infecting microsporidia was randomly distributed, whereas tandem repeats within proteins of mammal-infecting microsporidia were less often present in the N-terminal regions than in the C-terminal and middle regions. Finally, a hypothetical protein, EOB14572, possessing four tandem repeats was successfully characterized as a novel endospore wall protein, which colocalized with the polar tube of N. bombycis. Our study not only provides useful insight for the study of proteins with tandem repeats in N. bombycis, but also further enriches the known spore wall components of this obligate unicellular eukaryotic parasite.
Collapse
Affiliation(s)
- Ying Wang
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China
- Huixia Geng
- School of Mathematics and Finance, Chongqing University of Arts and Sciences, Chongqing, 402160, China
- Xiaoqun Dang
- Laboratory of Animal Biology, Chongqing Normal University, Chongqing, 400047, China
- Heng Xiang
- College of Animal Science and Technology, Southwest University, Chongqing, 400716, China
- Tian Li
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China
- Guoqing Pan
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China
- Zeyang Zhou
- State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400716, China; Laboratory of Animal Biology, Chongqing Normal University, Chongqing, 400047, China
|
14
|
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022]
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. One way to explore it is to take into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
15
|
Paladin L, Hirsh L, Piovesan D, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures. Nucleic Acids Res 2016; 45:D308-D312. [PMID: 27899671 PMCID: PMC5210593 DOI: 10.1093/nar/gkw1136] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/15/2016] [Revised: 10/20/2016] [Accepted: 10/31/2016] [Indexed: 12/19/2022]
Abstract
RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by an extensive manual validation for >60% of the entries. The updated web interface includes a new search engine for complex queries and a fully re-designed entry page for a better overview of structural data. It is now possible to compare unit positions, together with secondary structure, fold information and Pfam domains. Moreover, a new classification level has been introduced on top of the existing scheme as an independent layer for sequence similarity relationships at 40%, 60% and 90% identity.
Affiliation(s)
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Layla Hirsh
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy; Departamento de Ingeniería, Pontificia Universidad Católica del Perú, 32 Lima, Perú
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Miguel A Andrade-Navarro
- Institute of Molecular Biology, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
- Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier, 34293 Montpellier, France; Institut de Biologie Computationnelle (IBC), 34293 Montpellier, France; Institute of Bioengineering, University ITMO, 197101 St. Petersburg, Russia
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, 35121 Padova, Italy; CNR Institute of Neuroscience, 35121 Padova, Italy
|
16
|
Persi E, Wolf YI, Koonin EV. Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins. Nat Commun 2016; 7:13570. [PMID: 27857066 PMCID: PMC5120217 DOI: 10.1038/ncomms13570] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 06/30/2016] [Accepted: 10/17/2016] [Indexed: 01/21/2023]
Abstract
Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
Affiliation(s)
- Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
17
|
Schaeffer RD, Kinch LN, Liao Y, Grishin NV. Classification of proteins with shared motifs and internal repeats in the ECOD database. Protein Sci 2016; 25:1188-203. [PMID: 26833690 DOI: 10.1002/pro.2893] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/09/2015] [Revised: 01/23/2016] [Accepted: 01/27/2016] [Indexed: 12/19/2022]
Abstract
Proteins and their domains evolve by a set of events commonly including the duplication and divergence of small motifs. The presence of short repetitive regions in domains has generally constituted a difficult case for structural domain classifications and their hierarchies. We developed the Evolutionary Classification Of protein Domains (ECOD) in part to implement a new schema for the classification of these types of proteins. Here we document the ways in which ECOD classifies proteins with small internal repeats, widespread functional motifs, and assemblies of small domain-like fragments in its evolutionary schema. We illustrate the ways in which the structural genomics project impacted the classification and characterization of new structural domains and sequence families over the past decade.
Affiliation(s)
- R Dustin Schaeffer
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
- Yuxing Liao
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050
|