1
Mac Donagh J, Marchesini A, Spiga A, Fallico MJ, Arrías PN, Monzon AM, Vagiona AC, Gonçalves-Kulik M, Mier P, Andrade-Navarro MA. Structured Tandem Repeats in Protein Interactions. Int J Mol Sci 2024; 25:2994. [PMID: 38474241 DOI: 10.3390/ijms25052994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 03/14/2024] [Open Access]
Abstract
Tandem repeats (TRs) in protein sequences are consecutive, highly similar sequence motifs. Some types of TRs fold into structural units that pack together in ensembles, forming either an (open) elongated domain or a (closed) propeller, where the last unit of the ensemble packs against the first one. Here, we examine TR proteins (TRPs) to see how their sequence, structure, and evolutionary properties favor them for a function as mediators of protein interactions. Our observations suggest that TRPs bind other proteins using large, structured surfaces like globular domains; in particular, open-structured TR ensembles are favored by flexible termini and the possibility to tightly coil against their targets. While, intuitively, open ensembles of TRs seem prone to evolve due to their potential to accommodate insertions and deletions of units, these evolutionary events are unexpectedly rare, suggesting that they are advantageous for the emergence of the ancestral sequence but become fixed early. We hypothesize that their flexibility makes it easier for further proteins to adapt to interact with them, which would explain their large number of protein interactions. We provide insight into the properties of open TR ensembles, which make them scaffolds for alternative protein complexes to organize genes, RNA and proteins.
Affiliation(s)
- Juan Mac Donagh
- Science and Technology Department, National University of Quilmes, Bernal B1876, Argentina
- National Scientific and Technical Research Council (CONICET), Buenos Aires C1033AAJ, Argentina
- Abril Marchesini
- National Scientific and Technical Research Council (CONICET), Buenos Aires C1033AAJ, Argentina
- Biotechnology and Molecular Biology Institute (IBBM, UNLP-CONICET), Faculty of Exact Sciences, University of La Plata, La Plata 1900, Argentina
- Agostina Spiga
- Science and Technology Department, National University of Quilmes, Bernal B1876, Argentina
- National Scientific and Technical Research Council (CONICET), Buenos Aires C1033AAJ, Argentina
- Maximiliano José Fallico
- Laboratory of Bioactive Compound Research and Development, Faculty of Exact Sciences, University of La Plata, La Plata 1900, Argentina
- Paula Nazarena Arrías
- Department of Biomedical Sciences, University of Padova, Via U. Bassi 58/b, 35121 Padova, Italy
- Alexander Miguel Monzon
- Department of Information Engineering, University of Padova, Via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Aimilia-Christina Vagiona
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Mariane Gonçalves-Kulik
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
2
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have named STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
- Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
3
Deryusheva EI, Machulin AV, Galzitskaya OV. Diversity and features of proteins with structural repeats. Biophys Rev 2023; 15:1159-1169. [PMID: 37974986 PMCID: PMC10643770 DOI: 10.1007/s12551-023-01130-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2023] [Accepted: 08/28/2023] [Indexed: 11/19/2023] [Open Access]
Abstract
The review provides information on proteins with structural repeats, including their classification, characteristics, functions, and relevance in disease development. It explores methods for identifying structural repeats and specialized databases. The review also highlights the potential use of repeat proteins as drug design scaffolds and discusses their evolutionary mechanisms.
Affiliation(s)
- Evgeniya I. Deryusheva
- Institute for Biological Instrumentation, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
- Andrey V. Machulin
- Skryabin Institute of Biochemistry and Physiology of Microorganisms, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
- Oxana V. Galzitskaya
- Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Russia
- Institute of Theoretical and Experimental Biophysics of the Russian Academy of Sciences, Pushchino, Russia
4
Apsley AT, Domico ER, Verbiest MA, Brogan CA, Buck ER, Burich AJ, Cardone KM, Stone WJ, Anisimova M, Vandenbergh DJ. A novel hypervariable variable number tandem repeat in the dopamine transporter gene (SLC6A3). Life Sci Alliance 2023; 6:e202201677. [PMID: 36754567 PMCID: PMC9909461 DOI: 10.26508/lsa.202201677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 01/25/2023] [Accepted: 01/26/2023] [Indexed: 02/10/2023] [Open Access]
Abstract
The dopamine transporter gene, SLC6A3, has received substantial attention in genetic association studies of various phenotypes. Although some variable number tandem repeats (VNTRs) present in SLC6A3 have been tested in genetic association studies, results have not been consistent. VNTRs in SLC6A3 that have not been examined genetically were characterized. The Tandem Repeat Annotation Library was used to characterize the VNTRs of 64 unrelated long-read haplotype-phased SLC6A3 sequences. Sequence similarity of each repeat unit of the five VNTRs is reported, along with the correlations of SNP-SNP, SNP-VNTR, and VNTR-VNTR alleles across the gene. One of these VNTRs is a novel hyper-VNTR (hyVNTR) in intron 8 of SLC6A3, which contains a range of 3.4-133.4 repeat copies and has a consensus sequence length of 38 bp, with 82% G+C content. The 38-base repeat was predicted to form G-quadruplexes in silico and was confirmed by circular dichroism spectroscopy. In addition, this hyVNTR contains multiple putative binding sites for PRDM9, which, in combination with low levels of linkage disequilibrium around the hyVNTR, suggests it might be a recombination hotspot.
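As an illustration of two quantities reported above (the G+C content of a repeat unit and a fractional repeat copy number such as 3.4), here is a minimal Python sketch. It assumes the copy number is the length of the repeat region divided by the consensus unit length; this is one common convention, not necessarily the exact procedure used with the Tandem Repeat Annotation Library in the study. The function names and the example unit are hypothetical and made up for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in a DNA sequence that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def copy_number(region_length: int, unit_length: int) -> float:
    """Fractional copy number: repeat-region length divided by consensus unit length.

    Assumption: a simple ratio for illustration; real annotation pipelines may
    align the units and account for indels before counting copies.
    """
    if unit_length <= 0:
        raise ValueError("unit_length must be positive")
    return region_length / unit_length


if __name__ == "__main__":
    # Hypothetical 38-bp unit, NOT the published SLC6A3 consensus sequence.
    unit = "GGGC" * 9 + "AT"
    print(len(unit), round(gc_content(unit), 3))  # unit length and its G+C fraction
    print(round(copy_number(129, len(unit)), 2))  # a 129-bp region spans about 3.39 copies
```

Under this convention, a non-integer copy number simply means that the last repeat unit is incomplete.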
Affiliation(s)
- Abner T Apsley
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- The Molecular, Cellular and Integrative Biosciences Program, The Pennsylvania State University, State College, PA, USA
- Emma R Domico
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Max A Verbiest
- Institute of Computational Life Science, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, Faculty of Science, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Carly A Brogan
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Evan R Buck
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Andrew J Burich
- Department of Information Science and Technologies - Applied Data Sciences, The Pennsylvania State University, State College, PA, USA
- Kathleen M Cardone
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Wesley J Stone
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- Maria Anisimova
- Institute of Computational Life Science, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- David J Vandenbergh
- Department of Biobehavioral Health, The Pennsylvania State University, State College, PA, USA
- The Molecular, Cellular and Integrative Biosciences Program, The Pennsylvania State University, State College, PA, USA
- Institute of the Neurosciences, The Pennsylvania State University, State College, PA, USA
- The Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA
5
Pajic P, Shen S, Qu J, May AJ, Knox S, Ruhl S, Gokcumen O. A mechanism of gene evolution generating mucin function. Sci Adv 2022; 8:eabm8757. [PMID: 36026444 PMCID: PMC9417175 DOI: 10.1126/sciadv.abm8757] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/18/2021] [Accepted: 07/12/2022] [Indexed: 05/12/2023]
Abstract
How novel gene functions evolve is a fundamental question in biology. Mucin proteins, a functionally but not evolutionarily defined group of proteins, allow the study of convergent evolution of gene function. By analyzing the genomic variation of mucins across a wide range of mammalian genomes, we propose that exonic repeats and their copy number variation contribute substantially to the de novo evolution of new gene functions. By integrating bioinformatic, phylogenetic, proteomic, and immunohistochemical approaches, we identified 15 undescribed instances of evolutionary convergence, where novel mucins originated by gaining densely O-glycosylated exonic repeat domains. Our results suggest that secreted proteins rich in proline are natural precursors for acquiring mucin function. Our findings have broad implications for understanding the role of exonic repeats in the parallel evolution of new gene functions, especially those involving protein glycosylation.
Affiliation(s)
- Petar Pajic
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14260, USA
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Shichen Shen
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Center of Excellence in Bioinformatics and Life Science, Buffalo, NY 14203, USA
- Jun Qu
- Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Center of Excellence in Bioinformatics and Life Science, Buffalo, NY 14203, USA
- Alison J. May
- Program in Craniofacial Biology, Department of Cell and Tissue Biology, School of Dentistry, University of California, San Francisco, San Francisco, CA 94143, USA
- Sarah Knox
- Program in Craniofacial Biology, Department of Cell and Tissue Biology, School of Dentistry, University of California, San Francisco, San Francisco, CA 94143, USA
- Stefan Ruhl
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY 14260, USA
6
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
7
Torres AG, Rodríguez-Escribà M, Marcet-Houben M, Santos Vieira H, Camacho N, Catena H, Murillo Recio M, Rafels-Ybern À, Reina O, Torres F, Pardo-Saganta A, Gabaldón T, Novoa E, Ribas de Pouplana L. Human tRNAs with inosine 34 are essential to efficiently translate eukarya-specific low-complexity proteins. Nucleic Acids Res 2021; 49:7011-7034. [PMID: 34125917 PMCID: PMC8266599 DOI: 10.1093/nar/gkab461] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/15/2021] [Revised: 05/07/2021] [Accepted: 05/18/2021] [Indexed: 12/11/2022] [Open Access]
Abstract
The modification of adenosine to inosine at the wobble position (I34) of tRNA anticodons is an abundant and essential feature of eukaryotic tRNAs. The expansion of inosine-containing tRNAs in eukaryotes followed the transformation of the homodimeric bacterial enzyme TadA, which generates I34 in tRNA-Arg and tRNA-Leu, into the heterodimeric eukaryotic enzyme ADAT, which modifies up to eight different tRNAs. The emergence of ADAT and its larger set of substrates strongly influenced the tRNA composition and codon usage of eukaryotic genomes. However, the selective advantages that drove the expansion of I34-tRNAs remain unknown. Here we investigate the functional relevance of I34-tRNAs in human cells and show that a full complement of these tRNAs is necessary for the translation of low-complexity protein domains enriched in amino acids cognate for I34-tRNAs. The coding sequences for these domains require codons translated by I34-tRNAs, to the detriment of synonymous codons that use other tRNAs. I34-tRNA-dependent low-complexity proteins are enriched in functional categories related to cell adhesion, and depletion of I34-tRNAs leads to cellular phenotypes consistent with these roles. We show that the distribution of these low-complexity proteins mirrors the distribution of I34-tRNAs in the phylogenetic tree.
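To make "codons translated by I34-tRNAs" concrete, the sketch below counts the fraction of codons in a coding sequence that fall in the codon boxes read by inosine-modified anticodons. It assumes (this is not stated in the abstract above) that the eight eukaryotic ADAT substrates are the tRNAs for Ala, Arg, Ile, Leu, Pro, Ser, Thr and Val, and that I34 pairs with U, C or A at the third codon position. The names `I34_BOXES`, `is_i34_codon` and `i34_codon_fraction` are hypothetical.

```python
# Assumption: codon boxes decoded by the eight I34-containing eukaryotic tRNAs,
# keyed by the first two codon bases (RNA alphabet).
I34_BOXES = {
    "GC": "Ala", "CG": "Arg", "AU": "Ile", "CU": "Leu",
    "CC": "Pro", "UC": "Ser", "AC": "Thr", "GU": "Val",
}


def is_i34_codon(codon: str) -> bool:
    """True if an I34 anticodon can read this codon (box above, third base U, C or A)."""
    codon = codon.upper().replace("T", "U")
    return len(codon) == 3 and codon[:2] in I34_BOXES and codon[2] in "UCA"


def i34_codon_fraction(cds: str) -> float:
    """Fraction of complete codons in a coding sequence that are I34-decodable."""
    cds = cds.upper().replace("T", "U")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]  # drop a trailing partial codon
    if not codons:
        return 0.0
    return sum(is_i34_codon(c) for c in codons) / len(codons)


if __name__ == "__main__":
    # Made-up fragment: GCU (Ala, I34), GCG (Ala, not I34), AUA (Ile, I34), AUG (Met, not I34)
    print(i34_codon_fraction("GCTGCGATAATG"))  # prints 0.5
```

Synonymous codons ending in G (for example GCG for Ala) are excluded because inosine does not pair with G, which is why the same amino acid can be encoded in an I34-dependent or I34-independent way.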
Affiliation(s)
- Adrian Gabriel Torres
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Marta Rodríguez-Escribà
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Marina Marcet-Houben
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Noelia Camacho
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Helena Catena
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Marina Murillo Recio
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Àlbert Rafels-Ybern
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Oscar Reina
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Francisco Miguel Torres
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Ana Pardo-Saganta
- Centre for Applied Medical Research (CIMA Universidad de Navarra), Pamplona 31008, Spain
- Toni Gabaldón
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Catalonia 08034, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia 08010, Spain
- Eva Maria Novoa
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
- University Pompeu Fabra, Barcelona, Catalonia 08003, Spain
- Lluís Ribas de Pouplana
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia 08010, Spain
8
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. Front Bioinform 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022] [Open Access]
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by two competing facts: 1) the repeats are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often, because more residues are heavily constrained by the need for the sequence to repeat; and 2) the symmetry of the repeats allows functional contributions to be distributed over a number of residues, so that sometimes no specific site is singularly responsible for function (unlike the catalytic residues of an enzyme active site). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide and Ankyrin repeat families. We find that mutations are context dependent. Some mutations are indeed highly disruptive to the function of the protein repeats, while mutations in identical positions in other repeats in the same protein have little to no effect on structure or function.
Affiliation(s)
- Matthew Merski
- Correspondence: Maria Wiktoria Górna; Matthew Merski
9
Calatayud S, Garcia-Risco M, Capdevila M, Cañestro C, Palacios Ò, Albalat R. Modular Evolution and Population Variability of Oikopleura dioica Metallothioneins. Front Cell Dev Biol 2021; 9:702688. [PMID: 34277643 PMCID: PMC8283569 DOI: 10.3389/fcell.2021.702688] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/29/2021] [Accepted: 06/09/2021] [Indexed: 01/29/2023] [Open Access]
Abstract
The chordate Oikopleura dioica is probably the fastest-evolving metazoan reported so far and is thereby a suitable system in which to explore the limits of evolutionary processes. For this reason, and to gain new insights into the evolution of protein modularity, we have investigated the organization, function and evolution of multi-modular metallothionein (MT) proteins in O. dioica. MTs are a heterogeneous group of modular proteins defined by their cysteine (C)-rich domains, which confer the capacity to coordinate different transition metal ions. O. dioica has two MTs: a bi-modular OdiMT1 consisting of two domains (t-12C and 12C), and a multi-modular OdiMT2 with six t-12C/12C repeats. By means of mass spectrometry and spectroscopy of metal-protein complexes, we have shown that the 12C domain is able to autonomously bind four divalent metal ions, although the t-12C/12C pair, as it is found in OdiMT1, is the optimized unit for divalent metal binding. We have also shown a direct relationship between the number of t-12C/12C repeats and the metal-binding capacity of the MTs, which indicates a stepwise mode of functional and structural evolution for OdiMT2. Finally, after analyzing four O. dioica populations distributed worldwide, we have detected several OdiMT2 variants with changes in their number of t-12C/12C domain repeats. This finding reveals that the number of repeats fluctuates between current O. dioica populations, which provides a new perspective on the evolution of domain repeat proteins.
Affiliation(s)
- Sara Calatayud
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Mario Garcia-Risco
- Departament de Química, Facultat de Ciències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Mercè Capdevila
- Departament de Química, Facultat de Ciències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Cristian Cañestro
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Òscar Palacios
- Departament de Química, Facultat de Ciències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Ricard Albalat
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
10
Delucchi M, Näf P, Bliven S, Anisimova M. TRAL 2.0: Tandem Repeat Detection With Circular Profile Hidden Markov Models and Evolutionary Aligner. Front Bioinform 2021; 1:691865. [PMID: 36303789 PMCID: PMC9581039 DOI: 10.3389/fbinf.2021.691865] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/07/2021] [Accepted: 06/11/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
The Tandem Repeat Annotation Library (TRAL) focuses on analyzing tandem repeat units in genomic sequences. TRAL can integrate and harmonize tandem repeat annotations from a large number of external tools, and provides a statistical model for evaluating and filtering the detected repeats. TRAL version 2.0 includes new features such as a module for identifying repeats from circular profile hidden Markov models, a new repeat alignment method based on the progressive Poisson Indel Process, an improved installation procedure and a Docker container. TRAL is an open-source Python 3 library and is available, together with documentation and tutorials, via vital-it.ch/software/tral.
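TRAL's own detectors (including the circular profile HMMs described above) are considerably more sophisticated; the sketch below is neither TRAL's API nor its algorithm. It is a naive, hypothetical periodicity scan that only conveys the basic idea of scoring candidate repeat-unit lengths: for each period p, it measures how often a residue equals the residue p positions downstream. The function names are illustrative assumptions.

```python
def period_scores(seq: str, max_period: int) -> dict:
    """Map each candidate period p to the fraction of positions i with seq[i] == seq[i + p]."""
    scores = {}
    for p in range(1, max_period + 1):
        n = len(seq) - p
        if n <= 0:
            break  # sequence too short to compare at this period
        scores[p] = sum(seq[i] == seq[i + p] for i in range(n)) / n
    return scores


def best_period(seq: str, max_period: int) -> int:
    """Highest-scoring period; ties go to the shorter period, since multiples of the unit also score well."""
    scores = period_scores(seq, max_period)
    if not scores:
        raise ValueError("sequence too short for any candidate period")
    return max(scores, key=lambda p: (scores[p], -p))


if __name__ == "__main__":
    # Toy motif repeated three times (illustrative only).
    seq = "LxxLxLxxN" * 3
    print(best_period(seq, 12))  # prints 9
```

Unlike a profile HMM, this exact-match score degrades quickly once repeat units diverge or contain indels, which is the situation that probabilistic detectors are designed to handle.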
Affiliation(s)
- Matteo Delucchi
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Paulina Näf
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Spencer Bliven
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Laboratory for Scientific Computing and Modelling, Paul Scherrer Institute, Villigen PSI, Villigen, Switzerland
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Correspondence: Maria Anisimova
11
Verbiest MA, Delucchi M, Bilgin Sonay T, Anisimova M. Beyond Microsatellite Instability: Intrinsic Disorder as a Potential Link Between Protein Short Tandem Repeats and Cancer. Front Bioinform 2021; 1:685844. [DOI: 10.3389/fbinf.2021.685844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2021] [Accepted: 05/21/2021] [Indexed: 12/28/2022] [Open Access]
Abstract
Short tandem repeats (STRs) are abundant in genomic sequences and are known for comparatively high mutation rates; STRs are therefore thought to be a potent source of genetic diversity. In protein-coding sequences, STRs primarily encode disorder-promoting amino acids and are often located in intrinsically disordered regions (IDRs). STRs are frequently studied in the scope of microsatellite instability (MSI) in cancer, with little focus on the connection between protein STRs and IDRs. We believe, however, that this relationship should be explicitly included when ascertaining STR functionality in cancer. Here we explore this notion using all canonical human proteins from SwissProt, in which we detected 3,699 STRs. Over 80% of these consisted entirely of disorder-promoting amino acids. 62.1% of amino acids in STR sequences were predicted to also be in an IDR, compared to 14.2% for non-repeat sequences. Over-representation analysis showed STR-containing proteins to be primarily located in the nucleus, where they perform protein- and nucleotide-binding functions and regulate gene expression. They were also enriched in cancer-related signaling pathways. Furthermore, we found enrichments of STR-containing proteins among those correlated with patient survival for cancers derived from eight different anatomical sites. Intriguingly, several of these cancer types are not known to have an MSI-high (MSI-H) phenotype, suggesting that protein STRs play a role in cancer pathology in non-MSI-H settings. Their intrinsic link with IDRs could therefore be an attractive topic of future research to further explore the role of STRs and IDRs in cancer. We speculate that our observations may be linked to the known dosage sensitivity of disordered proteins, which could hint at a concentration-dependent gain-of-function mechanism in cancer for proteins containing STRs and IDRs.
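The link drawn above between protein STRs and disorder-promoting residues can be illustrated with a small sketch: an exact, greedy scan for short tandem repeats in a protein sequence, followed by a check of whether each repeat consists only of disorder-promoting amino acids. The residue set used here (A, R, G, Q, S, P, E, K) follows a commonly cited classification and is an assumption, as are the unit-length and copy-number thresholds and the function names; the study's own STR detection pipeline is not reproduced here.

```python
# Assumption: a commonly cited set of disorder-promoting residues.
DISORDER_PROMOTING = frozenset("ARGQSPEK")


def find_exact_strs(seq: str, max_unit: int = 3, min_copies: int = 4) -> list:
    """Greedy left-to-right scan for exact short tandem repeats.

    Returns (start, end, unit) tuples with `end` exclusive. At each position the
    longest run over unit lengths 1..max_unit is kept (ties keep the shorter unit),
    and the scan resumes after that run.
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for u in range(1, max_unit + 1):
            unit = seq[i:i + u]
            if len(unit) < u:
                break  # not enough sequence left for this unit length
            copies = 1
            while seq[i + copies * u:i + (copies + 1) * u] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * u > best[1] - best[0]):
                best = (i, i + copies * u, unit)
        if best:
            hits.append(best)
            i = best[1]
        else:
            i += 1
    return hits


def is_disorder_promoting(segment: str) -> bool:
    """True if every residue in a non-empty segment is in the assumed disorder-promoting set."""
    return bool(segment) and all(aa in DISORDER_PROMOTING for aa in segment)


if __name__ == "__main__":
    seq = "MKTAYIA" + "Q" * 6 + "LV" + "PE" * 4 + "PW"  # made-up sequence
    for start, end, unit in find_exact_strs(seq):
        print(start, end, unit, is_disorder_promoting(seq[start:end]))
    # prints: 7 13 Q True, then 15 23 PE True
```

Exact matching keeps the example short; real STR detection usually tolerates some mismatches between units.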
12
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022] [Open Access]
Abstract
Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are largely coupled to GC/AT bias, exhibiting a bifurcated correlation with the degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer homopeptides and homocodons simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating that they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and homopeptide levels correlate with intrinsic disorder and anti-correlate with structured-domain content. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, which is an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine, whose homocodons are overwhelmingly made from the codon AAG.
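The distinction drawn above between homopeptides (runs of one amino acid) and homocodons (pure repeats of one codon) can be sketched in a few lines of Python. The minimum run length of 5 and the function names are illustrative assumptions, not the thresholds or code used in the study. The example contrasts a poly-lysine stretch built only from AAG (a homocodon) with a poly-glutamine stretch built from mixed CAG/CAA codons (a homopeptide, but not a homocodon).

```python
from itertools import groupby


def homopeptides(protein: str, min_len: int = 5) -> list:
    """Return (start, amino_acid, length) for each run of a single residue of length >= min_len."""
    runs, pos = [], 0
    for aa, group in groupby(protein):
        n = sum(1 for _ in group)
        if n >= min_len:
            runs.append((pos, aa, n))
        pos += n
    return runs


def is_homocodon(cds: str) -> bool:
    """True if the DNA sequence is one codon repeated at least twice (length a multiple of 3)."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3:
        return False
    return cds == cds[:3] * (len(cds) // 3)


if __name__ == "__main__":
    print(homopeptides("MAQQQQQQKKKKKL"))   # [(2, 'Q', 6), (8, 'K', 5)]
    print(is_homocodon("AAGAAGAAGAAGAAG"))  # True: poly-K encoded only by AAG
    print(is_homocodon("CAGCAACAGCAGCAA"))  # False: poly-Q from mixed CAG/CAA
```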
13
Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families. PLoS Comput Biol 2021; 17:e1008798. [PMID: 33857128 PMCID: PMC8078820 DOI: 10.1371/journal.pcbi.1008798] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/03/2020] [Revised: 04/27/2021] [Accepted: 02/15/2021] [Indexed: 12/18/2022]
Abstract
Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.
Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repeated amino acid stretches that give rise to structures with repeated folds/domains. Although the repeated units can often be recognised from the sequence alone, structural information is frequently missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all known repeat structures. We evaluate the contact predictions and the obtained models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific for repeat proteins. Finally, we used the prediction pipeline for all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy.
14
Persi E, Wolf YI, Horn D, Ruppin E, Demichelis F, Gatenby RA, Gillies RJ, Koonin EV. Mutation-selection balance and compensatory mechanisms in tumour evolution. Nat Rev Genet 2020; 22:251-262. [PMID: 33257848 DOI: 10.1038/s41576-020-00299-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Accepted: 10/16/2020] [Indexed: 12/11/2022]
Abstract
Intratumour heterogeneity and phenotypic plasticity, sustained by a range of somatic aberrations, as well as epigenetic and metabolic adaptations, are the principal mechanisms that enable cancers to resist treatment and survive under environmental stress. A comprehensive picture of the interplay between different somatic aberrations, from point mutations to whole-genome duplications, in tumour initiation and progression is lacking. We posit that different genomic aberrations generally exhibit a temporal order, shaped by a balance between the levels of mutations and selective pressures. Repeat instability emerges first, followed by larger aberrations, with compensatory effects leading to robust tumour fitness maintained throughout the tumour progression. A better understanding of the interplay between genetic aberrations, the microenvironment, and epigenetic and metabolic cellular states is essential for early detection and prevention of cancer as well as development of efficient therapeutic strategies.
Affiliation(s)
- Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David Horn
- School of Physics and Astronomy, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Eytan Ruppin
- Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Francesca Demichelis
- Department for Cellular, Computational and Integrative Biology, University of Trento, Trento, Italy; Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital, Weill Cornell Medicine, New York, NY, USA
- Robert A Gatenby
- Integrated Mathematical Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Robert J Gillies
- Department of Cancer Physiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
15
Balzano E, Pelliccia F, Giunta S. Genome (in)stability at tandem repeats. Semin Cell Dev Biol 2020; 113:97-112. [PMID: 33109442 DOI: 10.1016/j.semcdb.2020.10.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/06/2020] [Revised: 09/26/2020] [Accepted: 10/10/2020] [Indexed: 12/12/2022]
Abstract
Repeat sequences account for over half of the human genome and represent a significant source of variation that underlies physiological and pathological states. Yet their study has been hindered by limitations in short-read sequencing technology and by difficulties in assembly. An important category of repetitive DNA in the human genome consists of tandem repeats (TRs), in which repetitive units are arranged in a head-to-tail pattern. Compared to other regions of the genome, TRs carry a 10- to 10,000-fold higher mutation rate. Several mutagenic mechanisms can give rise to this propensity toward instability, but their precise contributions remain speculative. Given the high degree of homology between these sequences and their arrangement in tandem, once damaged, TRs have an intrinsic propensity to undergo aberrant recombination with non-allelic exchange and to generate harmful rearrangements that may undermine the stability of the entire genome. The dynamic mutagenesis at TRs has been found to underlie individual polymorphisms associated with neurodegenerative and neuromuscular disorders, as well as complex genetic diseases such as cancer and diabetes. Here, we review our current understanding of the surveillance and repair mechanisms operating within these regions, and we describe how alterations in these protective processes can readily trigger the mutational signatures found at TRs, ultimately resulting in the pathological correlation between TR instability and human diseases. Finally, we offer a viewpoint that weighs the detrimental effects of TRs against their selection and conservation as important drivers of human evolution.
Affiliation(s)
- Elisa Balzano
- Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
- Franca Pelliccia
- Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy
- Simona Giunta
- The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA; Dipartimento di Biologia e Biotecnologie "Charles Darwin", Sapienza Università di Roma, 00185 Roma, Italy.
16
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They have contributed to the evolution of organism complexity, specializing in functions that require quick adaptability, such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and the comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio and HEAT repeats and the β propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern", which may improve TRP detection and classification.
Affiliation(s)
- Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
17
Gloeckner CJ, Porras P. Guilt-by-Association - Functional Insights Gained From Studying the LRRK2 Interactome. Front Neurosci 2020; 14:485. [PMID: 32508578 PMCID: PMC7251075 DOI: 10.3389/fnins.2020.00485] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/24/2020] [Accepted: 04/20/2020] [Indexed: 12/11/2022]
Abstract
The Parkinson's disease-associated Leucine-rich repeat kinase 2 (LRRK2) is a complex multi-domain protein belonging to the Roco protein family, a unique group of G-proteins. Variants of this gene are associated with an increased risk of Parkinson's disease. Besides its well-characterized enzymatic activities, conferred by its GTPase and kinase domains, and a central dimerization domain, it contains four predicted repeat domains, which, based on their structure, are commonly involved in protein-protein interactions (PPIs). In the past decades, tremendous progress has been made in determining comprehensive interactome maps for the human proteome. Knowledge of PPIs has been instrumental in assigning functions to proteins involved in human disease, has helped to clarify the connectivity between different disease pathways, and has also contributed significantly to the functional understanding of LRRK2. In addition to the increased kinase activity observed for proteins containing PD-associated variants, various studies have helped to establish LRRK2 as a large scaffold protein at the interface between cytoskeletal dynamics and vesicular transport. This review first discusses a number of specific LRRK2-associated PPIs for which a functional consequence can at least be speculated upon, and then considers the representation of LRRK2 protein interactions in public repositories, providing an outlook on open research questions and challenges in this field.
Affiliation(s)
- Christian Johannes Gloeckner
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Center for Ophthalmology, Institute for Ophthalmic Research, Core Facility for Medical Bioanalytics, University of Tübingen, Tübingen, Germany
- Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cherry Hinton, United Kingdom
18
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022]
Abstract
BACKGROUND Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences makes it difficult to determine whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins, and that this conservation could be quantitated using a Jaccard metric. RESULTS Comparison of these dot plots obviated the issues due to sequence similarity in the analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure, with an expected 50% loss of Jaccard similarity for a loss of 8.2% sequence identity. To test the method, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content of dot plots could be used to identify repeat proteins from sequence alone, with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered, and a large fraction (82.9%) of the clusters containing between 5 and 200 members were of a single functional type. CONCLUSIONS Dot plot analysis of repeat proteins attempts to obviate issues that arise from the sequence degeneracy of repeat proteins. These results show that this kind of analysis can be applied efficiently to analyze repeat proteins on a large scale.
Affiliation(s)
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Krzysztof Młynarczyk
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
- Jakub Skrzeczkowski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Stanisław Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Maria W. Górna
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
19
A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder. Genes (Basel) 2020; 11:genes11040407. [PMID: 32283633 PMCID: PMC7230257 DOI: 10.3390/genes11040407] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 03/09/2020] [Revised: 03/29/2020] [Accepted: 04/01/2020] [Indexed: 12/31/2022]
Abstract
Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance of and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and protein length was observed universally; eukaryotes in general have more TRs, but the difference is quite small once protein length is taken into account. TRs were enriched with disorder-promoting amino acids and were found inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron binding. In viruses, TRs are found in proteins essential for virulence.
20
Ntountoumi C, Vlastaridis P, Mossialos D, Stathopoulos C, Iliopoulos I, Promponas V, Oliver SG, Amoutzias GD. Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved. Nucleic Acids Res 2019; 47:9998-10009. [PMID: 31504783 PMCID: PMC6821194 DOI: 10.1093/nar/gkz730] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 05/15/2019] [Revised: 07/16/2019] [Accepted: 08/15/2019] [Indexed: 01/27/2023]
Abstract
We provide the first high-throughput analysis of the properties and functional role of Low Complexity Regions (LCRs) in more than 1500 prokaryotic and phage proteomes. We observe that, contrary to a widespread belief based on older and sparse data, LCRs actually have a significant, persistent and highly conserved presence and role in many diverse prokaryotes. Their specific amino acid content is linked to proteins with certain molecular functions, such as the binding of RNA, DNA, metal ions and polysaccharides. In addition, LCRs have been repeatedly identified in very ancient, and usually highly expressed, proteins of the translation machinery. Finally, based on the amino acid content enriched in certain categories, we have developed a neural network web server to identify LCRs and accurately predict whether they can bind nucleic acids or metal ions, or are involved in chaperone functions. An evaluation of the tool showed that it is highly accurate for eukaryotic proteins as well.
Affiliation(s)
- Chrysa Ntountoumi
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Panayotis Vlastaridis
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Dimitris Mossialos
- Microbial Biotechnology-Molecular Bacteriology-Virology Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Vasilios Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, New Campus, University of Cyprus, PO Box 20537, CY-1678 Nicosia, Cyprus
- Stephen G Oliver
- Cambridge Systems Biology Centre & Department of Biochemistry, University of Cambridge, CB2 1GA, UK
- Grigoris D Amoutzias
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
21
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 169] [Impact Index Per Article: 28.2] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022]
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we review the problems associated with tandem repeat sequences that originate at different stages of the sequencing-assembly-annotation-deposition workflow and that may proliferate in public database repositories, affecting all downstream analyses. As a case study, we provide examples from the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise awareness within the community of database users, and to alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
22
An Evolutionary Perspective on the Impact of Genomic Copy Number Variation on Human Health. J Mol Evol 2019; 88:104-119. [PMID: 31522275 DOI: 10.1007/s00239-019-09911-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/13/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Copy number variants (CNVs), deletions and duplications of segments of DNA, account for at least five times more variable base pairs in humans than single-nucleotide variants. Several common CNVs were shown to change coding and regulatory sequences and thus dramatically affect adaptive phenotypes involving immunity, perception, metabolism and skin structure, among others. Some of these CNVs were also associated with susceptibility to cancer, infection, and metabolic disorders. These observations raise the possibility that CNVs are a primary contributor to human phenotypic variation and consequently evolve under selective pressures. Indeed, locus-specific haplotype-level analyses revealed signatures of natural selection on several CNVs. However, traditional tests of selection, which are typically applied to single-nucleotide variation, often have diminished statistical power when applied to CNVs because CNVs frequently do not show strong linkage disequilibrium with nearby variants. Recombination-based formation mechanisms of CNVs lead to frequent recurrence and gene conversion events, breaking the linkage disequilibrium involving CNVs. Similar methodological challenges also prevent routine genome-wide association studies from adequately investigating the impact of CNVs on heritable human disease. Thus, we argue that the full relevance of CNVs to human health and evolution is yet to be elucidated. We further argue that a holistic investigation of formation mechanisms within an evolutionary framework would provide a powerful means to understand the functional and biomedical impact of CNVs. In this paper, we review several cases where studies reveal diverse evolutionary histories and unexpected functional consequences of CNVs. We hope that this review will encourage further work on CNVs by both evolutionary and medical geneticists.
23
Hirsh L, Paladin L, Piovesan D, Tosatto SCE. RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins. Nucleic Acids Res 2019; 46:W402-W407. [PMID: 29746699 PMCID: PMC6031040 DOI: 10.1093/nar/gky360] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/06/2018] [Accepted: 04/24/2018] [Indexed: 11/15/2022]
Abstract
RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves performance over previous methods in terms of both computational time and accuracy, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and for manually refining the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real time for every user edit. Reviewed predictions can be submitted to RepeatsDB for evaluation and inclusion.
Affiliation(s)
- Layla Hirsh
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy; Dept. of Engineering, Pontificia Universidad Católica del Perú, Lima, Perú
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Padua, Italy; CNR Institute of Neurosciences, Padua, Italy
24
Proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Proc Natl Acad Sci U S A 2019; 116:16987-16996. [PMID: 31387980 DOI: 10.1073/pnas.1908790116] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection.
25
A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families. Methods Mol Biol 2019. [PMID: 30298401 DOI: 10.1007/978-1-4939-8736-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is complicated further by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve in cases where insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Combined, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, because such methods depend on accurate sequence alignment.
We present here a practical solution to this problem, making use of graph clustering combined with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. Multiple rounds of homology searches via alignment of profile hidden Markov models generate large sets of related proteins. By representing the relationships between proteins in these sets as graphs, subsequent clustering with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
26
Xu D, Pavlidis P, Taskent RO, Alachiotis N, Flanagan C, DeGiorgio M, Blekhman R, Ruhl S, Gokcumen O. Archaic Hominin Introgression in Africa Contributes to Functional Salivary MUC7 Genetic Variation. Mol Biol Evol 2017; 34:2704-2715. [PMID: 28957509 PMCID: PMC5850612 DOI: 10.1093/molbev/msx206] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022]
Abstract
One of the most abundant proteins in human saliva, mucin-7, is encoded by the MUC7 gene, which harbors copy number variable subexonic repeats (PTS-repeats) that affect the size and glycosylation potential of this protein. We recently documented the adaptive evolution of MUC7 subexonic copy number variation among primates. Yet, the evolution of MUC7 genetic variation in humans remained unexplored. Here, we found that PTS-repeat copy number variation has evolved recurrently in the human lineage, thereby generating multiple haplotypic backgrounds carrying five or six PTS-repeat copy number alleles. Contrary to previous studies, we found no associations between the copy number of PTS-repeats and protection against asthma. Instead, we revealed a significant association of MUC7 haplotypic variation with the composition of the oral microbiome. Furthermore, based on in-depth simulations, we conclude that a divergent MUC7 haplotype likely originated in an unknown African hominin population and introgressed into ancestors of modern Africans.
Affiliation(s)
- Duo Xu
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
- Pavlos Pavlidis
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology - Hellas, Heraklion, Crete, Greece
- Recep Ozgur Taskent
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
- Nikolaos Alachiotis
- Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas, Heraklion, Crete, Greece
- Colin Flanagan
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
- Michael DeGiorgio
- Department of Biology and the Institute for CyberScience, Pennsylvania State University, University Park, PA
- Ran Blekhman
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Twin Cities, MN
- Stefan Ruhl
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, NY
- Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, NY
27
Bagshaw AT. Functional Mechanisms of Microsatellite DNA in Eukaryotic Genomes. Genome Biol Evol 2017; 9:2428-2443. [PMID: 28957459 PMCID: PMC5622345 DOI: 10.1093/gbe/evx164] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Accepted: 08/23/2017] [Indexed: 02/06/2023]
Abstract
Microsatellite repeat DNA is best known for its length mutability, which is implicated in several neurological diseases and cancers, and often exploited as a genetic marker. Less well-known is the body of work exploring the widespread and surprisingly diverse functional roles of microsatellites. Recently, emerging evidence includes the finding that normal microsatellite polymorphism contributes substantially to the heritability of human gene expression on a genome-wide scale, calling attention to the task of elucidating the mechanisms involved. At present, these are underexplored, but several themes have emerged. I review evidence demonstrating roles for microsatellites in modulation of transcription factor binding, spacing between promoter elements, enhancers, cytosine methylation, alternative splicing, mRNA stability, selection of transcription start and termination sites, unusual structural conformations, nucleosome positioning and modification, higher order chromatin structure, noncoding RNA, and meiotic recombination hot spots.
28
Trujillo JT, Beilstein MA, Mosher RA. The Argonaute-binding platform of NRPE1 evolves through modulation of intrinsically disordered repeats. New Phytol 2016; 212:1094-1105. [PMID: 27431917 PMCID: PMC5125548 DOI: 10.1111/nph.14089] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/19/2016] [Accepted: 06/04/2016] [Indexed: 05/26/2023]
Abstract
Argonaute (Ago) proteins are important effectors in RNA silencing pathways, but they must interact with other machinery to trigger silencing. Ago hooks have emerged as a conserved motif responsible for interaction with Ago proteins, but little is known about the sequence surrounding Ago hooks that must restrict or enable interaction with specific Argonautes. Here we investigated the evolutionary dynamics of an Ago-binding platform in NRPE1, the largest subunit of RNA polymerase V. We compared NRPE1 sequences from > 50 species, including dense sampling of two plant lineages. This study demonstrates that the Ago-binding platform of NRPE1 retains Ago hooks, intrinsic disorder, and repetitive character while being highly labile at the sequence level. We reveal that loss of sequence conservation is the result of relaxed selection and frequent expansions and contractions of tandem repeat arrays. These factors allow a complete restructuring of the Ago-binding platform over 50-60 million yr. This evolutionary pattern is also detected in a second Ago-binding platform, suggesting it is a general mechanism. The presence of labile repeat arrays in all analyzed NRPE1 Ago-binding platforms indicates that selection maintains repetitive character, potentially to retain the ability to rapidly restructure the Ago-binding platform.
Affiliation(s)
- Joshua T Trujillo
- The School of Plant Sciences, The University of Arizona, Tucson, AZ, 85721-0036, USA
- Mark A Beilstein
- The School of Plant Sciences, The University of Arizona, Tucson, AZ, 85721-0036, USA
- Rebecca A Mosher
- The School of Plant Sciences, The University of Arizona, Tucson, AZ, 85721-0036, USA
29
Persi E, Wolf YI, Koonin EV. Positive and strongly relaxed purifying selection drive the evolution of repeats in proteins. Nat Commun 2016; 7:13570. [PMID: 27857066 PMCID: PMC5120217 DOI: 10.1038/ncomms13570] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 06/30/2016] [Accepted: 10/17/2016] [Indexed: 01/21/2023]
Abstract
Protein repeats are considered hotspots of protein evolution, associated with acquisition of new functions and novel phenotypic traits, including disease. Paradoxically, however, repeats are often strongly conserved through long spans of evolution. To resolve this conundrum, it is necessary to directly compare paralogous (horizontal) evolution of repeats within proteins with their orthologous (vertical) evolution through speciation. Here we develop a rigorous methodology to identify highly periodic repeats with significant sequence similarity, for which evolutionary rates and selection (dN/dS) can be estimated, and systematically characterize their evolution. We show that horizontal evolution of repeats is markedly accelerated compared with their divergence from orthologues in closely related species. This observation is universal across the diversity of life forms and implies a biphasic evolutionary regime whereby new copies experience rapid functional divergence under combined effects of strongly relaxed purifying selection and positive selection, followed by fixation and conservation of each individual repeat.
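The selection measure referred to above, dN/dS (often written ω), compares the rate of nonsynonymous substitutions with the rate of synonymous ones. As a reference point only, one common textbook estimator (Nei-Gojobori site counting with a Jukes-Cantor correction) is shown below; the abstract does not state which estimator Persi et al. used, so this choice is an assumption. Here N and S are the numbers of nonsynonymous and synonymous sites, and N_d and S_d the observed differences of each kind between two aligned coding sequences (for example, two repeat copies).

```latex
\begin{aligned}
p_N &= \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S},\\
d_N &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right), \qquad
d_S = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right),\\
\omega &= \frac{d_N}{d_S}.
\end{aligned}
```

Under this reading, ω < 1 indicates purifying selection, ω ≈ 1 is consistent with neutral evolution (or strongly relaxed constraint), and ω > 1 suggests positive selection.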
Affiliation(s)
- Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
30
Lokhande S, Patra BN, Ray A. A link between chromatin condensation mechanisms and Huntington's disease: connecting the dots. Mol Biosyst 2016; 12:3515-3529. [PMID: 27714015 DOI: 10.1039/c6mb00598e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/12/2023]
Abstract
Huntington's disease is a rare neurodegenerative disorder whose complex pathophysiology exhibits system-wide changes in the body, with striking and debilitating clinical features targeting the central nervous system. Among the various molecular functions affected in this disease, mitochondrial dysfunction and transcriptional dysregulation are some of the most studied aspects of this disease. However, there is evidence of the involvement of a mutant Huntingtin protein in the processes of DNA damage, chromosome condensation and DNA repair. This review attempts to briefly recapitulate the clinical features, model systems used to study the disease, major molecular processes affected, and, more importantly, examines recent evidence for the involvement of the mutant Huntingtin protein in the processes regulating chromosome condensation, leading to DNA damage response and neuronal death.
Affiliation(s)
- Sonali Lokhande
- Keck Graduate Institute of Applied Life Sciences, Claremont, CA 91711, USA.
- Biranchi N Patra
- Keck Graduate Institute of Applied Life Sciences, Claremont, CA 91711, USA.
- Animesh Ray
- Keck Graduate Institute of Applied Life Sciences, Claremont, CA 91711, USA.
31
Abstract
Repeats are ubiquitous elements of proteins, and they play important roles in cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulty linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential, such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions, where they might foster fast but subtle adaptation in biological arms races. KEY WORDS: protein evolution, domain rearrangements, protein repeats, concerted evolution.
Affiliation(s)
- Andreas Schüler
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
32
Xu D, Pavlidis P, Thamadilok S, Redwood E, Fox S, Blekhman R, Ruhl S, Gokcumen O. Recent evolution of the salivary mucin MUC7. Sci Rep 2016; 6:31791. [PMID: 27558399 PMCID: PMC4997351 DOI: 10.1038/srep31791] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 02/02/2016] [Accepted: 07/26/2016] [Indexed: 11/23/2022]
Abstract
Genomic structural variants constitute the majority of variable base pairs in primate genomes and affect gene function in multiple ways. While whole gene duplications and deletions are relatively well-studied, the biology of subexonic (i.e., within coding exon sequences) copy number variation remains elusive. The salivary MUC7 gene provides an opportunity for studying such variation, as it harbors copy number variable subexonic repeat sequences that encode densely O-glycosylated domains (PTS-repeats) with microbe-binding properties. To understand the evolution of this gene, we analyzed mammalian and primate genomes within a comparative framework. Our analyses revealed that (i) MUC7 emerged in the placental mammal ancestor and rapidly gained multiple sites for O-glycosylation; (ii) MUC7 has retained its extracellular activity in saliva in placental mammals; (iii) the anti-fungal domain of the protein was remodified under positive selection in the primate lineage; and (iv) MUC7 PTS-repeats have evolved recurrently and under adaptive constraints. Our results establish MUC7 as a major player in salivary adaptation, likely as a response to diverse pathogenic exposure in primates. On a broader scale, our study highlights variable subexonic repeats as a primary source of modular evolutionary innovation that leads to rapid functional adaptation.
Affiliation(s)
- Duo Xu
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
- Pavlos Pavlidis
- Institute of Computer Science (ICS), Foundation of Research and Technology-Hellas, Heraklion, Crete, Greece
- Supaporn Thamadilok
- Department of Oral Biology, School of Dental Medicine, State University of New York at Buffalo, New York 14214, USA
- Emilie Redwood
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
- Sara Fox
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
- Ran Blekhman
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Twin Cities, Minnesota 55455, USA
- Stefan Ruhl
- Department of Oral Biology, School of Dental Medicine, State University of New York at Buffalo, New York 14214, USA
- Omer Gokcumen
- Department of Biological Sciences, State University of New York at Buffalo, New York 14260, USA
33
Wu X, Li G. Prevalent Accumulation of Non-Optimal Codons through Somatic Mutations in Human Cancers. PLoS One 2016; 11:e0160463. [PMID: 27513638 PMCID: PMC4981346 DOI: 10.1371/journal.pone.0160463] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/26/2015] [Accepted: 07/19/2016] [Indexed: 11/27/2022]
Abstract
Cancer is characterized by uncontrolled cell growth, and the cause of different cancers is generally attributed to checkpoint dysregulation of cell proliferation and apoptosis. Recent studies have shown that non-optimal codons were preferentially adopted by genes to generate cell cycle-dependent oscillations in protein levels. This raises the intriguing question of how dynamic changes of codon usage modulate the cancer genome to cope with a non-controlled proliferative cell cycle. In this study, we comprehensively analyzed the somatic mutations of codons in human cancers, and found that non-optimal codons tended to be accumulated through both synonymous and non-synonymous mutations compared with other types of genomic substitution. We further demonstrated that non-optimal codons were prevalently accumulated across different types of cancers, amino acids, and chromosomes, and genes with accumulation of non-optimal codons tended to be involved in protein interaction/signaling networks and encoded important enzymes in metabolic networks that played roles in cancer-related pathways. This study provides insights into the dynamics of codons in the cancer genome and demonstrates that accumulation of non-optimal codons may be an adaptive strategy for cancerous cells to win the competition with normal cells. This deeper interpretation of the patterns and the functional characterization of somatic mutations of codons will help to broaden the current understanding of the molecular basis of cancers.
Affiliation(s)
- Xudong Wu
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd., Dalian 116023, PR China
- Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd., Dalian 116023, PR China
34
Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022]
Abstract
Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to the inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.
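Why sequence-based detection is hard can be illustrated with a deliberately naive baseline: score each candidate period p by the fraction of positions where a residue equals the residue p positions downstream, and keep the best period. This is a hypothetical sketch for illustration (the function names are invented here), not one of the prediction algorithms surveyed; a single substitution in one unit already lowers the score, and real diverged repeats with indels defeat it quickly.

```python
from typing import Optional, Tuple


def self_match_fraction(seq: str, period: int) -> float:
    """Fraction of positions i for which seq[i] == seq[i + period]."""
    pairs = len(seq) - period
    if pairs <= 0:
        return 0.0
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs


def estimate_period(seq: str, max_period: Optional[int] = None) -> Tuple[int, float]:
    """Return (period, score) for the best-scoring period.

    Periods are capped at len(seq) // 2 so that at least two units fit;
    the strict '>' comparison keeps the smallest period on ties.
    """
    if max_period is None:
        max_period = len(seq) // 2
    best_period, best_score = 0, 0.0
    for p in range(1, max_period + 1):
        score = self_match_fraction(seq, p)
        if score > best_score:
            best_period, best_score = p, score
    return best_period, best_score


print(estimate_period("ABCDABCDABCD"))  # → (4, 1.0)   perfect repeat
print(estimate_period("ABCDABCEABCD"))  # → (4, 0.75)  one diverged unit
```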
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
35
Chakrabarty B, Parekh N. PRIGSA: protein repeat identification by graph spectral analysis. J Bioinform Comput Biol 2015; 12:1442009. [PMID: 25385083 DOI: 10.1142/s0219720014420098] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
Repetition of a structural motif within a protein is associated with a wide range of structural and functional roles. In most cases the repeating units are well conserved at the structural level, while at the sequence level they are mostly undetectable, suggesting the need for structure-based methods. Since most known methods require a training dataset, a de novo approach is desirable. Here, we propose an efficient graph-based approach for detecting structural repeats in proteins. In a protein structure represented as a graph, both inter- and intra-repeat interactions are well captured by the eigen spectra of the adjacency matrix of the graph. These conserved interactions give rise to similar connections and a unique profile of the principal eigen spectra for each repeating unit. The efficacy of the approach is shown on eight repeat families annotated in UniProt, comprising both solenoid and nonsolenoid repeats with varied secondary structure architecture and repeat lengths. The approach is also tested on other known benchmark datasets, and its performance is compared with that of two repeat identification methods. For a known repeat type, the algorithm also identifies the type of repeat present in the protein. A web tool implementing the algorithm is available at the URL http://bioinf.iiit.ac.in/PRIGSA/.
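The core numerical step described above, building a residue contact graph and taking the principal eigenvector of its adjacency matrix, can be sketched in pure Python. This is an illustration under stated assumptions, not PRIGSA itself: one 3D point per residue, a hypothetical 7 Å distance cutoff, and power iteration are choices made here, and the method's comparison of per-unit eigenvector profiles is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_graph(coords: Sequence[Point], cutoff: float = 7.0) -> List[List[int]]:
    """Adjacency matrix: residues i != j are linked if they lie within `cutoff` (Å)."""
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def principal_eigenvector(adj: List[List[int]], max_iter: int = 10_000,
                          tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Power iteration on A + I; returns (largest eigenvalue of A, unit eigenvector)."""
    n = len(adj)
    v = [1.0 / math.sqrt(n)] * n
    norm = 0.0
    for _ in range(max_iter):
        # Multiply by (A + I). The +I shift keeps the eigenvectors but makes the
        # dominant eigenvalue strictly largest in magnitude, so the iteration does
        # not oscillate on bipartite graphs such as a simple chain.
        w = [sum(adj[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # For a converged unit eigenvector, ||(A + I) v|| = lambda + 1.
    return norm - 1.0, v


# Toy example: three residues spaced 3.8 Å apart on a line form a path graph.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
lam, vec = principal_eigenvector(contact_graph(coords, cutoff=4.0))
print(round(lam, 4), [round(x, 4) for x in vec])  # → 1.4142 [0.5, 0.7071, 0.5]
```

In a repeat protein, residues at equivalent positions in different units would be expected to receive similar eigenvector components, which is the kind of per-unit profile the abstract refers to.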
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
36
Gemayel R, Chavali S, Pougach K, Legendre M, Zhu B, Boeynaems S, van der Zande E, Gevaert K, Rousseau F, Schymkowitz J, Babu MM, Verstrepen KJ. Variable Glutamine-Rich Repeats Modulate Transcription Factor Activity. Mol Cell 2015; 59:615-27. [PMID: 26257283 PMCID: PMC4543046 DOI: 10.1016/j.molcel.2015.07.003] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 05/21/2015] [Revised: 06/26/2015] [Accepted: 07/01/2015] [Indexed: 12/15/2022]
Abstract
Excessive expansions of glutamine (Q)-rich repeats in various human proteins are known to result in severe neurodegenerative disorders such as Huntington's disease and several ataxias. However, the physiological role of these repeats and the consequences of more moderate repeat variation remain unknown. Here, we demonstrate that Q-rich domains are highly enriched in eukaryotic transcription factors where they act as functional modulators. Incremental changes in the number of repeats in the yeast transcriptional regulator Ssn6 (Cyc8) result in systematic, repeat-length-dependent variation in expression of target genes that result in direct phenotypic changes. The function of Ssn6 increases with its repeat number until a certain threshold where further expansion leads to aggregation. Quantitative proteomic analysis reveals that the Ssn6 repeats affect its solubility and interactions with Tup1 and other regulators. Thus, Q-rich repeats are dynamic functional domains that modulate a regulator's innate function, with the inherent risk of pathogenic repeat expansions.
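Locating Q-rich regions like those discussed above is a simple sliding-window scan. The abstract does not give the authors' definition of a Q-rich domain, so the window length (20) and minimum glutamine fraction (0.5) below are hypothetical defaults, and `q_rich_regions` is an illustrative name, not a published tool.

```python
from typing import List, Tuple


def q_rich_regions(seq: str, window: int = 20,
                   min_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Return merged half-open intervals [start, end) covered by windows whose
    glutamine (Q) fraction is at least `min_fraction`."""
    seq = seq.upper()
    regions: List[Tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count("Q") / window >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# A 10-residue polyQ tract flanked by unrelated residues.
print(q_rich_regions("M" * 10 + "Q" * 10 + "A" * 10, window=10))  # → [(5, 25)]
```

Because whole windows are reported, region boundaries extend somewhat beyond the Q tract itself; trimming to the first and last Q would be a natural refinement.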
Affiliation(s)
- Rita Gemayel
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
- Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Ksenia Pougach
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
- Matthieu Legendre
- Structural and Genomic Information Laboratory, IGS UMR7256, Centre National de la Recherche Scientifique, Aix-Marseille Université, Institut de Microbiologie de la Méditerranée (IMM), 13288 Marseille Cedex 9, France
- Bo Zhu
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
- Steven Boeynaems
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
- Elisa van der Zande
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium
- Kris Gevaert
- Department of Medical Protein Research, VIB, 9000 Ghent, Belgium; Department of Biochemistry, Ghent University, 9000 Ghent, Belgium
- Frederic Rousseau
- Switch Laboratory, VIB, Campus Gasthuisberg, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Joost Schymkowitz
- Switch Laboratory, VIB, Campus Gasthuisberg, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Kevin J Verstrepen
- Laboratory of Systems Biology, VIB, Gaston Geenslaan 1, 3001 Heverlee, Belgium; Laboratory of Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), Department M2S, KU Leuven, Gaston Geenslaan 1, 3001 Heverlee, Belgium.
37
Schüler A, Schmitz G, Reft A, Özbek S, Thurm U, Bornberg-Bauer E. The Rise and Fall of TRP-N, an Ancient Family of Mechanogated Ion Channels, in Metazoa. Genome Biol Evol 2015; 7:1713-27. [PMID: 26100409 PMCID: PMC4494053 DOI: 10.1093/gbe/evv091] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Abstract
Mechanoreception, the sensing of mechanical forces, is an ancient means of orientation and communication and tightly linked to the evolution of motile animals. In flies, the transient-receptor-potential N protein (TRP-N) was found to be a cilia-associated mechanoreceptor. TRP-N belongs to a large and diverse family of ion channels. Its unusually long N-terminal repeat of 28 ankyrin domains presumably acts as the gating spring by which mechanical energy induces channel gating. We analyzed the evolutionary origins and possible diversification of TRP-N. Using a custom-made set of highly discriminative sequence profiles we scanned a representative set of metazoan genomes and subsequently corrected several gene models. We find that, contrary to other ion channel families, TRP-N is remarkably conserved in its domain arrangements and copy number (1) in all Bilateria except for amniotes, even in the wake of several whole-genome duplications. TRP-N is absent in Porifera but present in Ctenophora and Placozoa. Exceptional multiplications of TRP-N occurred in Cnidaria, independently along the Hydra and the Nematostella lineage. Molecular signals of subfunctionalization can be attributed to different mechanisms of activation of the gating spring. In Hydra this is further supported by in situ hybridization and immune staining, suggesting that at least three paralogs adapted to nematocyte discharge, which is key for predation and defense. We propose that these new candidate proteins help explain the sensory complexity of Cnidaria which has been previously observed but so far has lacked a molecular underpinning. Also, the ancient appearance of TRP-N supports a common origin of important components of the nervous systems in Ctenophores, Cnidaria, and Bilateria.
Affiliation(s)
- Andreas Schüler
- Institute for Evolution and Biodiversity, University of Muenster, Germany
- Gregor Schmitz
- Institute for Evolution and Biodiversity, University of Muenster, Germany
- Abigail Reft
- Centre for Organismal Studies, University of Heidelberg, Germany
- Suat Özbek
- Centre for Organismal Studies, University of Heidelberg, Germany; HEIKA-Heidelberg Karlsruhe Research Partnership, Heidelberg University, Karlsruhe Institute of Technology (KIT), Heidelberg and Karlsruhe, Germany
- Ulrich Thurm
- Institute for Neurobiology and Behavioural Biology, University of Muenster, Germany
38
Schaper E, Korsunsky A, Pečerska J, Messina A, Murri R, Stockinger H, Zoller S, Xenarios I, Anisimova M. TRAL: tandem repeat annotation library. Bioinformatics 2015; 31:3051-3. [PMID: 25987568 DOI: 10.1093/bioinformatics/btv306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/25/2015] [Accepted: 05/08/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Currently, more than 40 sequence tandem repeat detectors are published, providing heterogeneous, partly complementary, partly conflicting results. RESULTS We present TRAL, a tandem repeat annotation library that allows running and parsing of various detection outputs, clustering of redundant or overlapping annotations, several statistical frameworks for filtering false positive annotations, and importantly a tandem repeat annotation and refinement module based on circular profile hidden Markov models (cpHMMs). Using TRAL, we evaluated the performance of a multi-step tandem repeat annotation workflow on 547 085 sequences in UniProtKB/Swiss-Prot. The researcher can use these results to predict run-times for specific datasets, and to choose annotation complexity accordingly. AVAILABILITY AND IMPLEMENTATION TRAL is an open-source Python 3 library and is available, together with documentation and tutorials via http://www.vital-it.ch/software/tral. CONTACT elke.schaper@isb-sib.ch.
Affiliation(s)
- Elke Schaper
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland, Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland, Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria, Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland, Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Alexander Korsunsky
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland, Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland, Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria, Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland, Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Jūlija Pečerska
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland, Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland, Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria, Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland, Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Antonio Messina
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Riccardo Murri
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Heinz Stockinger
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Stefan Zoller
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Ioannis Xenarios
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Maria Anisimova
- Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
39
Anisimova M. Darwin and Fisher meet at biotech: on the potential of computational molecular evolution in industry. BMC Evol Biol 2015; 15:76. [PMID: 25928234 PMCID: PMC4422139 DOI: 10.1186/s12862-015-0352-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/02/2014] [Accepted: 04/15/2015] [Indexed: 12/22/2022]
Abstract
Background: Today computational molecular evolution is a vibrant research field that benefits from the availability of large and complex new-generation sequencing data – ranging from full genomes and proteomes to microbiomes, metabolomes and epigenomes. The grounds for this progress were established long before the discovery of the structure of DNA. Specifically, Darwin's theory of evolution by means of natural selection not only remains relevant today, but also provides a solid basis for computational research with a variety of applications. But long-term progress in biology was ensured by the mathematical sciences, as exemplified by Sir R. Fisher in the early 20th century. This is now truer than ever: the size and complexity of the data require biologists to work in close collaboration with experts in computational sciences, modeling and statistics. Results: Natural selection drives function conservation and adaptation to emerging pathogens or new environments; selection plays a key role in immune and resistance systems. Here I focus on computational methods for evaluating selection in molecular sequences, and argue that they have high potential for applications. Pharma and biotech industries can successfully use this potential, and should take the initiative to enhance their research and development with state-of-the-art bioinformatics approaches. Conclusions: This review provides a quick guide to the current computational approaches that apply the evolutionary principles of natural selection to real-life problems – from drug target validation, vaccine design and protein engineering to applications in agriculture, ecology and conservation.
Affiliation(s)
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Einsiedlerstrasse 31a, Wädenswil, 8820, Switzerland; Department of Computer Science, ETH, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
40
Zoller S, Boskova V, Anisimova M. Maximum-Likelihood Tree Estimation Using Codon Substitution Models with Multiple Partitions. Mol Biol Evol 2015; 32:2208-16. [PMID: 25911229 DOI: 10.1093/molbev/msv097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/16/2022]
Abstract
Many protein sequences have distinct domains that evolve at different rates, under different selective pressures, or with different codon biases. Instead of modeling these differences with increasingly complex models of molecular evolution, we present a multipartition approach that allows maximum-likelihood phylogeny inference using different codon models at predefined partitions in the data. Partition models can, but do not have to, share free parameters in the estimation process. We test this approach with simulated data as well as in a phylogenetic study of the origin of the leucine-rich repeat regions in the type III effector proteins of the phytopathogenic bacterium Ralstonia solanacearum. Our study not only shows that a simple two-partition model resolves the phylogeny better than a one-partition model, but also gives more evidence supporting the hypothesis of lateral gene transfer events between the bacterial pathogens and their eukaryotic hosts.
Affiliation(s)
- Stefan Zoller
- Computational Biochemistry Research Group, ETH Zürich, Zürich, Switzerland; Swiss Institute of Bioinformatics, Switzerland
- Veronika Boskova
- Computational Biochemistry Research Group, ETH Zürich, Zürich, Switzerland
- Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
41
Schaper E, Anisimova M. The evolution and function of protein tandem repeats in plants. New Phytol 2015; 206:397-410. [PMID: 25420631 DOI: 10.1111/nph.13184] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 08/22/2014] [Accepted: 10/18/2014] [Indexed: 05/27/2023]
Abstract
Sequence tandem repeats (TRs) are abundant in proteomes across all domains of life. For plants, little is known about their distribution or contribution to protein function. We exhaustively annotated TRs and studied the evolution of TR unit variations for all Ensembl plants. Using phylogenetic patterns of TR units, we detected conserved TRs with unit number and order preserved during evolution, and those TRs that have diverged via recent TR unit gains/losses. We correlated the mode of evolution of TRs to protein function. TR number was strongly correlated with proteome size, with about one-half of all TRs recognized as common protein domains. The majority of TRs have been highly conserved over long evolutionary distances, some since the separation of red algae and green plants c. 1.6 billion yr ago. Conversely, recurrent recent TR unit mutations were rare. Our results suggest that the first TRs by far predate the first plants, and that TR appearance is an ongoing process with similar rates across the plant kingdom. Interestingly, the few detected highly mutable TRs might provide a source of variation for rapid adaptation. In particular, such TRs are enriched in leucine-rich repeats (LRRs) commonly found in R genes, where TR unit gain/loss may facilitate resistance to emerging pathogens.
Affiliation(s)
- Elke Schaper
- Department of Computer Science, ETH Zürich, Zürich, 8092, Switzerland
- Institute of Integrative Biology, ETH Zürich, Zürich, 8092, Switzerland
- Vital-IT Competency Center, Swiss Institute for Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Maria Anisimova
- Institute of Applied Simulation (IAS), School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Wädenswil, 8820, Switzerland
42
Anisimova M, Pečerska J, Schaper E. Statistical approaches to detecting and analyzing tandem repeats in genomic sequences. Front Bioeng Biotechnol 2015; 3:31. [PMID: 25853125 PMCID: PMC4362331 DOI: 10.3389/fbioe.2015.00031] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/10/2014] [Accepted: 02/26/2015] [Indexed: 11/13/2022]
Abstract
Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious/neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and TR roles in shaping genomes. However, TR-related studies are often non-trivial due to heterogeneous and sometimes fast evolving TR regions. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The discussed work has a focus on protein TRs, yet is generally applicable to nucleic acid TRs, sharing similar features.
Affiliation(s)
- Maria Anisimova
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Julija Pečerska
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Department of Computer Science, ETH Zürich, Zürich, Switzerland
- Elke Schaper
- Department of Computer Science, ETH Zürich, Zürich, Switzerland; Vital-IT Competency Center, Swiss Institute for Bioinformatics, Lausanne, Switzerland
43
Huang Y, Kendall T, Forsythe ES, Dorantes-Acosta A, Li S, Caballero-Pérez J, Chen X, Arteaga-Vázquez M, Beilstein MA, Mosher RA. Ancient Origin and Recent Innovations of RNA Polymerase IV and V. Mol Biol Evol 2015; 32:1788-99. [PMID: 25767205 PMCID: PMC4476159 DOI: 10.1093/molbev/msv060] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 12/25/2022]
Abstract
Small RNA-mediated chromatin modification is a conserved feature of eukaryotes. In flowering plants, the short interfering (si)RNAs that direct transcriptional silencing are abundant and subfunctionalization has led to specialized machinery responsible for synthesis and action of these small RNAs. In particular, plants possess polymerase (Pol) IV and Pol V, multi-subunit homologs of the canonical DNA-dependent RNA Pol II, as well as specialized members of the RNA-dependent RNA Polymerase (RDR), Dicer-like (DCL), and Argonaute (AGO) families. Together these enzymes are required for production and activity of Pol IV-dependent (p4-)siRNAs, which trigger RNA-directed DNA methylation (RdDM) at homologous sequences. p4-siRNAs accumulate highly in developing endosperm, a specialized tissue found only in flowering plants, and are rare in nonflowering plants, suggesting that the evolution of flowers might coincide with the emergence of specialized RdDM machinery. Through comprehensive identification of RdDM genes from species representing the breadth of the land plant phylogeny, we describe the ancient origin of Pol IV and Pol V, suggesting that a nearly complete and functional RdDM pathway could have existed in the earliest land plants. We also uncover innovations in these enzymes that are coincident with the emergence of seed plants and flowering plants, and recent duplications that might indicate additional subfunctionalization. Phylogenetic analysis reveals rapid evolution of Pol IV and Pol V subunits relative to their Pol II counterparts and suggests that duplicates were retained and subfunctionalized through Escape from Adaptive Conflict. Evolution within the carboxy-terminal domain of the Pol V largest subunit is particularly striking, where illegitimate recombination facilitated extreme sequence divergence.
Affiliation(s)
- Yi Huang
- The School of Plant Sciences, The University of Arizona
- Timmy Kendall
- The School of Plant Sciences, The University of Arizona
- Ana Dorantes-Acosta
- Instituto de Biotecnología y Ecología Aplicada (INBIOTECA), Universidad Veracruzana, Veracruz, México
- Shaofang Li
- Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside
- Xuemei Chen
- Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside
- Mario Arteaga-Vázquez
- Instituto de Biotecnología y Ecología Aplicada (INBIOTECA), Universidad Veracruzana, Veracruz, México
- Rebecca A Mosher
- The School of Plant Sciences, The University of Arizona; The Bio5 Institute, The University of Arizona
44
Jernigan KK, Bordenstein SR. Tandem-repeat protein domains across the tree of life. PeerJ 2015; 3:e732. [PMID: 25653910 PMCID: PMC4304861 DOI: 10.7717/peerj.732] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 11/02/2014] [Accepted: 12/29/2014] [Indexed: 12/19/2022]
Abstract
Tandem-repeat protein domains, composed of repeated units of conserved stretches of 20–40 amino acids, are required for a wide array of biological functions. Despite their diverse and fundamental functions, there has been no comprehensive assessment of their taxonomic distribution, incidence, and associations with organismal lifestyle and phylogeny. In this study, we assess for the first time the abundance of armadillo (ARM) and tetratricopeptide (TPR) repeat domains across all three domains of the tree of life and compare the results to our previous analysis of ankyrin (ANK) repeat domains in this journal. All eukaryotes and a majority of the bacterial and archaeal genomes analyzed have a minimum of one TPR and ARM repeat. In eukaryotes, the fraction of ARM-containing proteins is approximately double that of TPR- and ANK-containing proteins, whereas bacteria and archaea are enriched in TPR-containing proteins relative to ARM- and ANK-containing proteins. We show in bacteria that phylogenetic history, rather than lifestyle or pathogenicity, is a predictor of TPR repeat domain abundance, while neither phylogenetic history nor lifestyle predicts ARM repeat domain abundance. Surprisingly, pathogenic bacteria were not enriched in TPR-containing proteins, which have been associated with virulence factors in certain species. Taken together, this comparative analysis provides a newly appreciated view of the prevalence and diversity of multiple types of tandem-repeat protein domains across the tree of life. A central finding of this analysis is that tandem repeat domain-containing proteins are prevalent not just in eukaryotes, but also in bacterial and archaeal species.
Affiliation(s)
- Kristin K Jernigan
- Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN, USA
- Seth R Bordenstein
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA; Department of Pathology, Microbiology, and Immunology, Vanderbilt University, Nashville, TN, USA
45
Press MO, Carlson KD, Queitsch C. The overdue promise of short tandem repeat variation for heritability. Trends Genet 2014; 30:504-12. [PMID: 25182195 DOI: 10.1016/j.tig.2014.07.008] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 06/12/2014] [Revised: 07/23/2014] [Accepted: 07/24/2014] [Indexed: 12/11/2022]
Abstract
Short tandem repeat (STR) variation has been proposed as a major explanatory factor in the heritability of complex traits in humans and model organisms. However, we still struggle to incorporate STR variation into genotype-phenotype maps. We review here the promise of STRs in contributing to complex trait heritability and highlight the challenges that STRs pose due to their repetitive nature. We argue that STR variants are more likely than single-nucleotide variants to have epistatic interactions, reiterate the need for targeted assays to genotype STRs accurately, and call for more appropriate statistical methods in detecting STR-phenotype associations. Lastly, we suggest that somatic STR variation within individuals may serve as a read-out of disease susceptibility, and is thus potentially a valuable covariate for future association studies.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Keisha D Carlson
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Christine Queitsch
- Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
46
Abstract
It is widely appreciated that short tandem repeat (STR) variation underlies substantial phenotypic variation in organisms. Some propose that the high mutation rates of STRs in functional genomic regions facilitate evolutionary adaptation. Despite their high mutation rate, some STRs show little to no variation in populations. One such STR occurs in the Arabidopsis thaliana gene PFT1 (MED25), where it encodes an interrupted polyglutamine tract. Although the PFT1 STR is large (∼270 bp), and thus expected to be extremely variable, it shows only minuscule variation across A. thaliana strains. We hypothesized that the PFT1 STR is under selective constraint, due to previously undescribed roles in PFT1 function. We investigated this hypothesis using plants expressing transgenic PFT1 constructs with either an endogenous STR or synthetic STRs of varying length. Transgenic plants carrying the endogenous PFT1 STR generally performed best in complementing a pft1 null mutant across adult PFT1-dependent traits. In stark contrast, transgenic plants carrying a PFT1 transgene lacking the STR phenocopied a pft1 loss-of-function mutant for flowering time phenotypes and were generally hypomorphic for other traits, establishing the functional importance of this domain. Transgenic plants carrying various synthetic constructs occupied the phenotypic space between wild-type and pft1 loss-of-function mutants. By varying PFT1 STR length, we discovered that PFT1 can act as either an activator or repressor of flowering in a photoperiod-dependent manner. We conclude that the PFT1 STR is constrained to its approximate wild-type length by its various functional requirements. Our study implies that there is strong selection on STRs not only to generate allelic diversity, but also to maintain certain lengths pursuant to optimal molecular function.