1
|
Mier P, Andrade-Navarro MA. Predicting the involvement of polyQ- and polyA in protein-protein interactions by their amino acid context. Heliyon 2024; 10:e37861. [PMID: 39323775 PMCID: PMC11422028 DOI: 10.1016/j.heliyon.2024.e37861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2024] [Accepted: 09/11/2024] [Indexed: 09/27/2024] Open
Abstract
Homorepeats, specifically polyglutamine (polyQ) and polyalanine (polyA), are often implicated in protein-protein interactions (PPIs). So far, a method to predict the participation of homorepeats in protein interactions is lacking. We propose a machine learning approach to identify PPI-involved polyQ and polyA regions within the human proteome based on known interacting regions. Using the dataset of human homorepeats, we identified 157 polyQ and 745 polyA regions potentially involved in PPIs. Machine learning models, trained on amino acid context and homorepeat length, demonstrated high precision (0.90-0.98) but variable recall (0.42-0.85). Random forest outperformed other models (AUC polyQ = 0.686, AUC polyA = 0.732) using the positions surrounding the homorepeat -10 to +10. Integrating paralog information marginally improved predictions but was excluded for model simplicity. Further optimization revealed that for polyQ, using amino acid surrounding positions from -6 to +6 increased AUC to 0.715. For polyA, no improvement was found. Incorporating coiled coil overlap information enhanced polyA predictions (AUC = 0.745) but not polyQ. Finally, we applied these models to predict PPI involvement across all polyQ and polyA regions, identifying potential interactions. Case studies illustrated the method's predictive capacity, highlighting known interacting regions with high scores and elucidating potential false negatives.
Collapse
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| |
Collapse
|
2
|
Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022 DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Collapse
Affiliation(s)
- Lokdeep Teekas
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal , Bhauri, Madhya Pradesh, India
| | - Sandhya Sharma
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal , Bhauri, Madhya Pradesh, India
| | - Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal , Bhauri, Madhya Pradesh, India
| |
Collapse
|
3
|
Dickson ZW, Golding GB. Evolution of Transcript Abundance is Influenced by Indels in Protein Low Complexity Regions. J Mol Evol 2024; 92:153-168. [PMID: 38485789 DOI: 10.1007/s00239-024-10158-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2023] [Accepted: 01/24/2024] [Indexed: 04/02/2024]
Abstract
Protein Protein low complexity regions (LCRs) are compositionally biased amino acid sequences, many of which have significant evolutionary impacts on the proteins which contain them. They are mutationally unstable experiencing higher rates of indels and substitutions than higher complexity regions. LCRs also impact the expression of their proteins, likely through multiple effects along the path from gene transcription, through translation, and eventual protein degradation. It has been observed that proteins which contain LCRs are associated with elevated transcript abundance (TAb), despite having lower protein abundance. We have gathered and integrated human data to investigate the co-evolution of TAb and LCRs through ancestral reconstructions and model inference using an approximate Bayesian calculation based method. We observe that on short evolutionary timescales TAb evolution is significantly impacted by changes in LCR length, with insertions driving TAb down. But in contrast, the observed data is best explained by indel rates in LCRs which are unaffected by shifts in TAb. Our work demonstrates a coupling between LCR and TAb evolution, and the utility of incorporating multiple responses into evolutionary analyses.
Collapse
Affiliation(s)
| | - G Brian Golding
- Department of Biology, McMaster University, Hamilton, ON, Canada
| |
Collapse
|
4
|
Chuang CN, Liu HC, Woo TT, Chao JL, Chen CY, Hu HT, Hsueh YP, Wang TF. Noncanonical usage of stop codons in ciliates expands proteins with structurally flexible Q-rich motifs. eLife 2024; 12:RP91405. [PMID: 38393970 PMCID: PMC10942620 DOI: 10.7554/elife.91405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/25/2024] Open
Abstract
Serine(S)/threonine(T)-glutamine(Q) cluster domains (SCDs), polyglutamine (polyQ) tracts and polyglutamine/asparagine (polyQ/N) tracts are Q-rich motifs found in many proteins. SCDs often are intrinsically disordered regions that mediate protein phosphorylation and protein-protein interactions. PolyQ and polyQ/N tracts are structurally flexible sequences that trigger protein aggregation. We report that due to their high percentages of STQ or STQN amino acid content, four SCDs and three prion-causing Q/N-rich motifs of yeast proteins possess autonomous protein expression-enhancing activities. Since these Q-rich motifs can endow proteins with structural and functional plasticity, we suggest that they represent useful toolkits for evolutionary novelty. Comparative Gene Ontology (GO) analyses of the near-complete proteomes of 26 representative model eukaryotes reveal that Q-rich motifs prevail in proteins involved in specialized biological processes, including Saccharomyces cerevisiae RNA-mediated transposition and pseudohyphal growth, Candida albicans filamentous growth, ciliate peptidyl-glutamic acid modification and microtubule-based movement, Tetrahymena thermophila xylan catabolism and meiosis, Dictyostelium discoideum development and sexual cycles, Plasmodium falciparum infection, and the nervous systems of Drosophila melanogaster, Mus musculus and Homo sapiens. We also show that Q-rich-motif proteins are expanded massively in 10 ciliates with reassigned TAAQ and TAGQ codons. Notably, the usage frequency of CAGQ is much lower in ciliates with reassigned TAAQ and TAGQ codons than in organisms with expanded and unstable Q runs (e.g. D. melanogaster and H. sapiens), indicating that the use of noncanonical stop codons in ciliates may have coevolved with codon usage biases to avoid triplet repeat disorders mediated by CAG/GTC replication slippage.
Collapse
Affiliation(s)
| | - Hou-Cheng Liu
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
| | - Tai-Ting Woo
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
| | - Ju-Lan Chao
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
| | - Chiung-Ya Chen
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
| | - Hisao-Tang Hu
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
| | - Yi-Ping Hsueh
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
- Department of Biochemical Science and Technology, National Chiayi UniversityChiayiTaiwan
| | - Ting-Fang Wang
- Institute of Molecular Biology, Academia SinicaTaipeiTaiwan
- Department of Biochemical Science and Technology, National Chiayi UniversityChiayiTaiwan
| |
Collapse
|
5
|
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of its salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on Tandem Repeat Proteins (TRPs) by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Collapse
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
| | - Paula Nazarena Arrías
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Arne Elofsson
- Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
| | - Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Damiano Clementel
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
| | - Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | - Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
| | - Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy.
| |
Collapse
|
6
|
Elena-Real CA, Mier P, Sibille N, Andrade-Navarro MA, Bernadó P. Structure-function relationships in protein homorepeats. Curr Opin Struct Biol 2023; 83:102726. [PMID: 37924569 DOI: 10.1016/j.sbi.2023.102726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 11/06/2023]
Abstract
Homorepeats (or polyX), protein segments containing repetitions of the same amino acid, are abundant in proteomes from all kingdoms of life and are involved in crucial biological functions as well as several neurodegenerative and developmental diseases. Mainly inserted in disordered segments of proteins, the structure/function relationships of homorepeats remain largely unexplored. In this review, we summarize present knowledge for the most abundant homorepeats, highlighting the role of the inherent structure and the conformational influence exerted by their flanking regions. Recent experimental and computational methods enable residue-specific investigations of these regions and promise novel structural and dynamic information for this elusive group of proteins. This information should increase our knowledge about the structural bases of phenomena such as liquid-liquid phase separation and trinucleotide repeat disorders.
Collapse
Affiliation(s)
- Carlos A Elena-Real
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France. https://twitter.com/carloselenareal
| | - Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz. Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Nathalie Sibille
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz. Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Pau Bernadó
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France.
| |
Collapse
|
7
|
Mier P, Andrade-Navarro MA. The nucleotide landscape of polyXY regions. Comput Struct Biotechnol J 2023; 21:5408-5412. [PMID: 38022702 PMCID: PMC10652141 DOI: 10.1016/j.csbj.2023.10.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Revised: 10/30/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023] Open
Abstract
PolyXY regions are compositionally biased regions composed of two different amino acids. They are classified according to the arrangement of the two amino acid types 'X' and 'Y' into direpeats (composed of alternating amino acids, e.g. 'XYXYXY'), joined (composed of two consecutive stretches of each amino acid, e.g. 'XXXYYY') and shuffled (other arrangements, e.g., 'XYXXYY'). They have been characterized at the amino acid level in all domains of life, and are described as often found within intrinsically disordered regions. Since DNA replication slippage has been proposed as a driver of repeat variation, and given that some polyXY have a repetitive nature, we hypothesized that characterizing the nucleotide coding of various types of polyXY could give hints about their origin and evolution. To test this, we obtained all polyXY regions in the human transcriptome, categorized them, and studied their coding nucleotide sequences. We observed that polyXY exacerbates the codon biases, and that the similarity between the X and Y codons is higher than in the background proteome. Our results support a general mechanism of emergence and evolution of polyXY from single-codon polyX. PolyXY are revealed as hotspots for replication slippage, particularly those composed of repeats: joined and direpeat polyXY. Inter-conversion to shuffled polyXY disrupts nucleotide repeats and restricts further evolution by replication slippage, a mechanism that we previously observed in polyX. Our results shed light on polyXY composition and should simplify the determination of their functions.
Collapse
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| |
Collapse
|
8
|
Barbosa Pereira PJ, Manso JA, Macedo-Ribeiro S. The structural plasticity of polyglutamine repeats. Curr Opin Struct Biol 2023; 80:102607. [PMID: 37178477 DOI: 10.1016/j.sbi.2023.102607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/15/2023]
Abstract
From yeast to humans, polyglutamine (polyQ) repeat tracts are found frequently in the proteome and are particularly prominent in the activation domains of transcription factors. PolyQ is a polymorphic motif that modulates functional protein-protein interactions and aberrant self-assembly. Expansion of the polyQ repeated sequences beyond critical physiological repeat length thresholds triggers self-assembly and is linked to severe pathological implications. This review provides an overview of the current knowledge on the structures of polyQ tracts in the soluble and aggregated states and discusses the influence of neighboring regions on polyQ secondary structure, aggregation, and fibril morphologies. The influence of the genetic context of the polyQ-encoding trinucleotides is briefly discussed as a challenge for future endeavors in this field.
Collapse
Affiliation(s)
- Pedro José Barbosa Pereira
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal.
| | - José A Manso
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
| | - Sandra Macedo-Ribeiro
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
| |
Collapse
|
9
|
Erdozain S, Barrionuevo E, Ripoll L, Mier P, Andrade-Navarro MA. Protein repeats evolve and emerge in giant viruses. J Struct Biol 2023; 215:107962. [PMID: 37031868 DOI: 10.1016/j.jsb.2023.107962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Revised: 03/21/2023] [Accepted: 04/04/2023] [Indexed: 04/11/2023]
Abstract
Nucleocytoplasmatic large DNA viruses (NCLDVs or giant viruses) stand out because of their relatively large genomes encoding hundreds of proteins. These species give us an unprecedented opportunity to study the emergence and evolution of repeats in protein sequences. On the one hand, as viruses, these species have a restricted set of functions, which can help us better define the functional landscape of repeats. On the other hand, given the particular use of the genetic machinery of the host, it is worth asking whether this allows the variations of genetic material that lead to repeats in non-viral species. To support research in the characterization of repeat protein evolution and function, we present here an analysis focused on the repeat proteins of giant viruses, namely tandem repeats (TRs), short repeats (SRs), and homorepeats (polyX). Proteins with large and short repeats are not very frequent in non-eukaryotic organisms because of the difficulties that their folding may entail; however, their presence in giant viruses remarks their advantage for performance in the protein environment of the eukaryotic host. The heterogeneous content of these TRs, SRs and polyX in some viruses hints at diverse needs. Comparisons to homologs suggest that the mechanisms that generate these repeats are extensively used by some of these viruses, but also their capacity to adopt genes with repeats. Giant viruses could be very good models for the study of the emergence and evolution of protein repeats.
Collapse
Affiliation(s)
- Sofía Erdozain
- Instituto de Biotecnología y Biología Molecular, Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, Argentina
| | - Emilia Barrionuevo
- Laboratory of Bioactive Research and Development, Faculty of Exact Sciences, National University of La Plata, Argentina
| | - Lucas Ripoll
- Laboratory of Genetic Engineering, Cell, and Molecular Biology, National University of Quilmes, Argentina
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | | |
Collapse
|
10
|
Petrzilek J, Pasulka J, Malik R, Horvat F, Kataruka S, Fulka H, Svoboda P. De novo emergence, existence, and demise of a protein-coding gene in murids. BMC Biol 2022; 20:272. [PMID: 36482406 PMCID: PMC9733328 DOI: 10.1186/s12915-022-01470-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2022] [Accepted: 11/15/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Genes, principal units of genetic information, vary in complexity and evolutionary history. Less-complex genes (e.g., long non-coding RNA (lncRNA) expressing genes) readily emerge de novo from non-genic sequences and have high evolutionary turnover. Genesis of a gene may be facilitated by adoption of functional genic sequences from retrotransposon insertions. However, protein-coding sequences in extant genomes rarely lack any connection to an ancestral protein-coding sequence. RESULTS We describe remarkable evolution of the murine gene D6Ertd527e and its orthologs in the rodent Muroidea superfamily. The D6Ertd527e emerged in a common ancestor of mice and hamsters most likely as a lncRNA-expressing gene. A major contributing factor was a long terminal repeat (LTR) retrotransposon insertion carrying an oocyte-specific promoter and a 5' terminal exon of the gene. The gene survived as an oocyte-specific lncRNA in several extant rodents while in some others the gene or its expression were lost. In the ancestral lineage of Mus musculus, the gene acquired protein-coding capacity where the bulk of the coding sequence formed through CAG (AGC) trinucleotide repeat expansion and duplications. These events generated a cytoplasmic serine-rich maternal protein. Knock-out of D6Ertd527e in mice has a small but detectable effect on fertility and the maternal transcriptome. CONCLUSIONS While this evolving gene is not showing a clear function in laboratory mice, its documented evolutionary history in Muroidea during the last ~ 40 million years provides a textbook example of how a several common mutation events can support de novo gene formation, evolution of protein-coding capacity, as well as gene's demise.
Collapse
Affiliation(s)
- Jan Petrzilek
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
- Present address: Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
| | - Josef Pasulka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
| | - Radek Malik
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
| | - Filip Horvat
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
- Bioinformatics Group, Division of Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000, Zagreb, Croatia
| | - Shubhangini Kataruka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
- Present address: Department of Genetics, Yale School of Medicine, New Haven, CT, 06510, USA
| | - Helena Fulka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
- Current address: Institute of Experimental Medicine of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic
| | - Petr Svoboda
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20, Prague 4, Czech Republic.
| |
Collapse
|
11
|
Shukla S, Lazarchuk P, Pavlova MN, Sidorova JM. Genome-wide survey of D/E repeats in human proteins uncovers their instability and aids in identifying their role in the chromatin regulator ATAD2. iScience 2022; 25:105464. [PMCID: PMC9672403 DOI: 10.1016/j.isci.2022.105464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 08/03/2022] [Accepted: 10/26/2022] [Indexed: 11/15/2022] Open
Abstract
D/E repeats are stretches of aspartic and/or glutamic acid residues found in over 150 human proteins. We examined genomic stability of D/E repeats and functional characteristics of D/E repeat-containing proteins vis-à-vis the proteins with poly-Q or poly-A repeats, which are known to undergo pathologic expansions. Mining of tumor sequencing data revealed that D/E repeat-coding regions are similar to those coding poly-Qs and poly-As in increased incidence of trinucleotide insertions/deletions but differ in types and incidence of substitutions. D/E repeat-containing proteins preferentially function in chromatin metabolism and are the more likely to be nuclear and interact with core histones, the longer their repeats are. One of the longest D/E repeats of unknown function is in ATAD2, a bromodomain family ATPase frequently overexpressed in tumors. We demonstrate that D/E repeat deletion in ATAD2 suppresses its binding to nascent and mature chromatin and to the constitutive pericentromeric heterochromatin, where ATAD2 represses satellite transcription. Many human proteins contain runs of aspartic/glutamic acid residues (D/E repeats) D/E repeats show increased incidence of in-frame insertions/deletions in tumors Nuclear and histone-interacting proteins often have long D/E repeats D/E repeat of the oncogene ATAD2 controls its binding to pericentric chromatin
Collapse
Affiliation(s)
- Shalabh Shukla
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific St., Box 357705, Seattle, WA 98195, USA
| | - Pavlo Lazarchuk
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific St., Box 357705, Seattle, WA 98195, USA
| | - Maria N. Pavlova
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific St., Box 357705, Seattle, WA 98195, USA
| | - Julia M. Sidorova
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific St., Box 357705, Seattle, WA 98195, USA
- Corresponding author
| |
Collapse
|
12
|
Mier P, Elena-Real CA, Cortés J, Bernadó P, Andrade-Navarro MA. The sequence context in poly-alanine regions: structure, function and conservation. Bioinformatics 2022; 38:4851-4858. [PMID: 36106994 PMCID: PMC9620824 DOI: 10.1093/bioinformatics/btac610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2022] [Revised: 07/07/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association to nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are settled in proteins influences their structure and function. RESULTS We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates with their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices; but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure-function relationships. AVAILABILITY AND IMPLEMENTATION The datasets used and/or analyzed during the current study are available from the corresponding author onreasonable request. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
| | - Carlos A Elena-Real
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
| | - Pau Bernadó
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
| |
Collapse
|
13
|
Mier P, Andrade-Navarro MA. Regions with two amino acids in protein sequences: a step forward from homorepeats into the low complexity landscape. Comput Struct Biotechnol J 2022; 20:5516-5523. [PMID: 36249567 PMCID: PMC9550522 DOI: 10.1016/j.csbj.2022.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Revised: 09/07/2022] [Accepted: 09/07/2022] [Indexed: 11/17/2022] Open
Abstract
Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.
Collapse
Affiliation(s)
- Pablo Mier
- Corresponding author at: Hanns-Dieter-Hüsch-Weg 15 55118 Mainz (Germany).
| | | |
Collapse
|
14
|
Low Complexity Induces Structure in Protein Regions Predicted as Intrinsically Disordered. Biomolecules 2022; 12:biom12081098. [PMID: 36008992 PMCID: PMC9405754 DOI: 10.3390/biom12081098] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 08/02/2022] [Accepted: 08/06/2022] [Indexed: 01/02/2023] Open
Abstract
There is increasing evidence that many intrinsically disordered regions (IDRs) in proteins play key functional roles through interactions with other proteins or nucleic acids. These interactions often exhibit a context-dependent structural behavior. We hypothesize that low complexity regions (LCRs), often found within IDRs, could have a role in inducing local structure in IDRs. To test this, we predicted IDRs in the human proteome and analyzed their structures or those of homologous sequences in the Protein Data Bank (PDB). We then identified two types of simple LCRs within IDRs: regions with only one (polyX or homorepeats) or with only two types of amino acids (polyXY). We were able to assign structural information from the PDB more often to these LCRs than to the surrounding IDRs (polyX 61.8% > polyXY 50.5% > IDRs 39.7%). The most frequently observed polyX and polyXY within IDRs contained E (Glu) or G (Gly). Structural analyses of these sequences and of homologs indicate that polyEK regions induce helical conformations, while the other most frequent LCRs induce coil structures. Our work proposes bioinformatics methods to help in the study of the structural behavior of IDRs and provides a solid basis suggesting a structuring role of LCRs within them.
Collapse
|
15
|
Mier P, Andrade-Navarro MA. PolyX2: Fast Detection of Homorepeats in Large Protein Datasets. Genes (Basel) 2022; 13:758. [PMID: 35627143 PMCID: PMC9141109 DOI: 10.3390/genes13050758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 04/22/2022] [Accepted: 04/22/2022] [Indexed: 12/03/2022] Open
Abstract
Homorepeat sequences, consecutive runs of identical amino acids, are prevalent in eukaryotic proteins. It has become necessary to annotate and evaluate this feature in entire proteomes. The definition of what constitutes a homorepeat is not fixed, and different research approaches may require different definitions; therefore, flexible approaches to analyze homorepeats in complete proteomes are needed. Here, we present polyX2, a fast, simple but tunable script to scan protein datasets for all possible homorepeats. The user can modify the length of the window to scan, the minimum number of identical residues that must be found in the window, and the types of homorepeats to be found.
Collapse
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany;
| | | |
Collapse
|
16
|
Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SCE, Dosztanyi Z, Andrade-Navarro MA. Disentangling the complexity of low complexity proteins. Brief Bioinform 2021; 21:458-472. [PMID: 30698641 PMCID: PMC7299295 DOI: 10.1093/bib/bbz007] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Revised: 12/19/2018] [Accepted: 01/07/2019] [Indexed: 12/31/2022] Open
Abstract
There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. Short abstract There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
Collapse
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
| | - Lisanna Paladin
- Department of Biomedical Science, University of Padova, Padova, Italy
| | - Stella Tamana
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Sophia Petrosian
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
| | - Borbála Hajdu-Soltész
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Annika Urbanek
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Dariusz Plewczynski
- Center of New Technologies, University of Warsaw, Warsaw, Poland.,Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | | | - Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Zoltán Gáspári
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
| | - Christos A Ouzounis
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Andrey V Kajava
- Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Universite de Montpellier, Montpellier, France.,Institute of Bioengineering, University ITMO, St. Petersburg, Russia
| | - John M Hancock
- Earlham Institute, Norwich, UK.,ELIXIR Hub, Welcome Genome Campus, Hinxton, UK
| | - Silvio C E Tosatto
- Department of Biomedical Science, University of Padova, Padova, Italy.,CNR Institute of Neuroscience, Padova, Italy
| | - Zsuzsanna Dosztanyi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
| |
Collapse
|
17
|
Alanis-Lobato G, Möllmann JS, Schaefer MH, Andrade-Navarro MA. MIPPIE: the mouse integrated protein-protein interaction reference. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2020:5850252. [PMID: 32496562 PMCID: PMC7271249 DOI: 10.1093/database/baaa035] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 04/02/2020] [Accepted: 04/29/2020] [Indexed: 12/13/2022]
Abstract
Cells operate and react to environmental signals thanks to a complex network of protein–protein interactions (PPIs), the malfunction of which can severely disrupt cellular homeostasis. As a result, mapping and analyzing protein networks are key to advancing our understanding of biological processes and diseases. An invaluable part of these endeavors has been the house mouse (Mus musculus), the mammalian model organism par excellence, which has provided insights into human biology and disorders. The importance of investigating PPI networks in the context of mouse prompted us to develop the Mouse Integrated Protein–Protein Interaction rEference (MIPPIE). MIPPIE inherits a robust infrastructure from HIPPIE, its sister database of human PPIs, allowing for the assembly of reliable networks supported by different evidence sources and high-quality experimental techniques. MIPPIE networks can be further refined with tissue, directionality and effect information through a user-friendly web interface. Moreover, all MIPPIE data and meta-data can be accessed via a REST web service or downloaded as text files, thus facilitating the integration of mouse PPIs into follow-up bioinformatics pipelines.
Collapse
Affiliation(s)
- Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg University, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany.,Human Embryo and Stem Cell Laboratory, The Francis Crick Institute, 1 Midland Road, NW1 1AT London, UK
| | - Jannik S Möllmann
- Faculty of Biology, Johannes Gutenberg University, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Martin H Schaefer
- Department of Experimental Oncology, European Institute of Oncology IRCCS, Via Adamello 16, 20139 Milan, Italy
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| |
Collapse
|
18
|
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022] Open
Abstract
Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are majorly coupled to GC/AT-bias, exhibiting a bi-furcated correlation with degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and there are an opposing correlation and anti-correlation versus homopeptide levels for intrinsic disorder and structured-domain content respectively. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, which is an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine whose homocodons are overwhelmingly made from the codon AAG.
Collapse
|
19
|
Kastano K, Mier P, Andrade-Navarro MA. The Role of Low Complexity Regions in Protein Interaction Modes: An Illustration in Huntingtin. Int J Mol Sci 2021; 22:1727. [PMID: 33572172 PMCID: PMC7915032 DOI: 10.3390/ijms22041727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2021] [Revised: 01/25/2021] [Accepted: 02/04/2021] [Indexed: 12/11/2022] Open
Abstract
Low complexity regions (LCRs) are very frequent in protein sequences, generally having a lower propensity to form structured domains and tending to be much less evolutionarily conserved than globular domains. Their higher abundance in eukaryotes and in species with more cellular types agrees with a growing number of reports on their function in protein interactions regulated by post-translational modifications. LCRs facilitate the increase of regulatory and network complexity required with the emergence of organisms with more complex tissue distribution and development. Although the low conservation and structural flexibility of LCRs complicate their study, evolutionary studies of proteins across species have been used to evaluate their significance and function. To investigate how to apply this evolutionary approach to the study of LCR function in protein-protein interactions, we performed a detailed analysis for Huntingtin (HTT), a large protein that is a hub for interaction with hundreds of proteins, has a variety of LCRs, and for which partial structural information (in complex with HAP40) is available. We hypothesize that proteins RASA1, SYN2, and KAT2B may compete with HAP40 for their attachment to the core of HTT using similar LCRs. Our results illustrate how evolution might favor the interplay of LCRs with domains, and the possibility of detecting multiple modes of LCR-mediated protein-protein interactions with a large hub such as HTT when enough protein interaction data is available.
Collapse
Affiliation(s)
| | | | - Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany; (K.K.); (P.M.)
| |
Collapse
|
20
|
Mier P, Andrade-Navarro MA. Assessing the low complexity of protein sequences via the low complexity triangle. PLoS One 2020; 15:e0239154. [PMID: 33378336 PMCID: PMC7773278 DOI: 10.1371/journal.pone.0239154] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open
Abstract
Background Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called ‘low complexity triangle’ as a proof-of-concept to represent the low complexity measured values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated to complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
Collapse
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
- * E-mail:
| | - Miguel A. Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
| |
Collapse
|
21
|
Chavali S, Singh AK, Santhanam B, Babu MM. Amino acid homorepeats in proteins. Nat Rev Chem 2020; 4:420-434. [PMID: 37127972 DOI: 10.1038/s41570-020-0204-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/04/2020] [Indexed: 12/16/2022]
Abstract
Amino acid homorepeats, or homorepeats, are polypeptide segments found in proteins that contain stretches of identical amino acid residues. Although abnormal homorepeat expansions are linked to pathologies such as neurodegenerative diseases, homorepeats are prevalent in eukaryotic proteomes, suggesting that they are important for normal physiology. In this Review, we discuss recent advances in our understanding of the biological functions of homorepeats, which range from facilitating subcellular protein localization to mediating interactions between proteins across diverse cellular pathways. We explore how the functional diversity of homorepeat-containing proteins could be linked to the ability of homorepeats to adopt different structural conformations, an ability influenced by repeat composition, repeat length and the nature of flanking sequences. We conclude by highlighting how an understanding of homorepeats will help us better characterize and develop therapeutics against the human diseases to which they contribute.
Collapse
Affiliation(s)
- Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India.
| | - Anjali K Singh
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India
| | - Balaji Santhanam
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA.
| |
Collapse
|
22
|
Mier P, Andrade-Navarro MA. The features of polyglutamine regions depend on their evolutionary stability. BMC Evol Biol 2020; 20:59. [PMID: 32448113 PMCID: PMC7247214 DOI: 10.1186/s12862-020-01626-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2020] [Accepted: 05/13/2020] [Indexed: 11/29/2022] Open
Abstract
Background Polyglutamine regions (polyQ) are one of the most studied and prevalent homorepeats in eukaryotes. They have a particular length-dependent codon usage, which relates to a characteristic CAG-slippage mechanism. Pathologically expanded tracts of polyQ are known to form aggregates and are involved in the development of several human neurodegenerative diseases. The non-pathogenic function of polyQ is to mediate protein-protein interactions via a coiled-coil pairing with an interactor. They are usually located in a helical context. Results Here we study the stability of polyQ regions in evolution, using a set of 60 proteomes from four distinct taxonomic groups (Insecta, Teleostei, Sauria and Mammalia). The polyQ regions can be distinctly grouped in three categories based on their evolutionary stability: stable, unstable by length variation (inserted), and unstable by mutations (mutated). PolyQ regions in these categories can be significantly distinguished by their glutamine codon usage, and we show that the CAG-slippage mechanism is predominant in inserted polyQ of Sauria and Mammalia. The polyQ amino acid context is also influenced by the polyQ stability, with a higher proportion of proline residues around inserted polyQ. By studying the secondary structure of the sequences surrounding polyQ regions, we found that regarding the structural conformation around a polyQ, its stability category is more relevant than its taxonomic information. The protein-protein interaction capacity of a polyQ is also affected by its stability, as stable polyQ have more interactors than unstable polyQ. Conclusions Our results show that apart from the sequence of a polyQ, information about its orthologous sequences is needed to assess its function. Codon usage, amino acid context, structural conformation and the protein-protein interaction capacity of polyQ from all studied taxa critically depend on the region stability. There are however some taxa-specific polyQ features that override this importance. We conclude that a taxa-driven evolutionary analysis is of the highest importance for the comprehensive study of any feature of polyglutamine regions.
Collapse
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
| |
Collapse
|
23
|
Urbanek A, Popovic M, Morató A, Estaña A, Elena-Real CA, Mier P, Fournet A, Allemand F, Delbecq S, Andrade-Navarro MA, Cortés J, Sibille N, Bernadó P. Flanking Regions Determine the Structure of the Poly-Glutamine in Huntingtin through Mechanisms Common among Glutamine-Rich Human Proteins. Structure 2020; 28:733-746.e5. [PMID: 32402249 DOI: 10.1016/j.str.2020.04.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2020] [Revised: 02/18/2020] [Accepted: 04/11/2020] [Indexed: 10/24/2022]
Abstract
The causative agent of Huntington's disease, the poly-Q homo-repeat in the N-terminal region of huntingtin (httex1), is flanked by a 17-residue-long fragment (N17) and a proline-rich region (PRR), which promote and inhibit the aggregation propensity of the protein, respectively, by poorly understood mechanisms. Based on experimental data obtained from site-specifically labeled NMR samples, we derived an ensemble model of httex1 that identified both flanking regions as opposing poly-Q secondary structure promoters. While N17 triggers helicity through a promiscuous hydrogen bond network involving the side chains of the first glutamines in the poly-Q tract, the PRR promotes extended conformations in neighboring glutamines. Furthermore, a bioinformatics analysis of the human proteome showed that these structural traits are present in many human glutamine-rich proteins and that they are more prevalent in proteins with longer poly-Q tracts. Taken together, these observations provide the structural bases to understand previous biophysical and functional data on httex1.
Collapse
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Alejandro Estaña
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France; LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire (LBCM-EA4558 Vaccination Antiparasitaire), UFR Pharmacie, Université de Montpellier, 34090 Montpellier, France
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France.
| |
Collapse
|
24
|
Urbanek A, Popovic M, Elena-Real CA, Morató A, Estaña A, Fournet A, Allemand F, Gil AM, Cativiela C, Cortés J, Jiménez AI, Sibille N, Bernadó P. Evidence of the Reduced Abundance of Proline cis Conformation in Protein Poly Proline Tracts. J Am Chem Soc 2020; 142:7976-7986. [PMID: 32266815 DOI: 10.1021/jacs.0c02263] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Proline is found in a cis conformation in proteins more often than other proteinogenic amino acids, where it influences structure and modulates function, being the focus of several high-resolution structural studies. However, until now, technical and methodological limitations have hampered the site-specific investigation of the conformational preferences of prolines present in poly proline (poly-P) homorepeats in their protein context. Here, we apply site-specific isotopic labeling to obtain high-resolution NMR data on the cis/trans equilibrium of prolines within the poly-P repeats of huntingtin exon 1, the causative agent of Huntington's disease. Screening prolines in different positions in long (poly-P11) and short (poly-P3) poly-P tracts, we found that, while the first proline of poly-P tracts adopts similar levels of cis conformation as isolated prolines, a length-dependent reduced abundance of cis conformers is observed for terminal prolines. Interestingly, the cis isomer could not be detected in inner prolines, in line with percentages derived from a large database of proline-centered tripeptides extracted from crystallographic structures. These results suggest a strong cooperative effect within poly-Ps that enhances their stiffness by diminishing the stability of the cis conformation. This rigidity is key to rationalizing the protection toward aggregation that the poly-P tract confers to huntingtin. Furthermore, the study provides new avenues to probe the structural properties of poly-P tracts in protein design as scaffolds or nanoscale rulers.
Collapse
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Alejandro Estaña
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France.,LAAS-CNRS, Université de Toulouse, CNRS, 7 Avenue du Colonel Roche, 31400 Toulouse, France
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Ana M Gil
- Departamento de Quı́mica Orgánica, Instituto de Sı́ntesis Quı́mica y Catálisis Homogénea (ISQCH), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
| | - Carlos Cativiela
- Departamento de Quı́mica Orgánica, Instituto de Sı́ntesis Quı́mica y Catálisis Homogénea (ISQCH), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 7 Avenue du Colonel Roche, 31400 Toulouse, France
| | - Ana I Jiménez
- Departamento de Quı́mica Orgánica, Instituto de Sı́ntesis Quı́mica y Catálisis Homogénea (ISQCH), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| |
Collapse
|
25
|
Pelassa I, Cibelli M, Villeri V, Lilliu E, Vaglietti S, Olocco F, Ghirardi M, Montarolo PG, Corà D, Fiumara F. Compound Dynamics and Combinatorial Patterns of Amino Acid Repeats Encode a System of Evolutionary and Developmental Markers. Genome Biol Evol 2020; 11:3159-3178. [PMID: 31589292 PMCID: PMC6839033 DOI: 10.1093/gbe/evz216] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/27/2019] [Indexed: 01/05/2023] Open
Abstract
Homopolymeric amino acid repeats (AARs) like polyalanine (polyA) and polyglutamine (polyQ) in some developmental proteins (DPs) regulate certain aspects of organismal morphology and behavior, suggesting an evolutionary role for AARs as developmental "tuning knobs." It is still unclear, however, whether these are occasional protein-specific phenomena or hints at the existence of a whole AAR-based regulatory system in DPs. Using novel approaches to trace their functional and evolutionary history, we find quantitative evidence supporting a generalized, combinatorial role of AARs in developmental processes with evolutionary implications. We observe nonrandom AAR distributions and combinations in HOX and other DPs, as well as in their interactomes, defining elements of a proteome-wide combinatorial functional code whereby different AARs and their combinations appear preferentially in proteins involved in the development of specific organs/systems. Such functional associations can be either static or display detectable evolutionary dynamics. These findings suggest that progressive changes in AAR occurrence/combination, by altering embryonic development, may have contributed to taxonomic divergence, leaving detectable traces in the evolutionary history of proteomes. Consistent with this hypothesis, we find that the evolutionary trajectories of the 20 AARs in eukaryotic proteomes are highly interrelated and their individual or compound dynamics can sharply mark taxonomic boundaries, or display clock-like trends, carrying overall a strong phylogenetic signal. These findings provide quantitative evidence and an interpretive framework outlining a combinatorial system of AARs whose compound dynamics mark at the same time DP functions and evolutionary transitions.
Collapse
Affiliation(s)
- Ilaria Pelassa
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Marica Cibelli
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Veronica Villeri
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Elena Lilliu
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Serena Vaglietti
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Federica Olocco
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy
| | - Mirella Ghirardi
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy.,National Institute of Neuroscience (INN), Torino, Italy
| | - Pier Giorgio Montarolo
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy.,National Institute of Neuroscience (INN), Torino, Italy
| | - Davide Corà
- Department of Translational Medicine, Piemonte Orientale University, Novara, Italy.,Center for Translational Research on Autoimmune and Allergic Disease (CAAD), Novara, Italy
| | - Ferdinando Fiumara
- Department of Neuroscience Rita Levi Montalcini, University of Torino, Italy.,National Institute of Neuroscience (INN), Torino, Italy
| |
Collapse
|
26
|
Mier P, Elena-Real C, Urbanek A, Bernadó P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J 2020; 18:306-313. [PMID: 32071707 PMCID: PMC7016039 DOI: 10.1016/j.csbj.2020.01.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Revised: 12/13/2019] [Accepted: 01/30/2020] [Indexed: 12/18/2022] Open
Abstract
Polyglutamine (polyQ) regions are one of the most prevalent homorepeats in eukaryotes. It is however difficult to evaluate their prevalence because various studies claim different results. The reason is the lack of a consensus to define what is indeed a polyQ region. We have tackled this issue by studying how the use of different thresholds (i.e., minimum number of glutamines required in a protein region of a given size), to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. Threshold definition shapes the length distribution of the polyQ dataset, and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ. While leucine is enriched at the N-terminus of polyQ and specially at position -1 (amino acid preceding the polyQ), proline is prevalent in the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species, and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summarized overview about which biases are to be expected per threshold when studying polyQ regions.
Collapse
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Carlos Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| |
Collapse
|
27
|
Urbanek A, Elena-Real CA, Popovic M, Morató A, Fournet A, Allemand F, Delbecq S, Sibille N, Bernadó P. Site-Specific Isotopic Labeling (SSIL): Access to High-Resolution Structural and Dynamic Information in Low-Complexity Proteins. Chembiochem 2019; 21:769-775. [PMID: 31697025 DOI: 10.1002/cbic.201900583] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Revised: 11/05/2019] [Indexed: 12/17/2022]
Abstract
Remarkable technical progress in the area of structural biology has paved the way to study previously inaccessible targets. For example, large protein complexes can now be easily investigated by cryo-electron microscopy, and modern high-field NMR magnets have challenged the limits of high-resolution characterization of proteins in solution. However, the structural and dynamic characteristics of certain proteins with important functions still cannot be probed by conventional methods. These proteins in question contain low-complexity regions (LCRs), compositionally biased sequences where only a limited number of amino acids is repeated multiple times, which hamper their characterization. This Concept article describes a site-specific isotopic labeling (SSIL) strategy, which combines nonsense suppression and cell-free protein synthesis to overcome these limitations. An overview on how poly-glutamine tracts were made amenable to high-resolution structural studies is used to illustrate the usefulness of SSIL. Furthermore, we discuss the potential of this methodology to give further insights into the roles of LCRs in human pathologies and liquid-liquid phase separation, as well as the challenges that must be addressed in the future for the popularization of SSIL.
Collapse
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire, (LBCM-EA4558 Vaccination Antiparasitaire), UFR Pharmacie, Université de Montpellier, 15, Av. Charles Flahault, BP 14491, 34000, Montpellier, France
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| |
Collapse
|
28
|
Repeatability in protein sequences. J Struct Biol 2019; 208:86-91. [PMID: 31408700 DOI: 10.1016/j.jsb.2019.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Revised: 08/06/2019] [Accepted: 08/08/2019] [Indexed: 02/07/2023]
Abstract
Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, and particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures very difficult to predict. These structures are possibly variable and dependant on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and many define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is from a perfect repeat. Using this algorithm we identified that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd-length, while the human proteins are not, that the fish Danio rerio has many proteins with repeats of length two and that the plant Arabidopsis thaliana has an unusually large amount of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) allows to find regions with approximate short repeats in protein sequences, and helps to characterize the variable use of LCRs and compositional bias in different organisms.
Collapse
|
29
|
Abstract
C-terminal polylysine (PL) can be synthesized from the polyadenine tail of prematurely cleaved mRNAs or when a read-though of a stop codon happens. Due to the highly positive charge, PL stalls in the electrostatically negative ribosomal exit channel. The stalled polypeptide recruits the Ribosome-associated quality control (RQC) complex which processes and extracts the nascent chain. Dysfunction of the RQC leads to the accumulation of PL-tagged proteins, induction of a stress response, and cellular toxicity. Not much is known about the PL-specific aspect of protein quality control. Using quantitative mass spectrometry, we uncovered the post-ribosomal PL-processing machinery in human cytosol. It encompasses key cytosolic complexes of the proteostasis network, such as chaperonin TCP-1 ring complexes (TRiC) and half-capped 19S-20S proteasomes. Furthermore, we found that the nuclear transport machinery associates with PL, which suggests a novel mechanism by which faulty proteins can be compartmentalized in the cell. The enhanced nuclear import of a PL-tagged polypeptide confirmed this implication, which leads to questions regarding the biological rationale behind it.
Collapse
|
30
|
Mier P, Andrade-Navarro MA. Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length. Genome Biol Evol 2018; 10:816-825. [PMID: 29608721 PMCID: PMC5841385 DOI: 10.1093/gbe/evy046] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/19/2018] [Indexed: 12/16/2022] Open
Abstract
Amino acid usage in a proteome depends mostly on its taxonomy, as it does the codon usage in transcriptomes. Here, we explore the level of variation in the codon usage of a specific amino acid, glutamine, in relation to the number of consecutive glutamine residues. We show that CAG triplets are consistently more abundant in short glutamine homorepeats (polyQ, four to eight residues) than in shorter glutamine stretches (one to three residues), leading to the evolutionary growth of the repeat region in a CAG-dependent manner. The length of orthologous polyQ regions is mostly stable in primates, particularly the short ones. Interestingly, given a short polyQ the CAG usage is higher in unstable-in-length orthologous polyQ regions. This indicates that CAG triplets produce the necessary instability for a glutamine stretch to grow. Proteins related to polyQ-associated diseases behave in a more extreme way, with longer glutamine stretches in human and evolutionarily closer nonhuman primates, and an overall higher CAG usage. In the light of our results, we suggest an evolutionary model to explain the glutamine codon usage in polyQ regions.
Collapse
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
| |
Collapse
|
31
|
Barik S. Amino acid repeats avert mRNA folding through conservative substitutions and synonymous codons, regardless of codon bias. Heliyon 2017; 3:e00492. [PMID: 29387823 PMCID: PMC5772840 DOI: 10.1016/j.heliyon.2017.e00492] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2017] [Revised: 12/06/2017] [Accepted: 12/13/2017] [Indexed: 11/18/2022] Open
Abstract
A significant number of proteins in all living species contains amino acid repeats (AARs) of various lengths and compositions, many of which play important roles in protein structure and function. Here, I have surveyed select homopolymeric single [(A)n] and double [(AB)n] AARs in the human proteome. A close examination of their codon pattern and analysis of RNA structure propensity led to the following set of empirical rules: (1) One class of amino acid repeats (Class I) uses a mixture of synonymous codons, some of which approximate the codon bias ratio in the overall human proteome; (2) The second class (Class II) disregards the codon bias ratio, and appears to have originated by simple repetition of the same codon (or just a few codons); and finally, (3) In all AARs (including Class I, Class II, and the in-betweens), the codons are chosen in a manner that precludes the formation of RNA secondary structure. It appears that the AAR genes have evolved by orchestrating a balance between codon usage and mRNA secondary structure. The insights gained here should provide a better understanding of AAR evolution and may assist in designing synthetic genes.
Collapse
|
32
|
Intrinsic Disorder in Proteins with Pathogenic Repeat Expansions. Molecules 2017; 22:molecules22122027. [PMID: 29186753 PMCID: PMC6149999 DOI: 10.3390/molecules22122027] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2017] [Revised: 11/18/2017] [Accepted: 11/21/2017] [Indexed: 11/18/2022] Open
Abstract
Intrinsically disordered proteins and proteins with intrinsically disordered regions have been shown to be highly prevalent in disease. Furthermore, disease-causing expansions of the regions containing tandem amino acid repeats often push repetitive proteins towards formation of irreversible aggregates. In fact, in disease-relevant proteins, the increased repeat length often positively correlates with the increased aggregation efficiency and the increased disease severity and penetrance, being negatively correlated with the age of disease onset. The major categories of repeat extensions involved in disease include poly-glutamine and poly-alanine homorepeats, which are often times located in the intrinsically disordered regions, as well as repeats in non-coding regions of genes typically encoding proteins with ordered structures. Repeats in such non-coding regions of genes can be expressed at the mRNA level. Although they can affect the expression levels of encoded proteins, they are not translated as parts of an affected protein and have no effect on its structure. However, in some cases, the repetitive mRNAs can be translated in a non-canonical manner, generating highly repetitive peptides of different length and amino acid composition. The repeat extension-caused aggregation of a repetitive protein may represent a pivotal step for its transformation into a proteotoxic entity that can lead to pathology. The goals of this article are to systematically analyze molecular mechanisms of the proteinopathies caused by the poly-glutamine and poly-alanine homorepeat expansion, as well as by the polypeptides generated as a result of the microsatellite expansions in non-coding gene regions and to examine the related proteins. We also present results of the analysis of the prevalence and functional roles of intrinsic disorder in proteins associated with pathological repeat expansions.
Collapse
|