1
Kausar MA, Narayan J, Mishra N, Akhter Y, Singh R, Khalifa AM, El-Hag ABM, Ahmed RME, Tyagi N, Mahfooz S. Studying Human Pathogenic Cryptococcus Gattii Lineages by Utilizing Simple Sequence Repeats to Create Diagnostic Markers and Analyzing Diversity. Biochem Genet 2024:10.1007/s10528-024-10812-7. [PMID: 38773043 DOI: 10.1007/s10528-024-10812-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 04/11/2024] [Indexed: 05/23/2024]
Abstract
In this study, we compared the occurrence, relative abundance (RA), and density (RD) of simple sequence repeats (SSRs) among the lineages of human pathogenic Cryptococcus gattii using an in-silico approach to gain a deeper understanding of the structure and evolution of their genomes. C. gattii isolate MF34 showed the highest RA and RD of SSRs in both the genomic and transcriptomic sequences, followed by isolate WM276. In both the genomic (50%) and transcriptomic (65%) sequences, trinucleotide SSRs were the most common SSR class. A motif conservation study found that the isolates had stronger conservation (56.1%) of motifs, with isolate IND107 having the most (5.7%) unique motifs. We discovered the presence of SSRs in genes that are directly or indirectly associated with disease using gene enrichment analysis. Isolate-specific unique motifs identified in this study could be utilized as molecular probes for isolate identification. To improve genetic resources among C. gattii isolates, 6499 primers were developed. These genomic resources developed in this study could help with diversity analysis and the development of isolate-specific markers.
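As an illustration of the two metrics reported above: in SSR surveys, relative abundance is commonly computed as the number of SSRs per megabase of sequence, and relative density as the total SSR length in base pairs per megabase. The abstract does not state the authors' search parameters, so the following is only a minimal Python sketch under assumed thresholds. `MIN_REPEATS`, `find_ssrs` and `ssr_stats` are hypothetical names, and this is not the pipeline used in the study.

```python
import re

# Minimum number of repeat units per motif length.
# Illustrative assumption only; the study's actual settings are not given in the abstract.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs with motif length 1-6."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" is really "AT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


def ssr_stats(seq):
    """Relative abundance (SSRs per Mb) and relative density (SSR bp per Mb)."""
    if not seq:
        raise ValueError("empty sequence")
    hits = find_ssrs(seq)
    megabases = len(seq) / 1e6
    total_bp = sum(len(motif) * n for _, motif, n in hits)
    return {"count": len(hits), "RA": len(hits) / megabases, "RD": total_bp / megabases}
```

Running `ssr_stats` separately on genomic and transcriptomic sequences would give per-dataset values that can be compared between isolates. Imperfect and compound repeats are deliberately ignored in this sketch.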
Affiliation(s)
- Mohd Adnan Kausar
- Department of Biochemistry, College of Medicine, University of Ha'il, Hail, 2440, Saudi Arabia.
- Jitendra Narayan
- CSIR-Institute of Genomics and Integrative Biology, Mall Road, New Delhi, 110007, India
- Nishtha Mishra
- Department of Chemistry, Deen Dayal Upadhyaya Gorakhpur University, Gorakhpur, 273009, India
- Yusuf Akhter
- Department of Biotechnology, Babasaheb Bhimrao Ambedkar University, Lucknow, 226025, India
- Rajeev Singh
- Department of Environmental Science, Jamia Millia Islamia Central University, New Delhi, 110025, India
- Amany Mohammed Khalifa
- Department of Pathology, College of Medicine, University of Ha'il, Hail, 2440, Saudi Arabia
- Neetu Tyagi
- Bone Biology Laboratory, Department of Physiology, University of Louisville, Louisville, USA
- Sahil Mahfooz
- Department of Industrial Microbiology, Deen Dayal Upadhyaya Gorakhpur University, Gorakhpur, 273009, India.
2
Bezerra-Brandao M, Tunque Cahui RR, Hirsh L. Daisy: An integrated repeat protein curation service. J Struct Biol 2023; 215:108033. [PMID: 37797915 DOI: 10.1016/j.jsb.2023.108033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 09/20/2023] [Accepted: 10/02/2023] [Indexed: 10/07/2023]
Abstract
The identification, classification and curation of tandem repeats in proteins is a complex process that requires manual processing by experts, processing power and time. Recent advances in applying machine learning to protein structure prediction and repeat classification are useful for this process. However, no service brings together the databases and software needed to support research on repeat proteins. In this publication we present Daisy, an integrated repeat protein curation web service. This service can process Protein Data Bank (PDB) and AlphaFold Database entries for tandem repeat identification. In addition, it uses an algorithm to search a sequence against a library of Pfam hidden Markov models (HMMs). Repeat classifications are associated with the identified families through RepeatsDB. This prediction is used to enhance the execution of the ReUPred algorithm and to speed up the identification of repeat units. The service can also process every PDB and AlphaFold structure associated with a UniProt proteome registry. Availability: The Daisy web service is freely accessible at daisy.bioinformatica.org.
Affiliation(s)
- Layla Hirsh
- Department of Engineering, Pontifical Catholic University of Peru, Lima 32, Peru.
3
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on TRPs by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
- Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy.
4
Elena-Real CA, Mier P, Sibille N, Andrade-Navarro MA, Bernadó P. Structure-function relationships in protein homorepeats. Curr Opin Struct Biol 2023; 83:102726. [PMID: 37924569 DOI: 10.1016/j.sbi.2023.102726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 11/06/2023]
Abstract
Homorepeats (or polyX), protein segments containing repetitions of the same amino acid, are abundant in proteomes from all kingdoms of life and are involved in crucial biological functions as well as several neurodegenerative and developmental diseases. Mainly inserted in disordered segments of proteins, the structure/function relationships of homorepeats remain largely unexplored. In this review, we summarize present knowledge for the most abundant homorepeats, highlighting the role of the inherent structure and the conformational influence exerted by their flanking regions. Recent experimental and computational methods enable residue-specific investigations of these regions and promise novel structural and dynamic information for this elusive group of proteins. This information should increase our knowledge about the structural bases of phenomena such as liquid-liquid phase separation and trinucleotide repeat disorders.
Affiliation(s)
- Carlos A Elena-Real
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France. https://twitter.com/carloselenareal
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz. Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Nathalie Sibille
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz. Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Pau Bernadó
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS. 29 rue de Navacelles, 34090 Montpellier, France.
5
Mier P, Andrade-Navarro MA. Evolutionary Study of Protein Short Tandem Repeats in Protein Families. Biomolecules 2023; 13:1116. [PMID: 37509152 PMCID: PMC10377733 DOI: 10.3390/biom13071116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/06/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] [Open access]
Abstract
Tandem repeats in proteins are patterns of residues repeated directly adjacent to each other. The evolution of these repeats can be assessed by using groups of homologous sequences, which can help point to events of unit duplication or deletion. High pressure in a protein family for variation of a given type of repeat might point to their function. Here, we propose the analysis of protein families to calculate protein short tandem repeats (pSTRs) in each protein sequence and assess their variability within the family in terms of number of units. To facilitate this analysis, we developed the pSTR tool, a method to analyze the evolution of protein short tandem repeats in a given protein family by pairwise comparisons between evolutionarily related protein sequences. We evaluated pSTR unit number variation in protein families of 12 complete metazoan proteomes. We hypothesize that families with more dynamic ensembles of repeats could reflect particular roles of these repeats in processes that require more adaptability.
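The idea of comparing repeat unit numbers between related sequences can be sketched in a few lines of Python. This is a hedged illustration, not the pSTR tool itself: the function names, the maximum unit length of 3 residues and the minimum of 3 adjacent copies are assumptions chosen for the example, and the family below is invented.

```python
import re
from itertools import combinations


def is_compound(unit):
    """True if the unit is itself a repetition of a shorter unit (e.g. 'QQ')."""
    k = len(unit)
    return any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def tandem_units(seq, max_unit=3, min_units=3):
    """Map each repeat unit to the largest number of adjacent copies found in seq.

    Units are reported in the reading frame of the leftmost match, so 'PGPGPG'
    is reported as unit 'PG' rather than 'GP'.
    """
    best = {}
    for k in range(1, max_unit + 1):
        for m in re.finditer(r"([A-Z]{%d})\1{%d,}" % (k, min_units - 1), seq):
            unit = m.group(1)
            if is_compound(unit):
                continue
            copies = len(m.group(0)) // k
            best[unit] = max(best.get(unit, 0), copies)
    return best


def unit_variation(family):
    """For every pair of sequences, list the units whose copy number differs."""
    diffs = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(family.items(), 2):
        reps_a, reps_b = tandem_units(seq_a), tandem_units(seq_b)
        for unit in sorted(set(reps_a) | set(reps_b)):
            n_a, n_b = reps_a.get(unit, 0), reps_b.get(unit, 0)
            if n_a != n_b:
                diffs.setdefault(unit, []).append((name_a, name_b, n_a, n_b))
    return diffs


# Invented toy family of three homologous sequences.
family = {
    "p1": "MSPGPGPGPGKQQQQQL",
    "p2": "MSPGPGPGKQQQQQL",
    "p3": "MSPGPGPGPGKQQQL",
}
variation = unit_variation(family)
```

A unit that differs in many pairwise comparisons within a family would be a candidate for the kind of dynamic repeat the abstract describes. A real analysis would also need to establish that the compared repeats are at homologous positions, which this sketch does not attempt.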
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
6
Barbosa Pereira PJ, Manso JA, Macedo-Ribeiro S. The structural plasticity of polyglutamine repeats. Curr Opin Struct Biol 2023; 80:102607. [PMID: 37178477 DOI: 10.1016/j.sbi.2023.102607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/15/2023]
Abstract
From yeast to humans, polyglutamine (polyQ) repeat tracts are found frequently in the proteome and are particularly prominent in the activation domains of transcription factors. PolyQ is a polymorphic motif that modulates functional protein-protein interactions and aberrant self-assembly. Expansion of the polyQ repeated sequences beyond critical physiological repeat length thresholds triggers self-assembly and is linked to severe pathological implications. This review provides an overview of the current knowledge on the structures of polyQ tracts in the soluble and aggregated states and discusses the influence of neighboring regions on polyQ secondary structure, aggregation, and fibril morphologies. The influence of the genetic context of the polyQ-encoding trinucleotides is briefly discussed as a challenge for future endeavors in this field.
Affiliation(s)
- Pedro José Barbosa Pereira
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal.
- José A Manso
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
- Sandra Macedo-Ribeiro
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
7
Zhu X, Guo L, Zhu R, Zhou X, Zhang J, Li D, He S, Qiao Y. Phytophthora sojae effector PsAvh113 associates with the soybean transcription factor GmDPB to inhibit catalase-mediated immunity. PLANT BIOTECHNOLOGY JOURNAL 2023. [PMID: 36972124 DOI: 10.1111/pbi.14043] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/12/2023] [Revised: 02/17/2023] [Accepted: 02/28/2023] [Indexed: 06/18/2023]
Abstract
Phytophthora species are the most destructive plant pathogens worldwide and the main threat to agricultural and natural ecosystems; however, their pathogenic mechanism remains largely unknown. Here, we show that Avh113 effector is required for the virulence of Phytophthora sojae and is important for development of Phytophthora root and stem rot (PRSR) in soybean (Glycine max). Ectopic expression of PsAvh113 enhanced viral and Phytophthora infection in Nicotiana benthamiana. PsAvh113 directly associated with the soybean transcription factor GmDPB, inducing its degradation by the 26S proteasome. The internal repeat 2 (IR2) motif of PsAvh113 was important for its virulence and interaction with GmDPB, while silencing and overexpression of GmDPB in soybean hairy roots altered the resistance to P. sojae. Upon binding to GmDPB, PsAvh113 decreased the transcription of the downstream gene GmCAT1, which acts as a positive regulator of plant immunity. Furthermore, we revealed that PsAvh113 suppressed the GmCAT1-induced cell death by associating with GmDPB, thereby enhancing plant susceptibility to Phytophthora. Together, our findings reveal a vital role of PsAvh113 in inducing PRSR in soybean and offer a novel insight into the interplay between defence and counter-defence during the P. sojae infection of soybean.
Affiliation(s)
- Xiaoguo Zhu
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Liang Guo
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Ruiqing Zhu
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xiaoyi Zhou
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jianing Zhang
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Die Li
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Shidan He
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Yongli Qiao
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
8
Mier P, Elena-Real CA, Cortés J, Bernadó P, Andrade-Navarro MA. The sequence context in poly-alanine regions: structure, function and conservation. Bioinformatics 2022; 38:4851-4858. [PMID: 36106994 PMCID: PMC9620824 DOI: 10.1093/bioinformatics/btac610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 07/07/2022] [Accepted: 09/05/2022] [Indexed: 11/24/2022] [Open access]
Abstract
Motivation: Poly-alanine (polyA) regions are protein stretches mostly composed of alanines. Despite their abundance in eukaryotic proteomes and their association to nine inherited human diseases, the structural and functional roles exerted by polyA stretches remain poorly understood. In this work we study how the amino acid context in which polyA regions are settled in proteins influences their structure and function. Results: We identified glycine and proline as the most abundant amino acids within polyA and in the flanking regions of polyA tracts, in human proteins as well as in 17 additional eukaryotic species. Our analyses indicate that the non-structuring nature of these two amino acids influences the α-helical conformations predicted for polyA, suggesting a relevant role in reducing the inherent aggregation propensity of long polyA. Then, we show how polyA position in protein N-termini relates with their function as transit peptides. PolyA placed just after the initial methionine is often predicted as part of mitochondrial transit peptides, whereas when placed in downstream positions, polyA are part of signal peptides. A few examples from known structures suggest that short polyA can emerge by alanine substitutions in α-helices; but evolution by insertion is observed for longer polyA. Our results showcase the importance of studying the sequence context of homorepeats as a mechanism to shape their structure–function relationships. Availability and implementation: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Carlos A Elena-Real
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Pau Bernadó
- Centre de Biologie Structurale (CBS), Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Miguel A Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
9
Mier P, Andrade-Navarro MA. Regions with two amino acids in protein sequences: a step forward from homorepeats into the low complexity landscape. Comput Struct Biotechnol J 2022; 20:5516-5523. [PMID: 36249567 PMCID: PMC9550522 DOI: 10.1016/j.csbj.2022.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 09/07/2022] [Accepted: 09/07/2022] [Indexed: 11/17/2022] [Open access]
Abstract
Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.
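The three polyXY categories named above can be told apart by counting how often the residue type changes along the region. The Python sketch below is only an illustration under simplifying assumptions: it expects a region made of exactly two residue types with at least four residues, and `classify_polyxy` is a hypothetical name. The authors' script and its tolerance for other residues are not described in the abstract.

```python
def classify_polyxy(region):
    """Classify a two-residue region as 'direpeat', 'joined' or 'shuffled'."""
    if len(set(region)) != 2 or len(region) < 4:
        raise ValueError("expected at least 4 residues of exactly two types")
    # Number of positions where the residue differs from its predecessor.
    transitions = sum(a != b for a, b in zip(region, region[1:]))
    if transitions == len(region) - 1:
        return "direpeat"   # strict alternation, e.g. XYXYXY
    if transitions == 1:
        return "joined"     # one block of X next to one block of Y, e.g. XXXYYY
    return "shuffled"       # any other arrangement, e.g. XYYXXY
```

For example, `classify_polyxy("QPQPQP")` returns `"direpeat"` and `classify_polyxy("QQQPPP")` returns `"joined"`. Scanning whole proteomes would additionally require a step that locates candidate two-residue regions, which is outside this sketch.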
Affiliation(s)
- Pablo Mier
- Corresponding author at: Hanns-Dieter-Hüsch-Weg 15 55118 Mainz (Germany).
10
Becerra A, Muñoz-Velasco I, Aguilar-Cámara A, Cottom-Salas W, Cruz-González A, Vázquez-Salazar A, Hernández-Morales R, Jácome R, Campillo-Balderas JA, Lazcano A. Two short low complexity regions (LCRs) are hallmark sequences of the Delta SARS-CoV-2 variant spike protein. Sci Rep 2022; 12:936. [PMID: 35042962 PMCID: PMC8766472 DOI: 10.1038/s41598-022-04976-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Accepted: 01/04/2022] [Indexed: 11/24/2022] [Open access]
Abstract
Low complexity regions (LCRs) are protein sequences formed by a set of compositionally biased residues. LCRs are extremely abundant in cellular proteins and have also been reported in viruses, where they may partake in evasion of the host immune system. Analyses of 28,231 SARS-CoV-2 whole proteomes and of 261,051 spike protein sequences revealed the presence of four extremely conserved LCRs in the spike protein of several SARS-CoV-2 variants. With the exception of Iota, where it is absent, the Spike LCR-1 is present in the signal peptide of 80.57% of the Delta variant sequences, and in other variants of concern and interest. The Spike LCR-2 is highly prevalent (79.87%) in Iota. Two distinctive LCRs are present in the Delta spike protein. The Delta Spike LCR-3 is present in 99.19% of the analyzed sequences, and the Delta Spike LCR-4 in 98.3% of the same set of proteins. These two LCRs are located in the furin cleavage site and HR1 domain, respectively, and may be considered hallmark traits of the Delta variant. The presence of the medically-important point mutations P681R and D950N in these LCRs, combined with the ubiquity of these regions in the highly contagious Delta variant opens the possibility that they may play a role in its rapid spread.
Affiliation(s)
- Arturo Becerra
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Israel Muñoz-Velasco
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Wolfgang Cottom-Salas
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Escuela Nacional Preparatoria, Plantel 8 Miguel E. Schulz, Universidad Nacional Autónoma de México, 01600, Mexico City, Mexico
- Adrián Cruz-González
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Alberto Vázquez-Salazar
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA, 90095, USA
- Rodrigo Jácome
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico
- Antonio Lazcano
- Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico.
- El Colegio Nacional, 06470, Mexico City, Mexico.
11
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
12
Pigazzini ML, Lawrenz M, Margineanu A, Kaminski Schierle GS, Kirstein J. An Expanded Polyproline Domain Maintains Mutant Huntingtin Soluble in vivo and During Aging. Front Mol Neurosci 2021; 14:721749. [PMID: 34720872 PMCID: PMC8554126 DOI: 10.3389/fnmol.2021.721749] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/07/2021] [Accepted: 08/30/2021] [Indexed: 02/02/2023] [Open access]
Abstract
Huntington's disease is a dominantly inherited neurodegenerative disorder caused by the expansion of a CAG repeat, encoding for the amino acid glutamine (Q), present in the first exon of the protein huntingtin. Over the threshold of Q39 HTT exon 1 (HTTEx1) tends to misfold and aggregate into large intracellular structures, but whether these end-stage aggregates or their on-pathway intermediates are responsible for cytotoxicity is still debated. HTTEx1 can be separated into three domains: an N-terminal 17 amino acid region, the polyglutamine (polyQ) expansion and a C-terminal proline rich domain (PRD). Alongside the expanded polyQ, these flanking domains influence the aggregation propensity of HTTEx1: with the N17 initiating and promoting aggregation, and the PRD modulating it. In this study we focus on the first 11 amino acids of the PRD, a stretch of pure prolines, which are an evolutionary recent addition to the expanding polyQ region. We hypothesize that this proline region is expanding alongside the polyQ to counteract its ability to misfold and cause toxicity, and that expanding this proline region would be overall beneficial. We generated HTTEx1 mutants lacking both flanking domains singularly, missing the first 11 prolines of the PRD, or with this stretch of prolines expanded. We then followed their aggregation landscape in vitro with a battery of biochemical assays, and in vivo in novel models of C. elegans expressing the HTTEx1 mutants pan-neuronally. Employing fluorescence lifetime imaging we could observe the aggregation propensity of all HTTEx1 mutants during aging and correlate this with toxicity via various phenotypic assays. We found that the presence of an expanded proline stretch is beneficial in maintaining HTTEx1 soluble over time, regardless of polyQ length. 
However, the expanded prolines were only advantageous in promoting the survival and fitness of an organism carrying a pathogenic stretch of Q48 but were extremely deleterious to the nematode expressing a physiological stretch of Q23. Our results reveal the unique importance of the prolines, which have evolved and are still evolving alongside expanding glutamines to promote the function of HTTEx1 and avoid pathology.
Affiliation(s)
- Maria Lucia Pigazzini
- Department of Molecular Physiology and Cell Biology, Leibniz Research Institute for Molecular Pharmacology in the Forschungsverbund Berlin e.V. (FMP), Berlin, Germany
- NeuroCure Cluster of Excellence, Charité Universitätsmedizin Berlin, Berlin, Germany
- Mandy Lawrenz
- Department of Molecular Physiology and Cell Biology, Leibniz Research Institute for Molecular Pharmacology in the Forschungsverbund Berlin e.V. (FMP), Berlin, Germany
- Anca Margineanu
- Advanced Light Microscopy, Max-Delbrück Centrum for Molecular Medicine (MDC), Berlin, Germany
- Gabriele S. Kaminski Schierle
- Molecular Neuroscience Group, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- Janine Kirstein
- Department of Molecular Physiology and Cell Biology, Leibniz Research Institute for Molecular Pharmacology in the Forschungsverbund Berlin e.V. (FMP), Berlin, Germany
- Department of Cell Biology, University of Bremen, Bremen, Germany
13
Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SCE, Dosztanyi Z, Andrade-Navarro MA. Disentangling the complexity of low complexity proteins. Brief Bioinform 2021; 21:458-472. [PMID: 30698641 PMCID: PMC7299295 DOI: 10.1093/bib/bbz007] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 11/12/2018] [Revised: 12/19/2018] [Accepted: 01/07/2019] [Indexed: 12/31/2022] [Open access]
Abstract
There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. Short abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
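One widely used statistical measure of sequence complexity is the Shannon entropy of the residue composition in a sliding window: windows dominated by a few amino acid types score low. The Python sketch below illustrates that general idea only. The window size, the threshold and the function names are assumptions made for the example, and it is not a reimplementation of any specific tool discussed in the review.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, size=12, threshold=2.2):
    """Return (start, window) for every window whose entropy is at or below threshold.

    size and threshold are illustrative defaults, not recommended values.
    """
    hits = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        if shannon_entropy(window) <= threshold:
            hits.append((i, window))
    return hits
```

A homorepeat such as a polyQ tract gives an entropy of 0, a strict two-residue alternation gives 1 bit, and a window of 12 distinct residues gives about 3.58 bits. As the abstract argues, such a score captures composition bias but says nothing by itself about whether the region is ordered or disordered.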
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
| | - Lisanna Paladin
- Department of Biomedical Science, University of Padova, Padova, Italy
| | - Stella Tamana
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Sophia Petrosian
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
| | - Borbála Hajdu-Soltész
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Annika Urbanek
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Dariusz Plewczynski
- Center of New Technologies, University of Warsaw, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Zoltán Gáspári
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
| | - Christos A Ouzounis
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Andrey V Kajava
- Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Universite de Montpellier, Montpellier, France; Institute of Bioengineering, University ITMO, St. Petersburg, Russia
| | - John M Hancock
- Earlham Institute, Norwich, UK; ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK
| | - Silvio C E Tosatto
- Department of Biomedical Science, University of Padova, Padova, Italy; CNR Institute of Neuroscience, Padova, Italy
| | - Zsuzsanna Dosztanyi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
|
14
|
Kastano K, Mier P, Andrade-Navarro MA. The Role of Low Complexity Regions in Protein Interaction Modes: An Illustration in Huntingtin. Int J Mol Sci 2021; 22:1727. [PMID: 33572172 PMCID: PMC7915032 DOI: 10.3390/ijms22041727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/06/2021] [Revised: 01/25/2021] [Accepted: 02/04/2021] [Indexed: 12/11/2022] Open
Abstract
Low complexity regions (LCRs) are very frequent in protein sequences, generally having a lower propensity to form structured domains and tending to be much less evolutionarily conserved than globular domains. Their higher abundance in eukaryotes and in species with more cellular types agrees with a growing number of reports on their function in protein interactions regulated by post-translational modifications. LCRs facilitate the increase of regulatory and network complexity required with the emergence of organisms with more complex tissue distribution and development. Although the low conservation and structural flexibility of LCRs complicate their study, evolutionary studies of proteins across species have been used to evaluate their significance and function. To investigate how to apply this evolutionary approach to the study of LCR function in protein-protein interactions, we performed a detailed analysis for Huntingtin (HTT), a large protein that is a hub for interaction with hundreds of proteins, has a variety of LCRs, and for which partial structural information (in complex with HAP40) is available. We hypothesize that proteins RASA1, SYN2, and KAT2B may compete with HAP40 for their attachment to the core of HTT using similar LCRs. Our results illustrate how evolution might favor the interplay of LCRs with domains, and the possibility of detecting multiple modes of LCR-mediated protein-protein interactions with a large hub such as HTT when enough protein interaction data is available.
Affiliation(s)
- Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany; (K.K.); (P.M.)
|
15
|
Mier P, Andrade-Navarro MA. Assessing the low complexity of protein sequences via the low complexity triangle. PLoS One 2020; 15:e0239154. [PMID: 33378336 PMCID: PMC7773278 DOI: 10.1371/journal.pone.0239154] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/24/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022] Open
Abstract
Background: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called ‘low complexity triangle’ as a proof-of-concept to represent the low complexity measured values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated to complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions: The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
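One of the two measures named in this abstract, the fraction of the most frequent amino acid, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LCT implementation: the function name `top_residue_fraction`, the sliding-window scheme and the window size are hypothetical, and the repeatability measure (RES algorithm) is not reproduced here.

```python
from collections import Counter


def top_residue_fraction(seq, window=20):
    """Fraction of the most frequent residue in each sliding window.

    `window` is a hypothetical default chosen for illustration only.
    Returns one value per window start position (empty list if the
    sequence is shorter than the window).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    fractions = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        # most_common(1) returns [(residue, count)] for the top residue
        fractions.append(counts.most_common(1)[0][1] / window)
    return fractions


if __name__ == "__main__":
    # Toy sequence: a pure Q window scores 1.0, a mixed window scores lower.
    print(top_residue_fraction("QQQQAAAA", window=4))
```

Values close to 1.0 would correspond to homorepeat-like windows, while lower values indicate more varied composition.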
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
| | - Miguel A. Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
|
16
|
Morató A, Elena-Real CA, Popovic M, Fournet A, Zhang K, Allemand F, Sibille N, Urbanek A, Bernadó P. Robust Cell-Free Expression of Sub-Pathological and Pathological Huntingtin Exon-1 for NMR Studies. General Approaches for the Isotopic Labeling of Low-Complexity Proteins. Biomolecules 2020; 10:E1458. [PMID: 33086646 PMCID: PMC7603387 DOI: 10.3390/biom10101458] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/16/2020] [Revised: 10/07/2020] [Accepted: 10/16/2020] [Indexed: 12/23/2022] Open
Abstract
The high-resolution structural study of huntingtin exon-1 (HttEx1) has long been hampered by its intrinsic properties. In addition to being prone to aggregate, HttEx1 contains low-complexity regions (LCRs) and is intrinsically disordered, ruling out several standard structural biology approaches. Here, we use a cell-free (CF) protein expression system to robustly and rapidly synthesize (sub-) pathological HttEx1. The open nature of the CF reaction allows the application of different isotopic labeling schemes, making HttEx1 amenable for nuclear magnetic resonance studies. While uniform and selective labeling facilitate the sequential assignment of HttEx1, combining CF expression with nonsense suppression allows the site-specific incorporation of a single labeled residue, making possible the detailed investigation of the LCRs. To optimize CF suppression yields, we analyze the expression and suppression kinetics, revealing that high concentrations of loaded suppressor tRNA have a negative impact on the final reaction yield. The optimized CF protein expression and suppression system is very versatile and well suited to produce challenging proteins with LCRs in order to enable the characterization of their structure and dynamics.
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS and Université de Montpellier. 29 rue de Navacelles, 34090 Montpellier, France; (A.M.); (C.A.E.-R.); (M.P.); (A.F.); (K.Z.); (F.A.); (N.S.)
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS and Université de Montpellier. 29 rue de Navacelles, 34090 Montpellier, France; (A.M.); (C.A.E.-R.); (M.P.); (A.F.); (K.Z.); (F.A.); (N.S.)
|
17
|
Chavali S, Singh AK, Santhanam B, Babu MM. Amino acid homorepeats in proteins. Nat Rev Chem 2020; 4:420-434. [PMID: 37127972 DOI: 10.1038/s41570-020-0204-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Accepted: 06/04/2020] [Indexed: 12/16/2022]
Abstract
Amino acid homorepeats, or homorepeats, are polypeptide segments found in proteins that contain stretches of identical amino acid residues. Although abnormal homorepeat expansions are linked to pathologies such as neurodegenerative diseases, homorepeats are prevalent in eukaryotic proteomes, suggesting that they are important for normal physiology. In this Review, we discuss recent advances in our understanding of the biological functions of homorepeats, which range from facilitating subcellular protein localization to mediating interactions between proteins across diverse cellular pathways. We explore how the functional diversity of homorepeat-containing proteins could be linked to the ability of homorepeats to adopt different structural conformations, an ability influenced by repeat composition, repeat length and the nature of flanking sequences. We conclude by highlighting how an understanding of homorepeats will help us better characterize and develop therapeutics against the human diseases to which they contribute.
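As a rough illustration of the definition given in this abstract (stretches of identical amino acid residues), the following Python sketch lists homorepeats in a protein sequence. The function name `find_homorepeats`, the minimum length of 5 and the example sequence are hypothetical choices for illustration, not values taken from the review.

```python
import re


def find_homorepeats(seq, min_len=5):
    """Return (residue, start, end) for each run of identical residues.

    Spans are 0-based and half-open. `min_len` is a hypothetical cutoff;
    the literature uses various minimum lengths.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # (.) captures one residue; \1{n,} requires n or more further copies of it
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end())
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence with a run of six Q followed by a run of five P.
    print(find_homorepeats("MAQQQQQQPPPPPLK", min_len=5))
```

A perfect-run definition like this ignores interrupted repeats; a tolerant definition would need a window-based criterion instead.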
Affiliation(s)
- Sreenivas Chavali
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India.
| | - Anjali K Singh
- Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati, India
| | - Balaji Santhanam
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, UK.
- Department of Structural Biology and Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA.
|
18
|
Mier P, Andrade-Navarro MA. The features of polyglutamine regions depend on their evolutionary stability. BMC Evol Biol 2020; 20:59. [PMID: 32448113 PMCID: PMC7247214 DOI: 10.1186/s12862-020-01626-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/13/2020] [Accepted: 05/13/2020] [Indexed: 11/29/2022] Open
Abstract
Background: Polyglutamine regions (polyQ) are one of the most studied and prevalent homorepeats in eukaryotes. They have a particular length-dependent codon usage, which relates to a characteristic CAG-slippage mechanism. Pathologically expanded tracts of polyQ are known to form aggregates and are involved in the development of several human neurodegenerative diseases. The non-pathogenic function of polyQ is to mediate protein-protein interactions via a coiled-coil pairing with an interactor. They are usually located in a helical context. Results: Here we study the stability of polyQ regions in evolution, using a set of 60 proteomes from four distinct taxonomic groups (Insecta, Teleostei, Sauria and Mammalia). The polyQ regions can be distinctly grouped in three categories based on their evolutionary stability: stable, unstable by length variation (inserted), and unstable by mutations (mutated). PolyQ regions in these categories can be significantly distinguished by their glutamine codon usage, and we show that the CAG-slippage mechanism is predominant in inserted polyQ of Sauria and Mammalia. The polyQ amino acid context is also influenced by the polyQ stability, with a higher proportion of proline residues around inserted polyQ. By studying the secondary structure of the sequences surrounding polyQ regions, we found that regarding the structural conformation around a polyQ, its stability category is more relevant than its taxonomic information. The protein-protein interaction capacity of a polyQ is also affected by its stability, as stable polyQ have more interactors than unstable polyQ. Conclusions: Our results show that apart from the sequence of a polyQ, information about its orthologous sequences is needed to assess its function. Codon usage, amino acid context, structural conformation and the protein-protein interaction capacity of polyQ from all studied taxa critically depend on the region stability. There are however some taxa-specific polyQ features that override this importance. We conclude that a taxa-driven evolutionary analysis is of the highest importance for the comprehensive study of any feature of polyglutamine regions.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
|
19
|
Urbanek A, Popovic M, Morató A, Estaña A, Elena-Real CA, Mier P, Fournet A, Allemand F, Delbecq S, Andrade-Navarro MA, Cortés J, Sibille N, Bernadó P. Flanking Regions Determine the Structure of the Poly-Glutamine in Huntingtin through Mechanisms Common among Glutamine-Rich Human Proteins. Structure 2020; 28:733-746.e5. [PMID: 32402249 DOI: 10.1016/j.str.2020.04.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/07/2020] [Revised: 02/18/2020] [Accepted: 04/11/2020] [Indexed: 10/24/2022]
Abstract
The causative agent of Huntington's disease, the poly-Q homo-repeat in the N-terminal region of huntingtin (httex1), is flanked by a 17-residue-long fragment (N17) and a proline-rich region (PRR), which promote and inhibit the aggregation propensity of the protein, respectively, by poorly understood mechanisms. Based on experimental data obtained from site-specifically labeled NMR samples, we derived an ensemble model of httex1 that identified both flanking regions as opposing poly-Q secondary structure promoters. While N17 triggers helicity through a promiscuous hydrogen bond network involving the side chains of the first glutamines in the poly-Q tract, the PRR promotes extended conformations in neighboring glutamines. Furthermore, a bioinformatics analysis of the human proteome showed that these structural traits are present in many human glutamine-rich proteins and that they are more prevalent in proteins with longer poly-Q tracts. Taken together, these observations provide the structural bases to understand previous biophysical and functional data on httex1.
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Alejandro Estaña
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France; LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire (LBCM-EA4558 Vaccination Antiparasitaire), UFR Pharmacie, Université de Montpellier, 34090 Montpellier, France
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France.
|
20
|
Urbanek A, Popovic M, Elena-Real CA, Morató A, Estaña A, Fournet A, Allemand F, Gil AM, Cativiela C, Cortés J, Jiménez AI, Sibille N, Bernadó P. Evidence of the Reduced Abundance of Proline cis Conformation in Protein Poly Proline Tracts. J Am Chem Soc 2020; 142:7976-7986. [PMID: 32266815 DOI: 10.1021/jacs.0c02263] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Abstract
Proline is found in a cis conformation in proteins more often than other proteinogenic amino acids, where it influences structure and modulates function, being the focus of several high-resolution structural studies. However, until now, technical and methodological limitations have hampered the site-specific investigation of the conformational preferences of prolines present in poly proline (poly-P) homorepeats in their protein context. Here, we apply site-specific isotopic labeling to obtain high-resolution NMR data on the cis/trans equilibrium of prolines within the poly-P repeats of huntingtin exon 1, the causative agent of Huntington's disease. Screening prolines in different positions in long (poly-P11) and short (poly-P3) poly-P tracts, we found that, while the first proline of poly-P tracts adopts similar levels of cis conformation as isolated prolines, a length-dependent reduced abundance of cis conformers is observed for terminal prolines. Interestingly, the cis isomer could not be detected in inner prolines, in line with percentages derived from a large database of proline-centered tripeptides extracted from crystallographic structures. These results suggest a strong cooperative effect within poly-Ps that enhances their stiffness by diminishing the stability of the cis conformation. This rigidity is key to rationalizing the protection toward aggregation that the poly-P tract confers to huntingtin. Furthermore, the study provides new avenues to probe the structural properties of poly-P tracts in protein design as scaffolds or nanoscale rulers.
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Alejandro Estaña
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France; LAAS-CNRS, Université de Toulouse, CNRS, 7 Avenue du Colonel Roche, 31400 Toulouse, France
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Ana M Gil
- Departamento de Química Orgánica, Instituto de Síntesis Química y Catálisis Homogénea (ISQCH), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
| | - Carlos Cativiela
- Departamento de Química Orgánica, Instituto de Síntesis Química y Catálisis Homogénea (ISQCH), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 7 Avenue du Colonel Roche, 31400 Toulouse, France
| | - Ana I Jiménez
- Departamento de Química Orgánica, Instituto de Síntesis Química y Catálisis Homogénea (ISQCH), CSIC-Universidad de Zaragoza, 50009 Zaragoza, Spain
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier. 29, rue de Navacelles, 34090 Montpellier, France
|
21
|
Mier P, Elena-Real C, Urbanek A, Bernadó P, Andrade-Navarro MA. The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context. Comput Struct Biotechnol J 2020; 18:306-313. [PMID: 32071707 PMCID: PMC7016039 DOI: 10.1016/j.csbj.2020.01.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/23/2019] [Revised: 12/13/2019] [Accepted: 01/30/2020] [Indexed: 12/18/2022] Open
Abstract
Polyglutamine (polyQ) regions are one of the most prevalent homorepeats in eukaryotes. It is however difficult to evaluate their prevalence because various studies claim different results. The reason is the lack of a consensus to define what is indeed a polyQ region. We have tackled this issue by studying how the use of different thresholds (i.e., minimum number of glutamines required in a protein region of a given size), to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. Threshold definition shapes the length distribution of the polyQ dataset, and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ. While leucine is enriched at the N-terminus of polyQ and specially at position -1 (amino acid preceding the polyQ), proline is prevalent in the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species, and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summarized overview about which biases are to be expected per threshold when studying polyQ regions.
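The threshold idea described in this abstract (a minimum number of glutamines required in a protein region of a given size) can be sketched as follows. This is a minimal Python illustration, not the authors' method: the function name `find_polyq_regions`, the example threshold of 8 glutamines in a 10-residue window, and the choices to merge overlapping windows and trim each region to its outermost glutamines are all assumptions made here.

```python
def find_polyq_regions(seq, min_q=8, window=10):
    """Return 0-based half-open (start, end) spans of candidate polyQ regions.

    A window qualifies if it holds at least `min_q` glutamines (Q).
    Overlapping qualifying windows are merged, then each merged region is
    trimmed so that it starts and ends on a Q. Both defaults are
    hypothetical values chosen for illustration.
    """
    if min_q < 1 or window < min_q:
        raise ValueError("need 1 <= min_q <= window")
    seq = seq.upper()
    merged = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count("Q") >= min_q:
            end = start + window
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend last region
            else:
                merged.append([start, end])
    regions = []
    for start, end in merged:
        segment = seq[start:end]
        # Trim flanking non-Q residues; impurities inside the region remain.
        regions.append((start + segment.find("Q"),
                        start + segment.rfind("Q") + 1))
    return regions


if __name__ == "__main__":
    # Pure tract, then a tract with one leucine impurity.
    print(find_polyq_regions("AAA" + "Q" * 8 + "PPP"))
    print(find_polyq_regions("QQQQLQQQQAAAA"))
```

Changing `min_q` and `window` alters which regions are reported and how many impurities they may contain, which is the threshold dependence the abstract describes.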
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Carlos Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France
| | - Miguel A. Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
|
22
|
Urbanek A, Elena-Real CA, Popovic M, Morató A, Fournet A, Allemand F, Delbecq S, Sibille N, Bernadó P. Site-Specific Isotopic Labeling (SSIL): Access to High-Resolution Structural and Dynamic Information in Low-Complexity Proteins. Chembiochem 2019; 21:769-775. [PMID: 31697025 DOI: 10.1002/cbic.201900583] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/18/2019] [Revised: 11/05/2019] [Indexed: 12/17/2022]
Abstract
Remarkable technical progress in the area of structural biology has paved the way to study previously inaccessible targets. For example, large protein complexes can now be easily investigated by cryo-electron microscopy, and modern high-field NMR magnets have challenged the limits of high-resolution characterization of proteins in solution. However, the structural and dynamic characteristics of certain proteins with important functions still cannot be probed by conventional methods. These proteins in question contain low-complexity regions (LCRs), compositionally biased sequences where only a limited number of amino acids is repeated multiple times, which hamper their characterization. This Concept article describes a site-specific isotopic labeling (SSIL) strategy, which combines nonsense suppression and cell-free protein synthesis to overcome these limitations. An overview on how poly-glutamine tracts were made amenable to high-resolution structural studies is used to illustrate the usefulness of SSIL. Furthermore, we discuss the potential of this methodology to give further insights into the roles of LCRs in human pathologies and liquid-liquid phase separation, as well as the challenges that must be addressed in the future for the popularization of SSIL.
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Carlos A Elena-Real
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire, (LBCM-EA4558 Vaccination Antiparasitaire), UFR Pharmacie, Université de Montpellier, 15, Av. Charles Flahault, BP 14491, 34000, Montpellier, France
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090, Montpellier, France
|
23
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 153] [Impact Index Per Article: 30.6] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Patryk Jarnot
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | - Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
- Institut de Biologie Computationnelle, 34095 Montpellier, France
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
| | - Maria Anisimova
- Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| | - Dirk Linke
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
| |
|
24
|
Rajathei DM, Parthasarathy S, Selvaraj S. Identification and Analysis of Long Repeats of Proteins at the Domain Level. Front Bioeng Biotechnol 2019; 7:250. [PMID: 31649924 PMCID: PMC6795024 DOI: 10.3389/fbioe.2019.00250] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/18/2019] [Accepted: 09/16/2019] [Indexed: 12/27/2022] Open
Abstract
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences enables one to understand their abundance, structure and function in the protein universe. In the present study, amino acid repeats of length >50 (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The underlying structures and functions of these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and an enzyme and non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats seem to occur predominantly in certain architectures such as sandwich, bundle, barrel, and roll, and within these architectures they are abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats lie in the domain regions of the family and are involved in the function of the proteins. After grouping based on enzyme and non-enzyme classes, we observed the abundant occurrence of long repeats in specific catalytic and binding functions of the proteins. In this study, we have analyzed the occurrence of long repeats in the protein sequence universe, apart from the well-characterized short tandem repeats, together with their structures and functions at the domain level. The present study suggests that long repeats may play an important role in the structure and function of domains of the proteins.
Affiliation(s)
- David Mary Rajathei
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
| | - Subbiah Parthasarathy
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
| | - Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
| |
|
25
|
Repeatability in protein sequences. J Struct Biol 2019; 208:86-91. [PMID: 31408700 DOI: 10.1016/j.jsb.2019.08.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/03/2019] [Revised: 08/06/2019] [Accepted: 08/08/2019] [Indexed: 02/07/2023]
Abstract
Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, and particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures that are very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and many define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is to a perfect repeat. Using this algorithm we identified that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd length, while the human proteins are not, that the fish Danio rerio has many proteins with repeats of length two, and that the plant Arabidopsis thaliana has an unusually large number of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) makes it possible to find regions with approximate short repeats in protein sequences, and helps to characterize the variable use of LCRs and compositional bias in different organisms.
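Note: the abstract above describes local repeatability as how close a fragment is to a perfect repeat. The sketch below is one simple way to put a number on that idea, and it is not the RES algorithm itself: it assumes a hypothetical score defined as the fraction of residues that match the residue one period further along, and the function names and window handling are illustrative choices.

```python
def repeatability(fragment: str, period: int) -> float:
    """Fraction of positions whose residue equals the residue `period` steps later.

    A perfect repeat of the given period scores 1.0. This scoring rule is an
    assumption made for illustration, not the published RES definition.
    """
    if period <= 0 or period >= len(fragment):
        raise ValueError("period must be between 1 and len(fragment) - 1")
    pairs = len(fragment) - period
    matches = sum(fragment[i] == fragment[i + period] for i in range(pairs))
    return matches / pairs


def scan(sequence: str, window: int, period: int):
    """Yield (start, score) for every window of length `window` along the sequence."""
    for start in range(len(sequence) - window + 1):
        yield start, repeatability(sequence[start:start + window], period)


if __name__ == "__main__":
    # Slide a 6-residue window over a toy sequence and score period-3 repeats.
    for start, score in scan("AATAATAAPGHK", window=6, period=3):
        print(start, round(score, 2))
```

Scanning several periods over the same sequence and keeping the best-scoring one per window would give a rough per-position profile of approximate short repeats.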
|
26
|
Her C, Yeh Y, Krishnan VV. The Ensemble of Conformations of Antifreeze Glycoproteins (AFGP8): A Study Using Nuclear Magnetic Resonance Spectroscopy. Biomolecules 2019; 9:biom9060235. [PMID: 31213033 PMCID: PMC6628104 DOI: 10.3390/biom9060235] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/26/2019] [Revised: 05/28/2019] [Accepted: 05/30/2019] [Indexed: 12/13/2022] Open
Abstract
The primary sequence of antifreeze glycoproteins (AFGPs) is highly degenerate, consisting of multiple repeats of the same tripeptide, Ala–Ala–Thr*, in which Thr* is a glycosylated threonine with the disaccharide beta-d-galactosyl-(1,3)-alpha-N-acetyl-d-galactosamine. AFGPs seem to function as intrinsically disordered proteins, presenting challenges in determining their native structure. In this work, a different approach was used to elucidate the three-dimensional structure of AFGP8 from the Arctic cod Boreogadus saida and the Antarctic notothenioid Trematomus borchgrevinki. Dimethyl sulfoxide (DMSO), a non-native solvent, was used to make AFGP8 less dynamic in solution. Interestingly, DMSO induced a non-native structure, which could be determined via nuclear magnetic resonance (NMR) spectroscopy. The overall three-dimensional structures of the two AFGP8s from two different natural sources were different from a random coil ensemble, but their “compactness” was very similar, as deduced from NMR measurements. In addition to their similar compactness, the conserved motifs, Ala–Thr*–Pro–Ala and Ala–Thr*–Ala–Ala, present in both AFGP8s, seemed to have very similar three-dimensional structures, leading to a refined definition of local structural motifs. These local structural motifs allowed AFGPs to be considered functioning as effectors, making a transition from disordered to ordered upon binding to the ice surface. In addition, AFGPs could act as dynamic linkers, whereby a short segment folds into a structural motif, while the rest of the AFGPs could still be disordered, thus simultaneously interacting with bulk water molecules and the ice surface, preventing ice crystal growth.
Affiliation(s)
- Cheenou Her
- Department of Chemistry, California State University, Fresno, CA 93740, USA.
| | - Yin Yeh
- Department of Applied Science, University of California, Davis, CA 95616, USA.
| | - Viswanathan V Krishnan
- Department of Chemistry, California State University, Fresno, CA 93740, USA.
- Department of Medical Pathology and Laboratory Medicine, Davis School of Medicine, University of California, Davis, CA 95616, USA.
| |
|
27
|
RepEx: A web server to extract sequence repeats from protein and DNA sequences. Comput Biol Chem 2018; 78:424-430. [PMID: 30598392 DOI: 10.1016/j.compbiolchem.2018.12.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/09/2018] [Accepted: 12/25/2018] [Indexed: 11/20/2022]
Abstract
Evolution builds up new genetic material from existing material, not at random, but in highly ordered and eloquent patterns. Most of these sequence repeats are revelatory of valuable information contributing to areas of disease research and function of macromolecules, to name a few. In the age of next generation genome sequencing, rapid and efficient extraction of all unbiased sequence repeats from macromolecules is the need of the hour. In view of this reckoning, an online web-based computing server, RepEx, has been developed to extract and display all possible repeats for DNA and protein sequences. Apart from exact or identical repeats, the server has been designed adeptly to identify and extract degenerate, inverted, everted and mirror repeats from both DNA and protein sequences. The server has striking output displays, featuring interactive graphs and comprehensive output files. In addition, RepEx has been accoutered with an easy-to-use interface and search filters to facilitate a user-defined query or search and is freely available and accessible via the World Wide Web at http://bioserver2.physics.iisc.ac.in/RepEx/.
|
28
|
Urbanek A, Morató A, Allemand F, Delaforge E, Fournet A, Popovic M, Delbecq S, Sibille N, Bernadó P. A General Strategy to Access Structural Information at Atomic Resolution in Polyglutamine Homorepeats. Angew Chem Int Ed Engl 2018; 57:3598-3601. [PMID: 29359503 PMCID: PMC5901001 DOI: 10.1002/anie.201711530] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/09/2017] [Revised: 12/28/2017] [Indexed: 12/31/2022]
Abstract
Homorepeat (HR) proteins are involved in key biological processes and multiple pathologies; however, their high-resolution characterization has been impaired due to their homotypic nature. To overcome this problem, we have developed a strategy to isotopically label individual glutamines within HRs by combining nonsense suppression and cell-free expression. Our method has enabled the NMR investigation of huntingtin exon1 with a 16-residue polyglutamine (poly-Q) tract, and the results indicate the presence of an N-terminal α-helix at near neutral pH that vanishes towards the end of the HR. The generality of the strategy was demonstrated by introducing a labeled glutamine into a pathological version of huntingtin with 46 glutamines. This methodology paves the way to decipher the structural and dynamic perturbations induced by HR extensions in poly-Q-related diseases. Our approach can be extended to other amino acids to investigate biological processes involving proteins containing low-complexity regions (LCRs).
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Elise Delaforge
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire (LBCM-EA4558 Vaccination Antiparasitaire), UFR Pharmacie, Université de Montpellier, Montpellier, France
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29 rue de Navacelles, 34090 Montpellier, France
| |
|
29
|
Urbanek A, Morató A, Allemand F, Delaforge E, Fournet A, Popovic M, Delbecq S, Sibille N, Bernadó P. A General Strategy to Access Structural Information at Atomic Resolution in Polyglutamine Homorepeats. Angew Chem Int Ed Engl 2018. [DOI: 10.1002/ange.201711530] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Affiliation(s)
- Annika Urbanek
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Anna Morató
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Frédéric Allemand
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Elise Delaforge
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Aurélie Fournet
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Matija Popovic
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Stephane Delbecq
- Laboratoire de Biologie Cellulaire et Moléculaire, (LBCM-EA4558 Vaccination Antiparasitaire); UFR Pharmacie; Université de Montpellier; Montpellier France
| | - Nathalie Sibille
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), INSERM, CNRS; Université de Montpellier; 29 rue de Navacelles 34090 Montpellier France
| |
|
30
|
Mier P, Andrade-Navarro MA. Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length. Genome Biol Evol 2018; 10:816-825. [PMID: 29608721 PMCID: PMC5841385 DOI: 10.1093/gbe/evy046] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Accepted: 02/19/2018] [Indexed: 12/16/2022] Open
Abstract
Amino acid usage in a proteome depends mostly on its taxonomy, as does codon usage in transcriptomes. Here, we explore the level of variation in the codon usage of a specific amino acid, glutamine, in relation to the number of consecutive glutamine residues. We show that CAG triplets are consistently more abundant in short glutamine homorepeats (polyQ, four to eight residues) than in shorter glutamine stretches (one to three residues), leading to the evolutionary growth of the repeat region in a CAG-dependent manner. The length of orthologous polyQ regions is mostly stable in primates, particularly the short ones. Interestingly, given a short polyQ, the CAG usage is higher in orthologous polyQ regions that are unstable in length. This indicates that CAG triplets produce the necessary instability for a glutamine stretch to grow. Proteins related to polyQ-associated diseases behave in a more extreme way, with longer glutamine stretches in human and evolutionarily closer nonhuman primates, and an overall higher CAG usage. In the light of our results, we suggest an evolutionary model to explain the glutamine codon usage in polyQ regions.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
| |
|
31
|
Understanding the antimicrobial properties/activity of an 11-residue Lys homopeptide by alanine and proline scan. Amino Acids 2018; 50:557-568. [DOI: 10.1007/s00726-018-2542-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/20/2017] [Accepted: 02/11/2018] [Indexed: 12/20/2022]
|
32
|
Polyglutamine expansion diseases: More than simple repeats. J Struct Biol 2017; 201:139-154. [PMID: 28928079 DOI: 10.1016/j.jsb.2017.09.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/20/2017] [Revised: 08/24/2017] [Accepted: 09/15/2017] [Indexed: 12/27/2022]
Abstract
Polyglutamine (polyQ) repeat-containing proteins are widespread in the human proteome but only nine of them are associated with highly incapacitating neurodegenerative disorders. The genetic expansion of the polyQ tract in disease-related proteins triggers a series of events resulting in neurodegeneration. The polyQ tract plays the leading role in the aggregation mechanism, but other elements modulate the aggregation propensity in the context of the full-length proteins, as implied by variations in the length of the polyQ tract required to trigger the onset of a given polyQ disease. Intrinsic features such as the presence of aggregation-prone regions (APRs) outside the polyQ segments and polyQ-flanking sequences, which synergistically participate in the aggregation process, are emerging for several disease-related proteins. The inherent polymorphic structure of polyQ stretches places the polyQ proteins in a central position in protein-protein interaction networks, where interacting partners may additionally shield APRs or reshape the aggregation course. Expansion of the polyQ tract perturbs the cellular homeostasis and contributes to neuronal failure by modulating protein-protein interactions and enhancing toxic oligomerization. Post-translational modifications further regulate self-assembly either by directly altering the intrinsic aggregation propensity of polyQ proteins, by modulating their interaction with different macromolecules or by modifying their withdrawal by the cell quality control machinery. Here we review the recent data on the multifaceted aggregation pathways of disease-related polyQ proteins, focusing on ataxin-3, the protein mutated in Machado-Joseph disease. Further mechanistic understanding of this network of events is crucial for the development of effective therapies for polyQ diseases.
|
33
|
Mier P, Andrade-Navarro MA. dAPE: a web server to detect homorepeats and follow their evolution. Bioinformatics 2017; 33:1221-1223. [PMID: 28031183 PMCID: PMC5408840 DOI: 10.1093/bioinformatics/btw790] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/30/2016] [Accepted: 12/09/2016] [Indexed: 01/10/2023] Open
Abstract
Summary: Homorepeats are low complexity regions consisting of repetitions of a single amino acid residue. There is no current consensus on the minimum number of residues needed to define a functional homorepeat, nor even on whether mismatches are allowed. Here we present dAPE, a web server that helps to follow the evolution of homorepeats based on orthology information, using a sensitive but tunable cutoff to help in the identification of emerging homorepeats. Availability and Implementation: dAPE can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/polyx. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Mainz, Germany
- To whom correspondence should be addressed.
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Mainz, Germany
| |
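Note: the dAPE abstract above points out that there is no agreed minimum length for a homorepeat and no agreement on whether mismatches are allowed. The sketch below shows how a tunable cutoff of that kind can be expressed. It is an illustration under stated assumptions, not dAPE's implementation: the function name, the default window of 6 residues, and the default of 4 matching residues are hypothetical values chosen for the example.

```python
def find_homorepeats(sequence: str, residue: str, window: int = 6, min_count: int = 4):
    """Return merged half-open (start, end) regions enriched in `residue`.

    A window is a hit when it holds at least `min_count` copies of `residue`,
    so up to `window - min_count` mismatches are tolerated. Hits that overlap
    or touch are merged. Both cutoffs are illustrative assumptions.
    """
    regions = []
    for start in range(len(sequence) - window + 1):
        if sequence[start:start + window].count(residue) >= min_count:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Extend the previous region instead of opening a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Strict setting: perfect runs of four glutamines only.
    print(find_homorepeats("QQQQAAAAAAAAQQQQ", "Q", window=4, min_count=4))
    # Looser setting: a six-residue window with up to two mismatches.
    print(find_homorepeats("MAQQQQQQGHT", "Q"))
```

Because a hit is reported as the whole window, regions found with a loose cutoff can include flanking residues that are not the repeated amino acid; setting `min_count` equal to `window` restricts the output to perfect runs.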
|
34
|
Voet ARD, Simoncini D, Tame JRH, Zhang KYJ. Evolution-Inspired Computational Design of Symmetric Proteins. Methods Mol Biol 2017; 1529:309-322. [PMID: 27914059 DOI: 10.1007/978-1-4939-6637-0_16] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
Monomeric proteins with a number of identical repeats creating symmetrical structures are potentially very valuable building blocks with a variety of bionanotechnological applications. As such proteins do not occur naturally, the emerging field of computational protein design serves as an excellent tool to create them from nonsymmetrical templates. Existing pseudo-symmetrical proteins are believed to have evolved from oligomeric precursors by duplication and fusion of identical repeats. Here we describe a computational workflow to reverse-engineer this evolutionary process in order to create stable proteins consisting of identical sequence repeats.
Affiliation(s)
- Arnout R D Voet
- Laboratory for Biomolecular Modelling and Design, KU Leuven, Celestijnenlaan 200G, Leuven, 3000, Belgium.
| | - David Simoncini
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
- MIAT, UR-875, INRA, F-31320, Castanet Tolosan, France
| | - Jeremy R H Tame
- Drug Design Laboratory, Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
| | - Kam Y J Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, 1-7-22 Suehiro, Yokohama, Kanagawa, 230-0045, Japan
| |
|
35
|
Carvajal-Rondanelli P, Aróstica M, Marshall SH, Albericio F, Álvarez CA, Ojeda C, Aguilar LF, Guzmán F. Inhibitory effect of short cationic homopeptides against Gram-negative bacteria. Amino Acids 2016; 48:1445-56. [DOI: 10.1007/s00726-016-2198-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/21/2015] [Accepted: 02/08/2016] [Indexed: 12/19/2022]
|
36
|
Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022] Open
Abstract
Tandem repetition in protein sequence and structure is a fascinating subject of research which has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to the inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR.
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
| |
|
37
|
Norouzy A, Assaf KI, Zhang S, Jacob MH, Nau WM. Coulomb Repulsion in Short Polypeptides. J Phys Chem B 2014; 119:33-43. [DOI: 10.1021/jp508263a] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/18/2023]
Affiliation(s)
- Amir Norouzy
- Department of Life Sciences and Chemistry, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany
| | - Khaleel I. Assaf
- Department of Life Sciences and Chemistry, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany
| | - Shuai Zhang
- Department of Life Sciences and Chemistry, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany
| | - Maik H. Jacob
- Department of Life Sciences and Chemistry, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany
| | - Werner M. Nau
- Department of Life Sciences and Chemistry, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany
| |
|
38
|
Huang RK, Baxa U, Aldrian G, Ahmed AB, Wall JS, Mizuno N, Antzutkin O, Steven AC, Kajava AV. Conformational switching in PolyGln amyloid fibrils resulting from a single amino acid insertion. Biophys J 2014; 106:2134-42. [PMID: 24853742 PMCID: PMC4052364 DOI: 10.1016/j.bpj.2014.03.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/23/2014] [Revised: 03/18/2014] [Accepted: 03/25/2014] [Indexed: 11/16/2022] Open
Abstract
The established correlation between neurodegenerative disorders and intracerebral deposition of polyglutamine aggregates motivates attempts to better understand their fibrillar structure. We designed polyglutamines with a few lysines inserted to overcome the hindrance of extreme insolubility and two D-lysines to limit the lengths of β-strands. One is 33 amino acids long (PolyQKd-33) and the other has one fewer glutamine (PolyQKd-32). Both form well-dispersed fibrils suitable for analysis by electron microscopy. Electron diffraction confirmed cross-β structures in both fibrils. Remarkably, the deletion of just one glutamine residue from the middle of the peptide leads to substantially different amyloid structures. PolyQKd-32 fibrils are consistently 10-20% wider than PolyQKd-33, as measured by negative staining, cryo-electron microscopy, and scanning transmission electron microscopy. Scanning transmission electron microscopy analysis revealed that the PolyQKd-32 fibrils have 50% higher mass-per-length than PolyQKd-33. This distinction can be explained by a superpleated β-structure model for PolyQKd-33 and a model with two β-solenoid protofibrils for PolyQKd-32. These data provide evidence for β-arch-containing structures in polyglutamine fibrils and open future possibilities for structure-based drug design.
Affiliation(s)
- Rick K Huang
- Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal, and Skin Diseases, National Institutes of Health, Bethesda, Maryland
| | - Ulrich Baxa
- Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal, and Skin Diseases, National Institutes of Health, Bethesda, Maryland; Electron Microscopy Laboratory, Cancer Research Technology Program, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, Maryland
| | - Gudrun Aldrian
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, University of Montpellier 1 and 2, Montpellier, France
| | - Abdullah B Ahmed
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, University of Montpellier 1 and 2, Montpellier, France
| | - Joseph S Wall
- Department of Biology, Brookhaven National Laboratory, Upton, New York
| | - Naoko Mizuno
- Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal, and Skin Diseases, National Institutes of Health, Bethesda, Maryland; Department of Structural Cell Biology, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, Martinsried, Germany
| | - Oleg Antzutkin
- Chemistry of Interfaces, Luleå University of Technology, Luleå, Sweden; Department of Physics, Warwick University, Coventry, United Kingdom
| | - Alasdair C Steven
- Laboratory of Structural Biology, National Institute of Arthritis, Musculoskeletal, and Skin Diseases, National Institutes of Health, Bethesda, Maryland.
| | - Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, University of Montpellier 1 and 2, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France; University ITMO, 197101 St. Petersburg, Russia.
| |
|
39
|
Di Domenico T, Potenza E, Walsh I, Parra RG, Giollo M, Minervini G, Piovesan D, Ihsan A, Ferrari C, Kajava AV, Tosatto SCE. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res 2013; 42:D352-7. [PMID: 24311564 PMCID: PMC3964956 DOI: 10.1093/nar/gkt1175] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/24/2023] Open
Abstract
RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.
Affiliation(s)
- Tomás Di Domenico
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy, Department of Biological Chemistry, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina, Department of Information Engineering, University of Padua, 35121 Padova, Italy, Department of Biosciences, COMSATS Institute of Information Technology, Sahiwal, Pakistan, Centre de Recherches de Biochimie Macromoléculaire, CNRS, 34293 Montpellier Cedex 5, France and Institut de Biologie Computationnelle, 34293 Montpellier Cedex 5, France
| |
|
40
|
Guzmán F, Marshall S, Ojeda C, Albericio F, Carvajal-Rondanelli P. Inhibitory effect of short cationic homopeptides against gram-positive bacteria. J Pept Sci 2013; 19:792-800. [PMID: 24243601 DOI: 10.1002/psc.2578] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 06/18/2013] [Revised: 09/25/2013] [Accepted: 10/01/2013] [Indexed: 01/26/2023]
Abstract
In the selection or design of antimicrobial peptides, the key role played by cationic amino acids and chain length on the inhibitory potency and specificity is not clear. A fundamental study was conducted using chemically synthesized homopeptides of L-Lys and L-Arg ranging from 7 to 14 residues. Their effect on growth inhibition was evaluated over a wide range of Gram-positive bacteria at different levels of concentration. Interestingly, at lower concentrations (10 μM), Lys homopeptides with an odd number of residues, especially with 11 residues, showed a broader inhibitory activity than those with an even number of residues. At higher peptide concentrations (>20 μM), the inhibitory activity of Lys homopeptides was directly related to the number of residues in the chain. In contrast, Arg homopeptides, at lower concentrations, did not exhibit a defined pattern of bacterial inhibition related to the number of residues; however, at higher concentrations (>20 μM), the inhibitory effects were more pronounced. Lys homopeptides at concentrations up to 300 μM showed a remarkably lower toxicity against CHSE-214 cells. Arg homopeptides exhibited negligible cytotoxicity up to a chain length of 11 residues at concentrations lower than 100 μM, but an abrupt increase in toxicity resulted when the peptide chain length reached 12 amino acid residues and higher concentrations. All synthesized homopeptides displayed a characteristic polyproline II helix conformation in both buffer and liposomes, as shown by CD spectroscopy. This result suggests that short Lys homopeptides with an odd number of residues (9 and 11) have a broad spectrum of activity against Gram-positive bacterial cells compared with Arg homopeptides, which in turn showed a considerably higher selectivity toward those cells. By investigating the differences between Lys and Arg homopeptides, this study contributes to the understanding of their mechanism of growth inhibition and selectivity. Thus, it provides further guidelines for a rational design of short antimicrobial peptides.
Affiliation(s)
- Fanny Guzmán
- Núcleo de Biotecnología de Curauma, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2950, Valparaíso, Chile and Fraunhofer Chile Research Foundation, Santiago, Chile
|
41
|
Lobanov MY, Sokolovskiy IV, Galzitskaya OV. HRaP: database of occurrence of HomoRepeats and patterns in proteomes. Nucleic Acids Res 2013; 42:D273-8. [PMID: 24150944 PMCID: PMC3965023 DOI: 10.1093/nar/gkt927] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
We focus our attention on multiple repeats of one amino acid (homorepeats) and create a new database (named HRaP, at http://bioinfo.protres.ru/hrap/) of occurrence of homorepeats and disordered patterns in different proteomes. HRaP is aimed at understanding the amino acid tandem repeat function in different proteomes. Therefore, the database includes 122 proteomes, 97 eukaryotic and 25 bacterial ones that can be divided into 9 kingdoms and 5 phyla of bacteria. The database includes 1,449,561 protein sequences and 771,786 sequences of proteins with GO annotations. We have determined homorepeats and patterns that are associated with some function. Through our web server, the user can do the following: (i) search for proteins with the given homorepeat in 122 proteomes, including GO annotation for these proteins; (ii) search for proteins with the given disordered pattern from the library of disordered patterns constructed on the clustered Protein Data Bank in 122 proteomes, including GO annotations for these proteins; (iii) analyze lengths of homorepeats in different proteomes; (iv) investigate disordered regions in the chosen proteins in 122 proteomes; (v) study the coupling of different homorepeats in one protein; (vi) determine longest runs for each amino acid inside each proteome; and (vii) download the full list of proteins with the given length of a homorepeat.
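As a rough illustration of feature (vi) in the abstract above (longest runs for each amino acid inside a proteome), the following minimal Python sketch scans sequences for uninterrupted runs of a single residue. It is not the HRaP implementation; the function names and the toy sequences are hypothetical and chosen only for this example.

```python
from itertools import groupby


def longest_homorepeats(sequence):
    """Return {residue: length of its longest uninterrupted run} for one sequence."""
    longest = {}
    # groupby yields one group per maximal run of identical consecutive characters.
    for residue, run in groupby(sequence.upper()):
        length = sum(1 for _ in run)
        if length > longest.get(residue, 0):
            longest[residue] = length
    return longest


def longest_in_proteome(sequences):
    """Merge per-sequence results, keeping the maximum run length per residue."""
    best = {}
    for seq in sequences:
        for residue, length in longest_homorepeats(seq).items():
            best[residue] = max(best.get(residue, 0), length)
    return best


# Made-up sequences for illustration only; these are not real proteins.
toy_proteome = ["MKQQQQAAL", "GGSSSSSSQQ"]
runs = longest_in_proteome(toy_proteome)
```

Here `runs` maps each residue to its longest run across the toy set. A real analysis would read sequences from a FASTA file and would probably apply a minimum run length before calling something a homorepeat.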
Affiliation(s)
- Mikhail Yu Lobanov
- Group of Bioinformatics, Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia
|
42
|
Mary Rajathei D, Selvaraj S. Analysis of sequence repeats of proteins in the PDB. Comput Biol Chem 2013; 47:156-66. [PMID: 24121644 DOI: 10.1016/j.compbiolchem.2013.09.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2013] [Revised: 08/27/2013] [Accepted: 09/05/2013] [Indexed: 10/26/2022]
Abstract
Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40%, and the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain.
Affiliation(s)
- David Mary Rajathei
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620024, Tamilnadu, India
|
43
|
Derevyanko AG, Endutkin AV, Ishchenko AA, Saparbaev MK, Zharkov DO. Initiation of 8-oxoguanine base excision repair within trinucleotide tandem repeats. BIOCHEMISTRY (MOSCOW) 2013; 77:270-9. [PMID: 22803944 DOI: 10.1134/s0006297912030054] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Trinucleotide repeat expansion provides a molecular basis for several devastating neurodegenerative diseases. In particular, expansion of a CAG run in the human HTT gene causes Huntington's disease. One of the main reasons for triplet repeat expansion in somatic cells is base excision repair (BER), involving damaged base excision and repair DNA synthesis that may be accompanied by expansion of the repaired strand due to formation of noncanonical DNA structures. We have analyzed the kinetics of excision of a ubiquitously found oxidized purine base, 8-oxoguanine (oxoG), by DNA glycosylase OGG1 from substrates containing a CAG run flanked by AT-rich sequences. The values of the k(2) rate constant for the removal of oxoG from triplets in the middle of the run were higher than for oxoG at the flanks of the run. The value of the k(3) rate constant dropped starting from the third CAG triplet in the run and remained stable until the 3'-terminal triplet, where it decreased even more. In nuclear extracts, the profile of the oxoG removal rate along the run resembled the profile of the k(2) constant, suggesting that the reaction rate in the extracts is limited by base excision. The fully reconstituted BER was efficient with all substrates unless oxoG was near the 3'-flank of the run, interfering with the initiation of the repair. DNA polymerase β was able to perform a strand-displacement DNA synthesis, which may be important for CAG run expansion initiated by BER.
Affiliation(s)
- A G Derevyanko
- Institute of Chemical Biology and Fundamental Medicine, Siberian Division of the Russian Academy of Sciences, Novosibirsk, 630090, Russia
|
44
|
Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE. RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics 2012; 28:3257-64. [PMID: 22962341 DOI: 10.1093/bioinformatics/bts550] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. RESULTS: Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It reliably solves three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. The resulting method is very accurate, with 89.5% of solenoid proteins and 97.2% of non-solenoid proteins correctly classified. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q(2) value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.
Affiliation(s)
- Ian Walsh
- Department of Biology, University of Padua, Viale G. Colombo 3, 35131 Padova, Italy
|
45
|
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS; University of Montpellier 1 and 2; Montpellier, France
- UCLA-DOE Institute for Genomics and Proteomics; Los Angeles, CA, USA
- Thierry Baudrand
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS; University of Montpellier 1 and 2; Montpellier, France
- Andrey V. Kajava
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS; University of Montpellier 1 and 2; Montpellier, France
|
46
|
Lobanov MY, Bogatyreva NS, Galzitskaya OV. Occurrence of six-amino-acid motifs in three eukaryotic proteomes. Mol Biol 2012. [DOI: 10.1134/s0026893312010128] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
47
|
Lobanov MY, Galzitskaya OV. Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes. Mol Biosyst 2012; 8:327-37. [DOI: 10.1039/c1mb05318c] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
|
48
|
Faux N. Single amino acid and trinucleotide repeats: function and evolution. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2012; 769:26-40. [PMID: 23560303 DOI: 10.1007/978-1-4614-5434-2_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The best-known effect of single amino acid repeat expansion, beyond a certain threshold, is the development of a specific disease, depending on the protein in which the expansion has occurred. For example, the expansion of the glutamine repeat in huntingtin leads to the debilitating neurodegenerative disease, Huntington's disease. Similarly, there is a range of other disorders caused by trinucleotide repeat expansions encoding polyglutamine or polyalanine tracts. The age of onset of the polyglutamine-induced neurodegenerative diseases is usually negatively correlated with the length of the expanded CAG/glutamine repeat. However, recent studies have given evidence that single amino acid repeats may also play critical roles in normal protein function and that changes in the length of single amino acid repeats are likely to play a beneficial role in evolution. This chapter will look at the prevalence, function and possible role single amino acid repeats have in evolution and other biological processes.
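To make the link between a CAG trinucleotide run and a glutamine tract concrete, here is a small Python sketch that measures the longest uninterrupted CAG run in a DNA string. It is only an illustration, not a method from the chapter: the function name and the toy sequence are hypothetical, and the search ignores reading frame, so a run counts as a polyglutamine tract only if it happens to lie in frame (CAG encodes glutamine).

```python
import re


def longest_cag_tract(dna):
    """Length, in repeat units, of the longest uninterrupted CAG run in `dna`."""
    # (?:CAG)+ matches one or more back-to-back CAG triplets.
    tracts = re.findall(r"(?:CAG)+", dna.upper())
    # Each match is a multiple of 3 bases long; default=0 covers sequences with no CAG.
    return max((len(t) // 3 for t in tracts), default=0)


# Made-up sequence for illustration: a 5-unit run, then an interrupted 2-unit run.
toy = "ATGGCG" + "CAG" * 5 + "CCGCAGCAGTAA"
units = longest_cag_tract(toy)
```

In this toy sequence the 5-unit run starts at a codon boundary, so it would translate to five consecutive glutamines. Comparing such a count with a disease threshold would require gene-specific reference values, which are not given here.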
Affiliation(s)
- Noel Faux
- Mental Health Research Institute, The University of Melbourne, Parkville, Victoria, Australia.
|
49
|
Kajava AV. Tandem repeats in proteins: from sequence to structure. J Struct Biol 2011; 179:279-88. [PMID: 21884799 DOI: 10.1016/j.jsb.2011.08.009] [Citation(s) in RCA: 152] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2011] [Revised: 08/15/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]
Abstract
The bioinformatics analysis of proteins containing tandem repeats requires special computer programs and databases, since the conventional approaches predominantly developed for globular domains have limited success. Here, I survey bioinformatics tools which have been developed recently for identification and proteome-wide analysis of protein repeats. The last few years have also been marked by an emergence of new 3D structures of these proteins. Appraisal of the known structures and their classification uncovers a straightforward relationship between their architecture and the length of the repetitive units. This relationship and the repetitive character of structural folds suggest rules for better prediction of the 3D structures of such proteins. Furthermore, bioinformatics approaches combined with low resolution structural data, from biophysical techniques, especially, the recently emerged cryo-electron microscopy, lead to reliable prediction of the protein repeat structures and their mode of binding with partners within molecular complexes. This hybrid approach can actively be used for structural and functional annotations of proteomes.
Affiliation(s)
- Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier 1 et 2, 1919 Route de Mende, 34293 Montpellier, Cedex 5, France.
|