51
Non-random distribution of homo-repeats: links with biological functions and human diseases. Sci Rep 2016; 6:26941. [PMID: 27256590 PMCID: PMC4891720 DOI: 10.1038/srep26941] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/22/2016] [Accepted: 05/06/2016] [Indexed: 12/22/2022]
Abstract
The biological function of multiple repetitions of single amino acids, or homo-repeats, is largely unknown, but their occurrence in proteins has been associated with more than 20 hereditary diseases. Analysing 122 bacterial and eukaryotic genomes, we observed that the number of proteins containing homo-repeats is significantly larger than expected from theoretical estimates. Analysis of statistical significance indicates that the minimal size of homo-repeats varies with amino acid type and proteome. In an attempt to characterize proteins harbouring long homo-repeats, we found that those containing polar or small amino acids S, P, H, E, D, K, Q and N are enriched in structural disorder as well as protein- and RNA-interactions. We observed that E, S, Q, G, L, P, D, A and H homo-repeats are strongly linked with occurrence in human diseases. Moreover, S, E, P, A, Q, D and T homo-repeats are significantly enriched in neuronal proteins associated with autism and other disorders. We release a webserver for further exploration of homo-repeat occurrence in human pathology at http://bioinfo.protres.ru/hradis/.
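As an illustration of what scanning a proteome for homo-repeats involves, the following is a minimal sketch and not the method used in the paper above. The function name `find_homo_repeats`, the `HomoRepeat` record and the default threshold of 5 residues are assumptions made for this example; the abstract itself notes that the meaningful minimal length depends on amino acid type and proteome.

```python
import re
from typing import Iterator, NamedTuple


class HomoRepeat(NamedTuple):
    residue: str  # one-letter amino acid code of the repeated residue
    start: int    # 0-based index of the first residue in the run
    length: int   # number of consecutive identical residues


def find_homo_repeats(sequence: str, min_length: int = 5) -> Iterator[HomoRepeat]:
    """Yield every run of one repeated residue that is at least min_length long.

    min_length is a hypothetical cut-off chosen for illustration only.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # ([A-Z]) captures one residue; \1{k,} requires k or more further copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    for match in pattern.finditer(sequence.upper()):
        yield HomoRepeat(match.group(1), match.start(), len(match.group(0)))


if __name__ == "__main__":
    # Toy sequence with a run of six Q and a run of five H.
    for repeat in find_homo_repeats("MAQQQQQQKPHHHHHA"):
        print(repeat)
```

On the toy sequence this reports a Q run of length 6 starting at index 2 and an H run of length 5 starting at index 10. A per-residue threshold table could replace the single `min_length` if residue-specific cut-offs are available.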
52
Zheng X, Li Y, Zhao J, Wang D, Xia H, Mao Q. Production and Characterization of Monoclonal Antibodies against Human Nuclear Protein FAM76B. PLoS One 2016; 11:e0152237. [PMID: 27018871 PMCID: PMC4809503 DOI: 10.1371/journal.pone.0152237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/03/2015] [Accepted: 03/10/2016] [Indexed: 11/18/2022]
Abstract
Human FAM76B (hFAM76B) is a 39 kDa protein that contains homopolymeric histidine tracts, a targeting signal for nuclear speckles. FAM76B is highly conserved among different species, suggesting that it may play an important physiological role in normal cellular functions. However, a lack of appropriate tools has hampered study of this potentially important protein. To facilitate research into the biological function(s) of FAM76B, murine monoclonal antibodies (MAbs) against hFAM76B were generated by using purified, prokaryotically expressed hFAM76B protein. Six strains of MAbs specific for hFAM76B were obtained and characterized. The specificity of the MAbs was validated by using a FAM76B-/- HEK 293 cell line. Double immunofluorescence followed by laser confocal microscopy confirmed the nuclear speckle localization of hFAM76B, and the specific domains recognized by different MAbs were further elucidated by Western blot. Due to the high conservation of protein sequences between mouse and human FAM76B, MAbs against hFAM76B were shown to react specifically with mouse FAM76B (mFAM76B). Lastly, FAM76B was found to be expressed in the normal tissues of most human organs, though to different extents. The MAbs produced in this study should provide a useful tool for investigating the biological function(s) of FAM76B.
Affiliation(s)
- Xiaojing Zheng
  - Co-Innovation Center for Qinba Regions’ Sustainable Development, College of Life Sciences, Shaanxi Normal University, Xi’an, 710062, Shaanxi, P. R. China
- Yanqing Li
  - Co-Innovation Center for Qinba Regions’ Sustainable Development, College of Life Sciences, Shaanxi Normal University, Xi’an, 710062, Shaanxi, P. R. China
- Junli Zhao
  - Co-Innovation Center for Qinba Regions’ Sustainable Development, College of Life Sciences, Shaanxi Normal University, Xi’an, 710062, Shaanxi, P. R. China
- Dongyang Wang
  - Co-Innovation Center for Qinba Regions’ Sustainable Development, College of Life Sciences, Shaanxi Normal University, Xi’an, 710062, Shaanxi, P. R. China
- Haibin Xia
  - Co-Innovation Center for Qinba Regions’ Sustainable Development, College of Life Sciences, Shaanxi Normal University, Xi’an, 710062, Shaanxi, P. R. China
  - Corresponding author
- Qinwen Mao
  - Department of Pathology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, 60611, United States of America
53
Wu R, Liu Q, Zhang P, Liang D. Tandem amino acid repeats in the green anole (Anolis carolinensis) and other squamates may have a role in increasing genetic variability. BMC Genomics 2016; 17:109. [PMID: 26868501 PMCID: PMC4751654 DOI: 10.1186/s12864-016-2430-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/06/2015] [Accepted: 02/02/2016] [Indexed: 01/04/2023]
Abstract
Background: Tandem amino acid repeats are characterised by the consecutive recurrence of a single amino acid. They exhibit high rates of length mutations in addition to point mutations and have been proposed to be involved in genetic plasticity. Squamate reptiles (lizards and snakes) diversify in both morphology and physiology. The underlying mechanism is yet to be understood. In a previous phylogenomic analysis of reptiles, the density of tandem repeats in an anole lizard diverged heavily from that of the other reptiles. To gain further insight into the tandem amino acid repeats in squamates, we analysed the repeat content in the green anole (Anolis carolinensis) proteome and compared the amino acid repeats in a large orthologous protein data set from six vertebrates (the Western clawed frog, the green anole, the Chinese softshell turtle, the zebra finch, mouse and human).
Results: Our results revealed that the number of amino acid repeats in the green anole exceeded those found in the other five species studied. Species-only repeats were found in high proportion in the green anole but not in the other five species, suggesting that the green anole had gained many amino acid repeats in either the Anolis or the squamate lineage. Since the amino acid repeat containing genes in the green anole were highly enriched in genes related to transcription and development, an important family of developmental genes, i.e., the Hox family, was further studied in a wide collection of squamates. Abundant amino acid repeats were also observed, implying the general high tolerance of amino acid repeats in squamates. A particular enrichment of amino acid repeats was observed in the central class Hox genes that are known to be responsible for defining cervical to lumbar regions.
Conclusions: Our study suggests that the abundant amino acid repeats in the green anole, and possibly in other squamates, may play a role in increasing the genetic variability, and contribute to the evolutionary diversity of this clade.
Affiliation(s)
- Riga Wu
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Qingfeng Liu
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Peng Zhang
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Dan Liang
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
54
Wu LZ, Xu XY, Liu YF, Ge X, Wang XJ. Expansion of polyalanine tracts in the QA domain may play a critical role in the clavicular development of cleidocranial dysplasia. J Genet 2015; 94:551-3. [PMID: 26440098 DOI: 10.1007/s12041-015-0551-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
Affiliation(s)
- Li-Zheng Wu
  - State Key Laboratory of Military Stomatology, Department of Pediatric Dentistry, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, People's Republic of China.
55
Martins F, Gonçalves R, Oliveira J, Cruz-Monteagudo M, Nieto-Villar JM, Paz-y-Miño C, Rebelo I, Tejera E. Unravelling the relationship between protein sequence and low-complexity regions entropies: Interactome implications. J Theor Biol 2015; 382:320-7. [PMID: 26164061 DOI: 10.1016/j.jtbi.2015.06.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2015] [Revised: 06/12/2015] [Accepted: 06/28/2015] [Indexed: 10/23/2022]
Abstract
Low-complexity regions are sub-sequences of biased composition in a protein sequence. The influence of these regions on protein evolution, specific functions and highly interactive capacities is well known. Although protein sequence entropy has been widely studied, its relationship with low-complexity regions and the subsequent effects on protein function remain unclear. In this work we propose a theoretical and empirical model integrating the sequence entropy with local complexity parameters. Our results indicate that the protein sequence entropy is related to the protein length, the entropies inside and outside the low-complexity regions, and the number and average size of those regions. We found a small but significant increase in the sequence entropy of hub proteins. In agreement with our theoretical model, this increase is highly dependent on the balance between the increase in protein length and the average size of the low-complexity regions. Finally, our models and protein analysis provide evidence that changes in the average size of low-complexity regions are more relevant in hub proteins than changes in their number.
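To make the quantities named in this abstract concrete, here is a minimal sketch of the standard Shannon entropy of a sequence, evaluated inside and outside one region. It is not the authors' model, which the abstract does not specify; the function names and the single-region interface are assumptions for this example, and the region boundaries would normally come from a separate low-complexity detector.

```python
import math
from collections import Counter
from typing import Tuple


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy in bits per residue; 0.0 for an empty sequence."""
    n = len(sequence)
    if n == 0:
        return 0.0
    counts = Counter(sequence)
    # sum of p * log2(1/p), with p = count / n for each residue type
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_inside_outside(sequence: str, start: int, end: int) -> Tuple[float, float]:
    """Entropy of sequence[start:end] and of the remainder of the sequence.

    start and end are hypothetical region boundaries (0-based, end exclusive).
    """
    inside = sequence[start:end]
    outside = sequence[:start] + sequence[end:]
    return shannon_entropy(inside), shannon_entropy(outside)


if __name__ == "__main__":
    seq = "ACDEQQQQQQ"  # toy sequence: four distinct residues, then a Q run
    print(shannon_entropy(seq))
    print(entropy_inside_outside(seq, 4, 10))
```

A homopolymeric run has entropy 0, while a stretch of four distinct residues used once each has entropy 2 bits, which is why long low-complexity regions pull the whole-sequence entropy down.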
Affiliation(s)
- F Martins
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- R Gonçalves
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- J Oliveira
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal
- M Cruz-Monteagudo
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
- J M Nieto-Villar
  - Dpto. de Química-Física, Fac. de Química, Universidad de La Habana, Cuba; Cátedra de Sistemas Complejos "H. Poincaré", Universidad de La Habana, Cuba
- C Paz-y-Miño
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
- I Rebelo
  - Department of Biochemistry, Faculty of Pharmacy, University of Porto, Portugal; UCIBIO@REQUIMTE, Portugal.
- E Tejera
  - Instituto de Investigaciones Biomédicas, Universidad de las Américas, Quito, Ecuador
56
Radó-Trilla N, Arató K, Pegueroles C, Raya A, de la Luna S, Albà MM. Key Role of Amino Acid Repeat Expansions in the Functional Diversification of Duplicated Transcription Factors. Mol Biol Evol 2015; 32:2263-72. [PMID: 25931513 PMCID: PMC4540963 DOI: 10.1093/molbev/msv103] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022]
Abstract
The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice as many as the number of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
Affiliation(s)
- Núria Radó-Trilla
  - Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Krisztina Arató
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain
- Cinta Pegueroles
  - Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain
- Alicia Raya
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain
- Susana de la Luna
  - Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), Barcelona, Spain; Centro de Investigación Biomèdica en Red en Enfermedades Raras (CIBERER), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- M Mar Albà
  - Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Barcelona, Spain; Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
57
Wear MP, Kryndushkin D, O’Meally R, Sonnenberg JL, Cole RN, Shewmaker FP. Proteins with Intrinsically Disordered Domains Are Preferentially Recruited to Polyglutamine Aggregates. PLoS One 2015; 10:e0136362. [PMID: 26317359 PMCID: PMC4552826 DOI: 10.1371/journal.pone.0136362] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 04/30/2015] [Accepted: 07/31/2015] [Indexed: 12/12/2022]
Abstract
Intracellular protein aggregation is the hallmark of several neurodegenerative diseases. Aggregates formed by polyglutamine (polyQ)-expanded proteins, such as Huntingtin, adopt amyloid-like structures that are resistant to denaturation. We used a novel purification strategy to isolate aggregates formed by human Huntingtin N-terminal fragments with expanded polyQ tracts from both yeast and mammalian (PC-12) cells. Using mass spectrometry we identified the protein species that are trapped within these polyQ aggregates. We found that proteins with very long intrinsically-disordered (ID) domains (≥100 amino acids) and RNA-binding proteins were disproportionately recruited into aggregates. The removal of the ID domains from selected proteins was sufficient to eliminate their recruitment into polyQ aggregates. We also observed that several neurodegenerative disease-linked proteins were reproducibly trapped within the polyQ aggregates purified from mammalian cells. Many of these proteins have large ID domains and are found in neuronal inclusions in their respective diseases. Our study indicates that neurodegenerative disease-associated proteins are particularly vulnerable to recruitment into polyQ aggregates via their ID domains. Also, the high frequency of ID domains in RNA-binding proteins may explain why RNA-binding proteins are frequently found in pathological inclusions in various neurodegenerative diseases.
Affiliation(s)
- Maggie P. Wear
  - Department of Pharmacology, Uniformed Services University of the Health Sciences, Bethesda, Maryland, 20814, United States of America
- Dmitry Kryndushkin
  - Department of Pharmacology, Uniformed Services University of the Health Sciences, Bethesda, Maryland, 20814, United States of America
- Robert O’Meally
  - Johns Hopkins Mass Spectrometry and Proteomic Facility, Johns Hopkins University, Baltimore, Maryland, 21218, United States of America
- Jason L. Sonnenberg
  - Chemistry department, School of Sciences, Stevenson University, Stevenson, Maryland, 21153, United States of America
- Robert N. Cole
  - Johns Hopkins Mass Spectrometry and Proteomic Facility, Johns Hopkins University, Baltimore, Maryland, 21218, United States of America
- Frank P. Shewmaker
  - Department of Pharmacology, Uniformed Services University of the Health Sciences, Bethesda, Maryland, 20814, United States of America
  - Corresponding author
58
Wei W, Davis RE, Suo X, Zhao Y. Occurrence, distribution and possible functional roles of simple sequence repeats in phytoplasma genomes. Int J Syst Evol Microbiol 2015; 65:2748-2760. [DOI: 10.1099/ijs.0.000273] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/24/2023]
Abstract
Phytoplasmas are unculturable, cell-wall-less bacteria that parasitize plants and insects. This transkingdom life cycle requires rapid responses to vastly different environments, including transitions from plant phloem sieve elements to various insect tissues and alternations among diverse plant hosts. Features that enable such flexibility in other microbes include simple sequence repeats (SSRs) — mutation-prone, phase-variable short DNA tracts that function as ‘evolutionary rheostats’ and enhance rapid adaptations. To gain insights into the occurrence, distribution and potentially functional roles of SSRs in phytoplasmas, we performed computational analysis on the genomes of five completely sequenced phytoplasma strains, ‘Candidatus Phytoplasma asteris’-related strains OYM and AYWB, ‘Candidatus Phytoplasma australiense’-related strains CBWB and SLY and ‘Candidatus Phytoplasma mali’-related strain AP-AT. The overall density of SSRs in phytoplasma genomes was higher than in representative strains of other prokaryotes. While mono- and trinucleotide SSRs were significantly overrepresented in the phytoplasma genomes, dinucleotide SSRs and other higher-order SSRs were underrepresented. The occurrence and distribution of long SSRs in the prophage islands and phytoplasma-unique genetic loci indicated that SSRs played a role in compounding the complexity of sequence mosaics in individual genomes and in increasing allelic diversity among genomes. Findings from computational analyses were further complemented by an examination of SSRs in varied additional phytoplasma strains, with a focus on potential contingency genes. Some SSRs were located in regions that could profoundly alter the regulation of transcription and translation of affected genes and/or the composition of protein products.
Affiliation(s)
- Wei Wei
  - Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD, 20705, USA
- Robert E. Davis
  - Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD, 20705, USA
- Xiaobing Suo
  - Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD, 20705, USA
- Yan Zhao
  - Molecular Plant Pathology Laboratory, USDA-Agricultural Research Service, Beltsville, MD, 20705, USA
59
Lu X, Murphy RM. Asparagine Repeat Peptides: Aggregation Kinetics and Comparison with Glutamine Repeats. Biochemistry 2015. [PMID: 26204228 DOI: 10.1021/acs.biochem.5b00644] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Abstract
Amino acid repeat runs are common occurrences in eukaryotic proteins, with glutamine (Q) and asparagine (N) as particularly frequent repeats. Abnormal expansion of Q-repeat domains causes at least nine neurodegenerative disorders, most likely because expansion leads to protein misfolding, aggregation, and toxicity. The linkage between Q-repeats and disease has motivated several investigations into the mechanism of aggregation and the role of Q-repeat length in aggregation. Curiously, glutamine repeats are common in vertebrates, whereas N-repeats are virtually absent in vertebrates, but common in invertebrates. One hypothesis for the lack of N-repeats in vertebrates is biophysical; that is, there is strong selective pressure in higher organisms against aggregation-prone proteins. If true, then asparagine and glutamine repeats must differ substantially in their aggregation properties despite their chemical similarities. In this work, aggregation of peptides with asparagine repeats of variable length (12-24) was characterized and compared to that of similar peptides with glutamine repeats. As with glutamine, aggregation of N-repeat peptides was strongly length-dependent. Replacement of glutamine with asparagine caused a subtle shift in the conformation of the monomer, which strongly affected the rate of aggregation. Specifically, N-repeat peptides adopted β-turn structural elements, leading to faster self-assembly into globular oligomers and much more rapid conversion into fibrillar aggregates, compared to Q-repeat peptides. These biophysical differences may account for the differing biological roles of N- versus Q-repeat domains.
Affiliation(s)
- Xiaomeng Lu
  - †Biophysics Program and ‡Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Regina M Murphy
  - †Biophysics Program and ‡Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
60
Banerji J. Asparaginase treatment side-effects may be due to genes with homopolymeric Asn codons (Review-Hypothesis). Int J Mol Med 2015; 36:607-26. [PMID: 26178806 PMCID: PMC4533780 DOI: 10.3892/ijmm.2015.2285] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/15/2015] [Accepted: 07/15/2015] [Indexed: 12/14/2022]
Abstract
The present treatment of childhood T-cell leukemias involves the systemic administration of prokaryotic L-asparaginase (ASNase), which depletes plasma asparagine (Asn) and inhibits protein synthesis. The mechanism of therapeutic action of ASNase is poorly understood, as are the etiologies of the side-effects incurred by treatment. Protein expression from genes bearing Asn homopolymeric coding regions (N-hCR) may be particularly susceptible to Asn level fluctuation. In mammals, N-hCR are rare, short and conserved. In humans, misfunctions of genes encoding N-hCR are associated with a cluster of disorders that mimic ASNase therapy side-effects, which include impaired glycemic control, dyslipidemia, pancreatitis, compromised vascular integrity, and neurological dysfunction. This paper proposes that dysregulation of Asn homeostasis, potentially even by ASNase produced by the microbiome, may contribute to several clinically important syndromes by altering expression of N-hCR bearing genes. By altering amino acid abundance and modulating ribosome translocation rates at codon repeats, the microbiomic environment may contribute to genome decoding and to shaping the proteome. We suggest that impaired translation at poly Asn codons elevates diabetes risk and severity.
Affiliation(s)
- Julian Banerji
  - Center for Computational and Integrative Biology, MGH, Simches Research Center, Boston, MA 02114, USA
61
Arthur LL, Pavlovic-Djuranovic S, Koutmou KS, Green R, Szczesny P, Djuranovic S. Translational control by lysine-encoding A-rich sequences. Sci Adv 2015; 1:e1500154. [PMID: 26322332 PMCID: PMC4552401 DOI: 10.1126/sciadv.1500154] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Indexed: 05/09/2023]
Abstract
Regulation of gene expression involves a wide array of cellular mechanisms that control the abundance of the RNA or protein products of that gene. Here we describe a gene-regulatory mechanism that is based on poly(A) tracks that stall the translation apparatus. We show that creating longer or shorter runs of adenosine nucleotides, without changes in the amino acid sequence, alters the protein output and the stability of mRNA. Sometimes these changes result in the production of an alternative "frame-shifted" protein product. These observations are corroborated using reporter constructs and in the context of recombinant gene sequences. Approximately two percent of genes in the human genome may be subject to this uncharacterized, yet fundamental form of gene regulation. The potential pool of regulated genes encodes many proteins involved in nucleic acid binding. We hypothesize that the genes we identify are part of a large network whose expression is fine-tuned by poly(A)-tracks, and we provide a mechanism through which synonymous mutations may influence gene expression in pathological states.
Affiliation(s)
- Laura L. Arthur
  - Department of Cell Biology and Physiology, Washington University School of Medicine, 600 South Euclid Avenue, Campus Box 8228, St. Louis, MO 63110, USA
- Slavica Pavlovic-Djuranovic
  - Department of Cell Biology and Physiology, Washington University School of Medicine, 600 South Euclid Avenue, Campus Box 8228, St. Louis, MO 63110, USA
- Kristin S. Koutmou
  - Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
- Rachel Green
  - Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
  - Howard Hughes Medical Institute
- Pawel Szczesny
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawińskiego 5a, 02-106 Warsaw, Poland
  - Corresponding author
- Sergej Djuranovic
  - Department of Cell Biology and Physiology, Washington University School of Medicine, 600 South Euclid Avenue, Campus Box 8228, St. Louis, MO 63110, USA
  - Corresponding author
62
Bina M, Wyss P. Impact of the MLL1 morphemes on codon utilization and preservation in CpG Islands. Biopolymers 2015; 103:480-90. [PMID: 25991579 DOI: 10.1002/bip.22681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/26/2014] [Revised: 05/04/2015] [Accepted: 05/13/2015] [Indexed: 11/07/2022]
Affiliation(s)
- Minou Bina
  - Department of Chemistry, Purdue University, West Lafayette, IN, 47907
- Phillip Wyss
  - Department of Chemistry, Purdue University, West Lafayette, IN, 47907
63
Pandya S, Struck TJ, Mannakee BK, Paniscus M, Gutenkunst RN. Testing whether metazoan tyrosine loss was driven by selection against promiscuous phosphorylation. Mol Biol Evol 2015; 32:144-52. [PMID: 25312910 PMCID: PMC4271526 DOI: 10.1093/molbev/msu284] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022]
Abstract
Protein tyrosine phosphorylation is a key regulatory modification in metazoans, and the corresponding kinase enzymes have diversified dramatically. This diversification is correlated with a genome-wide reduction in protein tyrosine content, and it was recently suggested that this reduction was driven by selection to avoid promiscuous phosphorylation that might be deleterious. We tested three predictions of this intriguing hypothesis. 1) Selection should be stronger on residues that are more likely to be phosphorylated due to local solvent accessibility or structural disorder. 2) Selection should be stronger on proteins that are more likely to be promiscuously phosphorylated because they are abundant. We tested these predictions by comparing distributions of tyrosine within and among human and yeast orthologous proteins. 3) Selection should be stronger against mutations that create tyrosine versus remove tyrosine. We tested this prediction using human population genomic variation data. We found that all three predicted effects are modest for tyrosine when compared with the other amino acids, suggesting that selection against deleterious phosphorylation was not dominant in driving metazoan tyrosine loss.
Affiliation(s)
- Siddharth Pandya
  - Department of Molecular and Cellular Biology, University of Arizona
- Travis J Struck
  - Department of Molecular and Cellular Biology, University of Arizona
- Brian K Mannakee
  - Department of Molecular and Cellular Biology, University of Arizona; Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona
- Mary Paniscus
  - Department of Molecular and Cellular Biology, University of Arizona; Graduate Interdisciplinary Program in Genetics, University of Arizona
64
Kumari B, Kumar R, Kumar M. Low complexity and disordered regions of proteins have different structural and amino acid preferences. Mol Biosyst 2014; 11:585-94. [PMID: 25468592 DOI: 10.1039/c4mb00425f] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/31/2022]
Abstract
Low complexity regions (LCRs), or non-random regions of a few amino acids, are abundantly present in proteins. LCRs are traditionally considered as floppy structures with high solvent accessibility. Thus, little attention has been paid to them in structural studies. However, LCRs have been found to contain information relevant to protein structure and various important functions. The present study is an attempt to understand the structural trend of LCRs. Here we report a study conducted to understand the structural trend, solvent accessibility and amino acid preferences of LCRs. The results show that LCRs might attain any type of secondary structure; however, the helix is frequently seen, whereas sheets occur rarely. We also found that LCRs are not always exposed on the surface. We found insignificant contribution of trans-membrane helices to the overall helix content. The LCRs having a secondary structure have different enrichment and depletion of amino acids from LCRs without a secondary structure and disordered protein sequences. However, LCRs of NMR structures showed compositional and functional similarity to the disordered regions of proteins. We also noted that in ∼3/4 LCRs, the entire amino acid did not have a single structural class, but rather an ensemble of more than one secondary structure, which indicates that they are found at places where structure transition occurs. Overall analysis suggests that the overall protein sequence has a greater influence on the structural and sequence enrichment than the local amino acid composition of LCRs alone.
Affiliation(s)
- Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, New Delhi, India.
|
65
|
Mandal A, Mandal S, Park MH. Genome-wide analyses and functional classification of proline repeat-rich proteins: potential role of eIF5A in eukaryotic evolution. PLoS One 2014; 9:e111800. [PMID: 25364902 PMCID: PMC4218817 DOI: 10.1371/journal.pone.0111800] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 07/25/2014] [Accepted: 10/06/2014] [Indexed: 12/16/2022] Open
Abstract
The eukaryotic translation factor, eIF5A has been recently reported as a sequence-specific elongation factor that facilitates peptide bond formation at consecutive prolines in Saccharomyces cerevisiae, as its ortholog elongation factor P (EF-P) does in bacteria. We have searched the genome databases of 35 representative organisms from six kingdoms of life for PPP (Pro-Pro-Pro) and/or PPG (Pro-Pro-Gly)-encoding genes whose expression is expected to depend on eIF5A. We have made detailed analyses of proteome data of 5 selected species, Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, Mus musculus and Homo sapiens. The PPP and PPG motifs are low in the prokaryotic proteomes. However, their frequencies markedly increase with the biological complexity of eukaryotic organisms, and are higher in newly derived proteins than in those orthologous proteins commonly shared in all species. Ontology classifications of S. cerevisiae and human genes encoding the highest level of polyprolines reveal their strong association with several specific biological processes, including actin/cytoskeletal associated functions, RNA splicing/turnover, DNA binding/transcription and cell signaling. Previously reported phenotypic defects in actin polarity and mRNA decay of eIF5A mutant strains are consistent with the proposed role for eIF5A in the translation of the polyproline-containing proteins. Of all the amino acid tandem repeats (≥3 amino acids), only the proline repeat frequency correlates with functional complexity of the five organisms examined. Taken together, these findings suggest the importance of proline repeat-rich proteins and a potential role for eIF5A and its hypusine modification pathway in the course of eukaryotic evolution.
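The motif search described above can be sketched as a simple count of PPP and PPG occurrences per sequence, normalised by proteome size. This is a minimal illustration under stated assumptions: the function names are hypothetical, overlapping matches are counted (the abstract does not say how overlaps were handled), and the toy proteome is invented for the example.

```python
import re


def count_motifs(sequence, motifs=("PPP", "PPG")):
    """Count overlapping occurrences of each motif in a protein sequence."""
    counts = {}
    for motif in motifs:
        # A zero-width lookahead matches at every start position,
        # so overlapping hits (e.g. PPPP -> two PPP) are all counted.
        pattern = re.compile(f"(?={re.escape(motif)})")
        counts[motif] = len(pattern.findall(sequence))
    return counts


def motif_frequency(proteome, motifs=("PPP", "PPG")):
    """Return motif hits per 1000 residues for a dict of {name: sequence}."""
    total_residues = sum(len(seq) for seq in proteome.values())
    totals = {m: 0 for m in motifs}
    for seq in proteome.values():
        for motif, n in count_motifs(seq, motifs).items():
            totals[motif] += n
    if total_residues == 0:
        return {m: 0.0 for m in motifs}
    return {m: 1000 * n / total_residues for m, n in totals.items()}


# Toy input (hypothetical sequences, not real proteins).
toy_proteome = {"protA": "MPPPPGA", "protB": "MKPPGLL"}
frequencies = motif_frequency(toy_proteome)
```

Normalising by total residues makes proteomes of different sizes comparable, which is the kind of comparison the abstract draws between prokaryotes and eukaryotes.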
Affiliation(s)
- Ajeet Mandal
- Oral and Pharyngeal Cancer Branch, NIDCR, National Institutes of Health, Bethesda, Maryland, United States of America
- Swati Mandal
- Oral and Pharyngeal Cancer Branch, NIDCR, National Institutes of Health, Bethesda, Maryland, United States of America
- Myung Hee Park
- Oral and Pharyngeal Cancer Branch, NIDCR, National Institutes of Health, Bethesda, Maryland, United States of America
|
66
|
Imprasittichail W, Roytrakul S, Krungkrai SR, Krungkrail J. A unique insertion of low complexity amino acid sequence underlies protein-protein interaction in human malaria parasite orotate phosphoribosyltransferase and orotidine 5'-monophosphate decarboxylase. Asian Pac J Trop Med 2014; 7:184-92. [PMID: 24507637 DOI: 10.1016/s1995-7645(14)60018-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 08/10/2013] [Revised: 09/15/2013] [Accepted: 01/15/2014] [Indexed: 11/17/2022] Open
Abstract
OBJECTIVE To investigate the multienzyme complex formation of human malaria parasite Plasmodium falciparum (P. falciparum) orotate phosphoribosyltransferase (OPRT) and orotidine 5'-monophosphate decarboxylase (OMPDC), the fifth and sixth enzymes of the de novo pyrimidine biosynthetic pathway. Previously, we established that the two enzymes in the malaria parasite exist physically as a heterotetrameric (OPRT)2(OMPDC)2 complex containing two subunits each of OPRT and OMPDC, and that the complex has catalytic kinetic advantages over the monofunctional enzymes. METHODS Both enzymes were cloned and expressed as recombinant proteins. The protein-protein interaction in the enzyme complex was identified using a bifunctional chemical cross-linker, liquid chromatography-mass spectrometric analysis and homology modeling. RESULTS The unique insertions of low complexity regions at the α2 and α5 helices of the parasite OMPDC, characterized by a single amino acid repeat sequence not found in homologous proteins from other organisms, were located on the OPRT-OMPDC interface. Structural models for the protein-protein interaction of the heterotetrameric (OPRT)2(OMPDC)2 multienzyme complex were proposed. CONCLUSIONS Based on the proteomic data and structural modeling, it is surmised that the human malaria parasite low complexity region is responsible for the OPRT-OMPDC interaction. The structural complex of the parasite enzymes thus represents an efficient functional kinetic advantage, which is in line with co-localization principles of evolutionary origin and allosteric control in protein-protein interactions.
Affiliation(s)
- Waranya Imprasittichail
- Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sittiruk Roytrakul
- National Center for Genetic Engineering and Biotechnology, Pathumthani 12120, Thailand
- Sudaratana R Krungkrai
- Unit of Biochemistry, Department of Medical Science, Faculty of Science, Rangsit University, Pathumthani 12000, Thailand
- Jerapan Krungkrail
- Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand.
|
67
|
Lu X, Murphy RM. Synthesis and disaggregation of asparagine repeat-containing peptides. J Pept Sci 2014; 20:860-7. [PMID: 25044797 DOI: 10.1002/psc.2677] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/18/2014] [Revised: 06/12/2014] [Accepted: 06/26/2014] [Indexed: 01/21/2023]
Abstract
Of all amino acid repeats in eukaryotes, polyglutamine (polyQ) is the most frequent, followed by polyasparagine (polyN). Glutamine repeats are expanded in proteins associated with several neurodegenerative disorders. The expanded polyQ domain is known to induce aggregation, and it is hypothesized that aggregation is directly causative of pathology. Despite the widespread presence of asparagine repeats in invertebrate eukaryotes, polyN is curiously quite rare in vertebrates. Several investigators have characterized the conformational and aggregation properties of polyQ-containing peptides and proteins, and to a lesser extent, peptides containing mixed glutamine and asparagine, but to our knowledge, there is no detailed characterization of polyN-containing peptides. Such a comparison could elucidate reasons for the paucity of asparagine repeats in humans. In this study, we synthesized a peptide containing a 24-asparagine repeat (N24). For aggregation studies, it is critical to start with monomeric unaggregated peptide. A protocol involving dissolution in mixed trifluoroacetic acid and hexafluoroisopropanol (TFA + HFIP) solvents is widely used for disaggregation of polyQ peptides. We used the same protocol for N24 but discovered that there was both oxidative damage and insufficient disaggregation. Oxidation of tryptophan, used as a flanking residue, was common. Moreover, we found evidence of Förster resonance energy transfer between Trp and its oxidation product N-formylkynurenine, even in chemical denaturants. This suggested that N24 was insufficiently disaggregated, a conclusion that was further supported by gel electrophoresis analysis. Oxidation was reduced, but not eliminated, by addition of methionine to the buffer. Formic acid proved to be a better disaggregator and caused no oxidative damage. 
The glutamine repeat peptide Q24 also underwent some oxidation after extended incubation in TFA + HFIP, but there was no evidence of Förster resonance energy transfer, and samples appeared monomeric by gel electrophoresis. This result indicates that polyN-containing peptides self-associate more strongly than polyQ-containing peptides. Circular dichroism spectra reveal a greater propensity for β-turn formation in polyN than polyQ, providing an explanation for the increased stability of polyN aggregates relative to polyQ.
Affiliation(s)
- Xiaomeng Lu
- Biophysics Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
68
|
Perticaroli S, Nickels JD, Ehlers G, Mamontov E, Sokolov AP. Dynamics and rigidity in an intrinsically disordered protein, β-casein. J Phys Chem B 2014; 118:7317-26. [PMID: 24918971 DOI: 10.1021/jp503788r] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 12/23/2022]
Abstract
The emergence of intrinsically disordered proteins (IDPs) as a recognized structural class has forced the community to confront a new paradigm of structure, dynamics, and mechanical properties for proteins. We present novel data on the similarities and differences in the dynamics and nanomechanical properties of IDPs and other biomacromolecules on the picosecond time scale. An IDP, β-casein (CAS), has been studied in calcium-bound and unbound states using neutron and light scattering techniques. We show that CAS partially folds and stiffens upon calcium binding, but in the unfolded state, it is softer than folded proteins such as green fluorescent protein (GFP). We also see that some localized diffusive motions in CAS have a larger amplitude than in GFP at this time scale but are still smaller than those observed in tRNA. In spite of these differences, CAS dynamics are consistent with the classes of motions seen in folded proteins on this time scale.
Affiliation(s)
- Stefania Perticaroli
- Joint Institute for Neutron Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
|
69
|
Zorgani MA, Patron K, Desvaux M. New insight in the structural features of haloadaptation in α-amylases from halophilic Archaea following homology modeling strategy: folded and stable conformation maintained through low hydrophobicity and highly negative charged surface. J Comput Aided Mol Des 2014; 28:721-34. [DOI: 10.1007/s10822-014-9754-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 02/13/2014] [Accepted: 05/16/2014] [Indexed: 11/24/2022]
|
70
|
Wolfe KJ, Ren HY, Trepte P, Cyr DM. Polyglutamine-rich suppressors of huntingtin toxicity act upstream of Hsp70 and Sti1 in spatial quality control of amyloid-like proteins. PLoS One 2014; 9:e95914. [PMID: 24828240 PMCID: PMC4020751 DOI: 10.1371/journal.pone.0095914] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/20/2013] [Accepted: 04/01/2014] [Indexed: 11/30/2022] Open
Abstract
Protein conformational maladies such as Huntington Disease are characterized by accumulation of intracellular and extracellular protein inclusions containing amyloid-like proteins. There is an inverse correlation between proteotoxicity and aggregation, so facilitated protein aggregation appears cytoprotective. To define mechanisms for protective protein aggregation, a screen for suppressors of nuclear huntingtin (Htt103Q) toxicity was conducted. Nuclear Htt103Q is highly toxic and less aggregation prone than its cytosolic form, so we identified suppressors of cytotoxicity caused by Htt103Q tagged with a nuclear localization signal (NLS). High copy suppressors of Htt103Q-NLS toxicity include the polyQ-domain containing proteins Nab3, Pop2, and Cbk1, and each suppresses Htt toxicity via a different mechanism. Htt103Q-NLS appears to inactivate the essential functions of Nab3 in RNA processing in the nucleus. Function of Pop2 and Cbk1 is not impaired by nuclear Htt103Q, as their respective polyQ-rich domains are sufficient to suppress Htt103Q toxicity. Pop2 is a subunit of an RNA processing complex and is localized throughout the cytoplasm. Expression of just the Pop2 polyQ domain and an adjacent proline-rich stretch is sufficient to suppress Htt103Q toxicity. The proline-rich domain in Pop2 resembles an aggresome targeting signal, so Pop2 may act in trans to positively impact spatial quality control of Htt103Q. Cbk1 accumulates in discrete perinuclear foci and overexpression of the Cbk1 polyQ domain concentrates diffuse Htt103Q into these foci, which correlates with suppression of Htt toxicity. Protective action of Pop2 and Cbk1 in spatial quality control is dependent upon the Hsp70 co-chaperone Sti1, which packages amyloid-like proteins into benign foci. Protein:protein interactions between Htt103Q and its intracellular neighbors lead to toxic and protective outcomes. 
A subset of polyQ-rich proteins buffer amyloid toxicity by funneling toxic aggregation intermediates to the Hsp70/Sti1 system for spatial organization into benign species.
Affiliation(s)
- Katie J. Wolfe
- Department of Cell Biology and Physiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Hong Yu Ren
- Department of Cell Biology and Physiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Philipp Trepte
- Neuroproteomics, Max Delbrueck Center for Molecular Medicine, Berlin, Germany
- Douglas M. Cyr
- Department of Cell Biology and Physiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
|
71
|
Ahmed Z, Gurusaran M, Narayana P, Kumar KSD, Mohanapriya J, Vaishnavi MK, Sekar K. PPS: A computing engine to find Palindromes in all Protein sequences. Bioinformation 2014; 10:48-51. [PMID: 24516327 PMCID: PMC3916820 DOI: 10.6026/97320630010048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2014] [Revised: 01/23/2014] [Accepted: 01/24/2014] [Indexed: 11/23/2022] Open
Abstract
The primary structure of a protein molecule comprises a linear chain of amino acid residues. Certain parts of this linear chain are unique in nature and function. They can be classified under different categories and their roles studied in detail. Two such categories are palindromic sequences and Single Amino Acid Repeats (SAARs), which play a major role in the structure, function and evolution of the protein molecule. In spite of their presence in various protein sequences, palindromes have not yet been investigated in detail. Thus, to enable a comprehensive understanding of these sequences, a computing engine, PPS, has been developed. Users can search for occurrences of palindromes and SAARs in all the protein sequences available in various databases and can view the three-dimensional structure (when one is available among the structures deposited in the Protein Data Bank) using the graphics plug-in Jmol. The proposed server is the first of its kind and can be freely accessed through the World Wide Web. AVAILABILITY http://pranag.physics.iisc.ernet.in/pps/
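The two sequence features named above can be located with a few lines of code. The following is a minimal sketch and not the PPS implementation: the function names and the minimum-length defaults are assumptions, and the example sequences are invented.

```python
import re


def find_saars(sequence, min_len=3):
    """Return (residue, start, length) for runs of one residue >= min_len."""
    runs = []
    # (.)\1* matches a residue followed by any number of copies of itself.
    for match in re.finditer(r"(.)\1*", sequence):
        if len(match.group()) >= min_len:
            runs.append((match.group(1), match.start(), len(match.group())))
    return runs


def find_palindromes(sequence, min_len=5):
    """Return (start, substring) for maximal palindromes >= min_len."""
    hits = []
    n = len(sequence)
    # Expand around each of the 2n-1 centres (odd and even lengths).
    for centre in range(2 * n - 1):
        left = centre // 2
        right = left + centre % 2
        while left >= 0 and right < n and sequence[left] == sequence[right]:
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_len:
            hits.append((start, sequence[start:end]))
    return hits


saars = find_saars("MAAAAKQQQL")
palindromes = find_palindromes("MKLEVELKT")
```

Note that a long single-residue run reads the same in both directions, so it is reported by both functions; a real tool would need to decide whether to treat that case separately.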
Affiliation(s)
- Zameer Ahmed
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
- Manickam Gurusaran
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
- Prasanth Narayana
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
- Kala Sekar Dinesh Kumar
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
- Jayapal Mohanapriya
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
- Kanagaraj Sekar
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
|
72
|
Persi E, Horn D. Systematic analysis of compositional order of proteins reveals new characteristics of biological functions and a universal correlate of macroevolution. PLoS Comput Biol 2013; 9:e1003346. [PMID: 24278003 PMCID: PMC3836704 DOI: 10.1371/journal.pcbi.1003346] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/21/2013] [Accepted: 10/03/2013] [Indexed: 01/01/2023] Open
Abstract
We present a novel analysis of compositional order (CO) based on the occurrence of Frequent amino-acid Triplets (FTs) that appear much more than random in protein sequences. The method captures all types of proteomic compositional order including single amino-acid runs, tandem repeats, periodic structure of motifs and otherwise low complexity amino-acid regions. We introduce new order measures, distinguishing between ‘regularity’, ‘periodicity’ and ‘vocabulary’, to quantify these phenomena and to facilitate the identification of evolutionary effects. Detailed analysis of representative species across the tree-of-life demonstrates that CO proteins exhibit numerous functional enrichments, including a wide repertoire of particular patterns of dependencies on regularity and periodicity. Comparison between human and mouse proteomes further reveals the interplay of CO with evolutionary trends, such as faster substitution rate in mouse leading to decrease of periodicity, while innovation along the human lineage leads to larger regularity. Large-scale analysis of 94 proteomes leads to systematic ordering of all major taxonomic groups according to FT-vocabulary size. This is measured by the count of Different Frequent Triplets (DFT) in proteomes. The latter provides a clear hierarchical delineation of vertebrates, invertebrates, plants, fungi and prokaryotes, with thermophiles showing the lowest level of FT-vocabulary. Among eukaryotes, this ordering correlates with phylogenetic proximity. Interestingly, in all kingdoms CO accumulation in the proteome has universal characteristics. We suggest that CO is a genomic-information correlate of both macroevolution and various protein functions. The results indicate a mechanism of genomic ‘innovation’ at the peptide level, involved in protein elongation, shaped in a universal manner by mutational and selective forces. 
Variations in compositionally ordered (CO) sections of proteins, such as amino acid runs, tandem repeats and low complexity regions, are often considered as a third type of genomic variation along with SNP and CNV. At the microevolutionary scale, they are involved in the rapid evolution of numerous biological functions and the development of novel phenotypic complex traits, including disease in human, in particular neurodegeneration and cancer. At the macroevolutionary scale, the best discriminating proteomic factor between super-kingdoms is the prevalence of CO proteins in eukaryotes. The analysis of CO structures has so far been quite eclectic. Here we introduce a novel unifying methodology, accounting for all types of low-complexity regions and repetitive phenomena, including the existence of large periodic structures in protein sequences. We define new CO measures providing insights into the correlation of CO with protein function and with evolution. In particular, a large-scale analysis of 94 proteomes shows that the CO vocabulary of frequently appearing amino acid triplets serves as a measure of taxonomic ordering separating major clades from each other. It unravels a missing genomic correlate of macroevolution and serves as a novel phylogenetic tool. This suggests that major CO generation occurs during the creation of a completely new species, i.e. during macroevolutionary events.
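A rough feel for a triplet "vocabulary" can be had by counting overlapping amino-acid triplets and keeping those that recur within a sequence. The sketch below deliberately substitutes a fixed count threshold for the paper's "much more than random" criterion, which the abstract does not specify; the function names, the `min_count` default and the toy sequences are all assumptions.

```python
from collections import Counter


def frequent_triplets(sequence, min_count=3):
    """Return triplets seen at least min_count times (overlapping windows).

    A fixed threshold is a stand-in (assumption) for a proper
    statistical test against a random background.
    """
    counts = Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))
    return {t: c for t, c in counts.items() if c >= min_count}


def vocabulary_size(proteome, min_count=3):
    """Count the different frequent triplets across {name: sequence}."""
    vocabulary = set()
    for seq in proteome.values():
        vocabulary.update(frequent_triplets(seq, min_count))
    return len(vocabulary)


# Toy proteome (hypothetical): one single-residue run, one tandem repeat.
toy_proteome = {"runQ": "QQQQQAKL", "tandem": "PGAPGAPGA"}
size = vocabulary_size(toy_proteome)
```

Both a single amino-acid run and a tandem repeat contribute triplets here, which mirrors the abstract's point that one triplet-based measure can capture several kinds of compositional order.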
Affiliation(s)
- Erez Persi
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
- David Horn
- School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel
|
73
|
Filisetti D, Théobald-Dietrich A, Mahmoudi N, Rudinger-Thirion J, Candolfi E, Frugier M. Aminoacylation of Plasmodium falciparum tRNA(Asn) and insights in the synthesis of asparagine repeats. J Biol Chem 2013; 288:36361-71. [PMID: 24196969 DOI: 10.1074/jbc.m113.522896] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/06/2022] Open
Abstract
Genome sequencing revealed an extreme AT-rich genome and a profusion of asparagine repeats associated with low complexity regions (LCRs) in proteins of the malarial parasite Plasmodium falciparum. Despite their abundance, the function of these LCRs remains unclear. Because they occur in almost all families of plasmodial proteins, the occurrence of LCRs cannot be associated with any specific metabolic pathway; yet their accumulation must have given selective advantages to the parasite. Translation of these asparagine-rich LCRs demands extraordinarily high amounts of asparaginylated tRNA(Asn). However, unlike other organisms, Plasmodium codon bias is not correlated to tRNA gene copy number. Here, we studied tRNA(Asn) accumulation as well as the catalytic capacities of the asparaginyl-tRNA synthetase of the parasite in vitro. We observed that asparaginylation in this parasite can be considered standard, which is expected to limit the availability of asparaginylated tRNA(Asn) in the cell and, in turn, slow down the ribosomal translation rate when decoding asparagine repeats. This observation strengthens our earlier hypothesis considering that asparagine rich sequences act as "tRNA sponges" and help cotranslational folding of parasite proteins. However, it also raises many questions about the mechanistic aspects of the synthesis of asparagine repeats and about their implications in the global control of protein expression throughout Plasmodium life cycle.
Affiliation(s)
- Denis Filisetti
- From the Architecture et Réactivité de l'ARN, Université de Strasbourg, CNRS, Institut de Biologie Moléculaire et Cellulaire, 15 rue René Descartes, 67084 Strasbourg cedex, France and
|
74
|
|
75
|
Willadsen K, Cao MD, Wiles J, Balasubramanian S, Bodén M. Repeat-encoded poly-Q tracts show statistical commonalities across species. BMC Genomics 2013; 14:76. [PMID: 23374135 PMCID: PMC3617014 DOI: 10.1186/1471-2164-14-76] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/07/2012] [Accepted: 01/18/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Among repetitive genomic sequence, the class of tri-nucleotide repeats has received much attention due to their association with human diseases. Tri-nucleotide repeat diseases are caused by excessive sequence length variability; diseases such as Huntington's disease and Fragile X syndrome are tied to an increase in the number of repeat units in a tract. Motivated by the recent discovery of a tri-nucleotide repeat associated genetic defect in Arabidopsis thaliana, this study takes a cross-species approach to investigating these repeat tracts, with the goal of using commonalities between species to identify potential disease-related properties. RESULTS We find that statistical enrichment in regulatory function associations for coding region repeats - previously observed in human - is consistent across multiple organisms. By distinguishing between homo-amino acid tracts that are encoded by tri-nucleotide repeats, and those encoded by varying codons, we show that amino acid repeats - not tri-nucleotide repeats - fully explain these regulatory associations. Using this same separation between repeat- and non-repeat-encoded homo-amino acid tracts, we show that poly-glutamine tracts are disproportionately encoded by tri-nucleotide repeats, and those tracts that are encoded by tri-nucleotide repeats are also significantly longer; these results are consistent across multiple species. CONCLUSION These findings establish similarities in tri-nucleotide repeats across species at the level of protein functionality and protein sequence. The tendency of tri-nucleotide repeats to encode longer poly-glutamine tracts indicates a link with the poly-glutamine repeat diseases. The cross-species nature of this tendency suggests that unknown repeat diseases are yet to be uncovered in other species. Future discoveries of new non-human repeat associated defects may provide the breadth of information needed to unravel the mechanisms that underpin this class of human disease.
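The distinction drawn above, between homo-amino acid tracts encoded by a repeated tri-nucleotide and those encoded by varying codons, can be sketched as follows. This is an illustration under assumptions, not the study's pipeline: the function name is hypothetical, the input is assumed to be an in-frame, upper-case coding sequence containing only A, C, G and T, and a tract counts as repeat-encoded only when every codon is identical, which may be stricter than the definition the authors used.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def homo_tracts(cds, min_len=3):
    """Return (amino_acid, first_codon_index, length, repeat_encoded) per tract."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    tracts = []
    start = 0
    while start < len(codons):
        aa = CODON_TABLE[codons[start]]
        end = start
        while end < len(codons) and CODON_TABLE[codons[end]] == aa:
            end += 1
        if end - start >= min_len and aa != "*":
            run = codons[start:end]
            # Repeat-encoded here means one codon repeated throughout.
            tracts.append((aa, start, end - start, len(set(run)) == 1))
        start = end
    return tracts


# CAG x4 is a pure tri-nucleotide repeat; CAA-CAG-CAA encodes QQQ with mixed codons.
example_cds = "ATG" + "CAG" * 4 + "GCT" + "CAACAGCAA" + "TAA"
tracts = homo_tracts(example_cds)
```

Both tracts translate to glutamine, but only the first would be flagged as repeat-encoded, which is the separation the abstract relies on when comparing tract lengths.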
Affiliation(s)
- Kai Willadsen
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane QLD 4072, Australia
|
76
|
Tompa P. Hydrogel formation by multivalent IDPs: A reincarnation of the microtrabecular lattice? Intrinsically Disordered Proteins 2013; 1:e24068. [PMID: 28516006 PMCID: PMC5424804 DOI: 10.4161/idp.24068] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/08/2013] [Revised: 01/31/2013] [Accepted: 02/21/2013] [Indexed: 02/03/2023]
Abstract
Based on high-voltage electron microscopic (HVEM) data of fixed cultured cells, an elaborate three-dimensional network of filaments, including and interconnecting other elements of the cytoskeleton, was observed in cells some half a century ago. Despite many attempts and comparative studies, this “microtrabecular lattice” (MTL) of the cytoplasmic ground substance could not be established as a genuine component of the eukaryotic cell, and is mostly considered today as a sample-preparation artifact of protein adherence and cross-linking to the cytoskeleton. Here we elaborate on the provocative idea that recent observations of hydrogel-forming phase transitions of repetitive regions of intrinsically disordered proteins (IDPs) bear resemblance in creation, organization and physical appearance to the MTL. We review this phenomenon in detail, and suggest that phase transitions of actin regulatory proteins, neurofilament side-arms and other proteins could generate non-uniform spatial distribution of cytoplasmic material in the vicinity of the cytoskeleton that might even give rise to fixation phenomena resembling the MTL. Whether such hydrogel formation by IDPs is a general physical phenomenon remains to be seen. Nevertheless, the underlying organizational principle provokes novel experimental studies to uncover the ensuing higher-level regulation of cell physiology, in which the despised and long-forgotten concept of MTL might give some interesting leads.
Affiliation(s)
- Peter Tompa
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium; Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
|
77
|
Exploring charged biased regions in the human proteome. Gene 2012; 515:277-80. [PMID: 23266628 DOI: 10.1016/j.gene.2012.11.077] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/31/2012] [Revised: 11/12/2012] [Accepted: 11/28/2012] [Indexed: 11/23/2022]
Abstract
There has been increasing interest in biased regions in proteins, especially since it was shown that such regions are frequently associated with a structural role in the cell or with protein disorder. In this study, we focus on charge-biased protein sequences in the human genome. We identified 446 charge-biased proteins within the human proteome; 70% of them harbor negative runs and correspond to transcription factor zinc finger proteins, importins and some protein kinases involving acidic activating domains. Basic charge clusters are often associated with DNA-binding, zinc-finger, basic-leucine zipper and homeobox domains. The data show that significant positive clusters correspond to ribosomal proteins. Most proteins with zinc-binding fingers have mixed positively and negatively charged biased regions. Altogether, the Gene Ontology analysis revealed that the charged proteins are involved mainly in regulatory functions.
|
78
|
Khan MKA, Bowler BE. Conformational properties of polyglutamine sequences in guanidine hydrochloride solutions. Biophys J 2012. [PMID: 23199927 DOI: 10.1016/j.bpj.2012.09.041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/06/2023] Open
Abstract
Two sets of iso-1-cytochrome c variants have been prepared with N-terminal insertions of pure polyglutamine, i.e., PolyQ variants, or polyglutamine interrupted with lysine every sixth residue, i.e., Gln-rich variants. The polymer properties of these pure polyGln or Gln-rich sequences have been evaluated using equilibrium and kinetic His-heme loop formation methods for loop sizes ranging from 22 to 46 in 1.5, 3.0, and 6.0 M guanidine hydrochloride (GdnHCl). In 6.0 M GdnHCl, the scaling exponent, ν(3), for the pure polyGln sequences is ~1.7, significantly less than ν(3) ≈ 2.15 for the Gln-rich sequences. The stability of the His-heme loops becomes progressively greater for the pure polyGln sequences relative to the Gln-rich sequences as GdnHCl concentration decreases from 6.0 to 1.5 M. Thus, the context of the sequence affects the polymer properties of Gln repeats even in denaturing concentrations of GdnHCl. Comparison of data for the Gln-rich variants with previous results for Gly-rich and Ala-rich variants shows that ν(3) ~ 2.2 for the Gln-rich, Gly-rich, and Ala-rich sequences in 6.0 M GdnHCl, whereas ν(3) remains unchanged at 3.0 M GdnHCl concentration for the Gln-rich and Ala-rich sequences but decreases to ~1.7 for the Gly-rich sequences. Thus, the polymer properties of Gln-rich and Ala-rich sequences are less sensitive to solvent quality in denaturing solutions of GdnHCl than Gly-rich sequences. Evaluation of Flory's characteristic ratio, C(n), for the Gln-rich and Ala-rich sequences relative to the Gly-rich sequences shows that Gln-rich sequences are stiffer than Ala-rich sequences at both 3.0 and 6.0 M GdnHCl.
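To see what the two exponents imply, it may help to assume the power-law form commonly used for loop formation in polymer theory, in which the probability of closing a loop of n residues falls off as a power of n. The abstract does not state the functional form, so the relation below, with ν_3 standing for the abstract's ν(3), is an assumption, as is the use of an identical prefactor for both loop sizes.

```latex
% Assumed power-law scaling of loop-formation probability with loop size n
P_{\mathrm{loop}}(n) \propto n^{-\nu_3}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{loop}}(22)}{P_{\mathrm{loop}}(46)} = \left(\frac{46}{22}\right)^{\nu_3}

% Worked values for the two exponents quoted for 6.0 M GdnHCl
\nu_3 = 1.7:\quad \left(\tfrac{46}{22}\right)^{1.7} \approx 3.5
\qquad
\nu_3 = 2.15:\quad \left(\tfrac{46}{22}\right)^{2.15} \approx 4.9
```

Under these assumptions, going from the shortest to the longest loop studied lowers the loop probability about 3.5-fold for the pure polyGln sequences and about 4.9-fold for the Gln-rich sequences, so the smaller exponent corresponds to a weaker penalty for longer loops.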
Affiliation(s)
- Md Khurshid Alam Khan
- Department of Chemistry and Biochemistry, and Center for Biomolecular Structure and Dynamics, University of Montana, Missoula, Montana, USA
|
79
|
Background-dependent effects of polyglutamine variation in the Arabidopsis thaliana gene ELF3. Proc Natl Acad Sci U S A 2012; 109:19363-7. [PMID: 23129635 DOI: 10.1073/pnas.1211021109] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 11/18/2022] Open
Abstract
Tandem repeats (TRs) have extremely high mutation rates and are often considered to be neutrally evolving DNA. However, in coding regions, TR copy number mutations can significantly affect phenotype and may facilitate rapid adaptation to new environments. In several human genes, TR copy number mutations that expand polyglutamine (polyQ) tracts beyond a certain threshold cause incurable neurodegenerative diseases. PolyQ-containing proteins exist at a considerable frequency in eukaryotes, yet the phenotypic consequences of natural variation in polyQ tracts that are not associated with disease remain largely unknown. Here, we use Arabidopsis thaliana to dissect the phenotypic consequences of natural variation in the polyQ tract encoded by EARLY FLOWERING 3 (ELF3), a key developmental gene. Changing ELF3 polyQ tract length affected complex ELF3-dependent phenotypes in a striking and nonlinear manner. Some natural ELF3 polyQ variants phenocopied elf3 loss-of-function mutants in a common reference background, although they are functional in their native genetic backgrounds. To test the existence of background-specific modifiers, we compared the phenotypic effects of ELF3 polyQ variants between two divergent backgrounds, Col and Ws, and found dramatic differences. In fact, the Col-ELF3 allele, encoding the shortest known ELF3 polyQ tract, was haploinsufficient in Ws × Col F(1) hybrids. Our data support a model in which variable polyQ tracts drive adaptation to internal genetic environments.
|
80
|
Radó-Trilla N, Albà M. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. BMC Evol Biol 2012; 12:155. [PMID: 22920595 PMCID: PMC3523016 DOI: 10.1186/1471-2148-12-155] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 04/17/2012] [Accepted: 07/30/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance, and their capacity to expand in relatively short periods of time through replication slippage, they can greatly contribute to increase protein sequence space and generate novel protein functions. However, little is known about the global impact of LCRs on protein evolution. RESULTS We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H.sapiens, M.musculus, G.gallus, D.rerio and C.intestinalis. Transcriptional factors and other regulatory functions are overrepresented in proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion whereas the loss of LCRs is more often due to accumulation of amino acid substitutions as opposed to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than for other LCR types. LCRs enriched in positively charged amino acids show the contrary pattern, indicating an important effect of purifying selection in their maintenance. CONCLUSION We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
Affiliation(s)
- Núria Radó-Trilla
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics - IMIM Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona 08003, Spain
|
81
|
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, USA
- Thierry Baudrand
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
- Andrey V. Kajava
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
|
82
|
Lobanov MY, Bogatyreva NS, Galzitskaya OV. Occurrence of six-amino-acid motifs in three eukaryotic proteomes. Mol Biol 2012. [DOI: 10.1134/s0026893312010128] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
|
83
|
Ramazzotti M, Monsellier E, Kamoun C, Degl'Innocenti D, Melki R. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 2012; 7:e30824. [PMID: 22312432 PMCID: PMC3270027 DOI: 10.1371/journal.pone.0030824] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 10/06/2011] [Accepted: 12/23/2011] [Indexed: 12/20/2022] Open
Abstract
Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias, are associated with the aggregation of proteins comprising an extended tract of consecutive glutamine residues (polyQs) once it exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during the replication process. This is in apparent contradiction with the fact that many polyQs-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQs expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed software that scrutinizes entire proteomes in search of imperfect polyQs. The nature of residues flanking the polyQs and that of residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated with polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly amino acid residues. These biases are conserved amongst unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible for the observed biases.
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biochimiche, Università degli Studi di Firenze, Florence, Italy
- * E-mail: (MR); (EM)
- Elodie Monsellier
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- * E-mail: (MR); (EM)
- Choumouss Kamoun
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Ronald Melki
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
|
84
|
Schaefer MH, Wanker EE, Andrade-Navarro MA. Evolution and function of CAG/polyglutamine repeats in protein-protein interaction networks. Nucleic Acids Res 2012; 40:4273-87. [PMID: 22287626 PMCID: PMC3378862 DOI: 10.1093/nar/gks011] [Citation(s) in RCA: 151] [Impact Index Per Article: 12.6] [Indexed: 12/21/2022] Open
Abstract
Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's and several Ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This happens most likely through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in gain of abnormal interactions, leading to pathological effects like protein aggregation. Our analyses suggest that research on polyQ proteins should shift focus from expanded polyQ proteins into the characterization of the influence of the wild-type polyQ on protein interactions.
Affiliation(s)
- Martin H. Schaefer
- Computational Biology and Data Mining and Neuroproteomics, Max Delbrück Center for Molecular Medicine, Robert-Rössle-Strasse 10, 13125 Berlin, Germany
- Erich E. Wanker
- Computational Biology and Data Mining and Neuroproteomics, Max Delbrück Center for Molecular Medicine, Robert-Rössle-Strasse 10, 13125 Berlin, Germany
- Miguel A. Andrade-Navarro
- Computational Biology and Data Mining and Neuroproteomics, Max Delbrück Center for Molecular Medicine, Robert-Rössle-Strasse 10, 13125 Berlin, Germany
- *To whom correspondence should be addressed. Tel: +49 30 9406 4250; Fax: +49 30 9406 4240;
|
85
|
Lobanov MY, Galzitskaya OV. Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes. Mol Biosyst 2012; 8:327-37. [DOI: 10.1039/c1mb05318c] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 01/22/2023]
|
86
|
Faux N. Single amino acid and trinucleotide repeats: function and evolution. Adv Exp Med Biol 2012; 769:26-40. [PMID: 23560303 DOI: 10.1007/978-1-4614-5434-2_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
The most well known effect of single amino acid repeat expansion, beyond a certain threshold, is the development of a specific disease, depending on the protein in which the expansion has occurred. For example, the expansion of the glutamine repeat in huntingtin leads to the debilitating neurodegenerative disease, Huntington's disease. Similarly, there is a range of other disorders caused by trinucleotide repeat expansions encoding polyglutamine or polyalanine tracts. The age of onset of the polyglutamine-induced neurodegenerative diseases is usually negatively correlated with the length of expanded CAG/glutamine repeat. However, recent studies have given evidence that single amino acid repeats may also play critical roles in normal protein function and that changes in the length of single amino acid repeats are likely to play a beneficial role in evolution. This chapter will look at the prevalence, function and possible role single amino acid repeats have in evolution and other biological processes.
Affiliation(s)
- Noel Faux
- Mental Health Research Institute, The University of Melbourne, Parkville, Victoria, Australia.
|
87
|
Zhou Y, Liu J, Han L, Li ZG, Zhang Z. Comprehensive analysis of tandem amino acid repeats from ten angiosperm genomes. BMC Genomics 2011; 12:632. [PMID: 22195734 PMCID: PMC3283746 DOI: 10.1186/1471-2164-12-632] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 08/02/2011] [Accepted: 12/23/2011] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. AARs were thought to be frequently involved in bio-molecular interactions. Comprehensive studies that primarily focused on metazoan AARs have suggested that AARs are evolving rapidly and are highly variable among species. However, there is still controversy over causal factors of this inter-species variation. In this work, we attempted to investigate this topic mainly by comparing AARs in orthologous proteins from ten angiosperm genomes. RESULTS Angiosperm AAR content is positively correlated with the GC content of the protein coding sequence. However, based on observations from fungal AARs and insect AARs, we argue that the applicability of this kind of correlation is limited by AAR residue composition and species' life history traits. Angiosperm AARs also tend to be fast evolving and structurally disordered, supporting the results of comprehensive analyses of metazoans. The functions of conserved long AARs are summarized. Finally, we propose that the rapid mRNA decay rate, alternative splicing and tissue specificity are regulatory processes that are associated with angiosperm proteins harboring AARs. CONCLUSIONS Our investigation suggests that GC content is a predictor of AAR content in the protein coding sequence under certain conditions. Although angiosperm AARs lack conservation and 3D structure, a fraction of the proteins that contain AARs may be functionally important and are under extensive regulation in plant cells.
Affiliation(s)
- Yuan Zhou
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jing Liu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhi-Gang Li
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
88
|
Laurie S, Toll-Riera M, Radó-Trilla N, Albà MM. Sequence shortening in the rodent ancestor. Genome Res 2011; 22:478-85. [PMID: 22128134 DOI: 10.1101/gr.121897.111] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
Insertions and deletions (indels), together with nucleotide substitutions, are major drivers of sequence evolution. An excess of deletions over insertions in genomic sequences, the so-called deletional bias, has been reported in a wide range of species, including mammals. However, this bias has not been found in the coding sequences of some mammalian species, such as human and mouse. To determine the strength of the deletional bias in mammals, and the influence of mutation and selection, we have quantified indels in both neutrally evolving noncoding sequences and protein-coding sequences, in six mammalian branches: human, macaque, ancestral primate, mouse, rat, and ancestral rodent. The results obtained with an improved algorithm for the placement of insertions in multiple alignments, Prank(+F), indicate that contrary to previous results, the only mammalian branch with a strong deletional bias is the rodent ancestral branch. We estimate that such a bias has resulted in an ~2.5% sequence loss of mammalian syntenic region in the ancestor of the mouse and rat. Further, a comparison of coding and noncoding sequences shows that negative selection is acting more strongly against mutations generating amino acid insertions than against mutations resulting in amino acid deletions. The strength of selection against indels is found to be higher in the rodent branches than in the primate branches, consistent with the larger effective population sizes of the rodents.
Affiliation(s)
- Steve Laurie
- Evolutionary Genomics Group, Pompeu Fabra University (UPF) and Municipal Institute of Medical Research (FIMIM), Barcelona, Spain
|
89
|
Luo H, Lin K, David A, Nijveen H, Leunissen JAM. ProRepeat: an integrated repository for studying amino acid tandem repeats in proteins. Nucleic Acids Res 2011; 40:D394-9. [PMID: 22102581 PMCID: PMC3245022 DOI: 10.1093/nar/gkr1019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023] Open
Abstract
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database using repeat characteristics like repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats in the N-terminus of eukaryotic protein sequences, the differences of repeat abundance among proteomes, the functional classification of repeat containing proteins and GC content constraints of repeats’ corresponding codons.
Affiliation(s)
- Hong Luo
- Laboratory of Bioinformatics, Wageningen University and Research Centre, PO Box 569, 6700 AN Wageningen, Netherlands
|
90
|
Toll-Riera M, Radó-Trilla N, Martys F, Albà MM. Role of low-complexity sequences in the formation of novel protein coding sequences. Mol Biol Evol 2011; 29:883-6. [PMID: 22045997 DOI: 10.1093/molbev/msr263] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Indexed: 12/28/2022] Open
Abstract
Low-complexity sequences are extremely abundant in eukaryotic proteins for reasons that remain unclear. One hypothesis is that they contribute to the formation of novel coding sequences, facilitating the generation of novel protein functions. Here, we test this hypothesis by examining the content of low-complexity sequences in proteins of different age. We show that recently emerged proteins contain more low-complexity sequences than older proteins and that these sequences often form functional domains. These data are consistent with the idea that low-complexity sequences may play a key role in the emergence of novel genes.
Affiliation(s)
- Macarena Toll-Riera
- Evolutionary Genomics Group, Research Programme in Biomedical Informatics, Universitat Pompeu Fabra (UPF)-Institute Municipal d'Investigació Mèdica (IMIM), Barcelona, Spain
|
91
|
Behura SK, Haugen M, Flannery E, Sarro J, Tessier CR, Severson DW, Duman-Scheel M. Comparative genomic analysis of Drosophila melanogaster and vector mosquito developmental genes. PLoS One 2011; 6:e21504. [PMID: 21754989 PMCID: PMC3130749 DOI: 10.1371/journal.pone.0021504] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 04/01/2011] [Accepted: 05/30/2011] [Indexed: 11/18/2022] Open
Abstract
Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.
Affiliation(s)
- Susanta K. Behura
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Morgan Haugen
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- Ellen Flannery
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Joseph Sarro
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Charles R. Tessier
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- David W. Severson
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- Molly Duman-Scheel
- Department of Biological Sciences and Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, South Bend, Indiana, United States of America
- * E-mail:
|
92
|
Łabaj PP, Sykacek P, Kreil DP. An analysis of single amino acid repeats as use case for application specific background models. BMC Bioinformatics 2011; 12:173. [PMID: 21595908 PMCID: PMC3124433 DOI: 10.1186/1471-2105-12-173] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/30/2010] [Accepted: 05/19/2011] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions. RESULTS Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis. CONCLUSIONS Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
Affiliation(s)
- Paweł P Łabaj
- Chair of Bioinformatics, Boku University Vienna, Muthgasse 18, 1190 Vienna, Austria.
|
93
|
Jorda J, Kajava AV. Protein homorepeats sequences, structures, evolution, and functions. Adv Protein Chem Struct Biol 2011; 79:59-88. [PMID: 20621281 DOI: 10.1016/s1876-1623(10)79002-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/25/2022]
Abstract
The vast majority of protein sequences are aperiodic; they do not have any strong bias in the amino acid composition, and they use a subtle mixture of all or most of the 20 amino acid residues to code a great number of various structures and functions. In this context, homorepeats, runs of a single amino acid residue, represent unusual, eye-catching motifs in proteins. Despite the sequence simplicity and relatively small size, the homorepeat runs have a strong potential for molecular interactions due to the excessively high local concentration of a certain physico-chemical property. Appearance of such runs within proteins may give them new structural and functional features. An increasing number of studies demonstrate the abundance of these motifs in proteins, their important roles in biological processes, and their link to a number of hereditary and age-related diseases. In this chapter, we summarize data on the distribution of homorepeats in proteomes and on their structural properties, evolution, and functions.
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
|
94
|
Haerty W, Golding GB. Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences. Genome 2011; 53:753-62. [PMID: 20962881 DOI: 10.1139/g10-063] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
For decades proteins were thought to interact in a "lock and key" system, which led to the definition of a paradigm linking stable three-dimensional structure to biological function. As a consequence, any non-structured peptide was considered to be nonfunctional and to evolve neutrally. Surprisingly, the most commonly shared peptides between eukaryotic proteomes are low-complexity sequences that in most conditions do not present a stable three-dimensional structure. However, because these sequences evolve rapidly and because the size variation of a few of them can have deleterious effects, low-complexity sequences have been suggested to be the target of selection. Here we review evidence that supports the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
Affiliation(s)
- Wilfried Haerty
- Biology Department, McMaster University, Hamilton, ON, Canada
|
95
|
Role of Everlasting Triplet Expansions in Protein Evolution. J Mol Evol 2010; 72:232-9. [DOI: 10.1007/s00239-010-9425-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/09/2010] [Accepted: 12/01/2010] [Indexed: 02/05/2023]
|
96
|
Tian X, Strassmann JE, Queller DC. Genome nucleotide composition shapes variation in simple sequence repeats. Mol Biol Evol 2010; 28:899-909. [PMID: 20943830 DOI: 10.1093/molbev/msq266] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/30/2022] Open
Abstract
Simple sequence repeats (SSRs) or microsatellites are a common component of genomes but vary greatly across species in their abundance. We tested the hypothesis that this variation is due in part to AT/GC content of genomes, with genomes biased toward either high AT or high GC generating more short random repeats that are long enough to enhance expansion through slippage during replication. To test this hypothesis, we identified repeats with perfect tandem iterations of 1-6 bp from 25 protists with complete or near-complete genome sequences. As expected, the density and the frequency are highly related to genome AT content, with excellent fits to quadratic regressions with minima near a 50% AT content and rising toward both extremes. Within species, the same trends hold, except the limited variation in AT content within each species places each mainly on the descending (GC rich), middle, or ascending (AT rich) part of the curve. The base usages of repeat motifs are also significantly correlated with genome nucleotide compositions: Percentages of AT-rich motifs rise with the increase of genome AT content but vice versa for GC-rich subgroups. Amino acid homopolymer repeats also show the expected quadratic relationship, with higher abundance in species with AT content biased in either direction. Our results show that genome nucleotide composition explains up to half of the variance in the abundance and motif constitution of SSRs.
Affiliation(s)
- Xiangjun Tian
- Department of Ecology and Evolutionary Biology, Rice University, USA
|
97
|
Kaundal R, Saini R, Zhao PX. Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis. Plant Physiol 2010; 154:36-54. [PMID: 20647376 PMCID: PMC2938157 DOI: 10.1104/pp.110.156851] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 03/25/2010] [Accepted: 07/13/2010] [Indexed: 05/20/2023]
Abstract
A complete map of the Arabidopsis (Arabidopsis thaliana) proteome is clearly a major goal for the plant research community in terms of determining the function and regulation of each encoded protein. Developing genome-wide prediction tools such as for localizing gene products at the subcellular level will substantially advance Arabidopsis gene annotation. To this end, we performed a comprehensive study in Arabidopsis and created an integrative support vector machine-based localization predictor called AtSubP (for Arabidopsis subcellular localization predictor) that is based on the combinatorial presence of diverse protein features, such as its amino acid composition, sequence-order effects, terminal information, Position-Specific Scoring Matrix, and similarity search-based Position-Specific Iterated-Basic Local Alignment Search Tool information. When used to predict seven subcellular compartments through a 5-fold cross-validation test, our hybrid-based best classifier achieved an overall sensitivity of 91% with high-confidence precision and Matthews correlation coefficient values of 90.9% and 0.89, respectively. Benchmarking AtSubP on two independent data sets, one from Swiss-Prot and another containing green fluorescent protein- and mass spectrometry-determined proteins, showed a significant improvement in the prediction accuracy of species-specific AtSubP over some widely used "general" tools such as TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc, and our newly created All-Plant method. Cross-comparison of AtSubP on six nontrained eukaryotic organisms (rice [Oryza sativa], soybean [Glycine max], human [Homo sapiens], yeast [Saccharomyces cerevisiae], fruit fly [Drosophila melanogaster], and worm [Caenorhabditis elegans]) revealed inferior predictions. AtSubP significantly outperformed all the prediction tools being currently used for Arabidopsis proteome annotation and, therefore, may serve as a better complement for the plant research community. 
A supplemental Web site that hosts all the training/testing data sets and whole proteome predictions is available at http://bioinfo3.noble.org/AtSubP/.
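For readers unfamiliar with the metrics quoted in this abstract, the following is a minimal sketch of how sensitivity, precision, and the Matthews correlation coefficient are conventionally computed for one class treated one-vs-rest. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the AtSubP study, which may aggregate its seven compartments differently.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, mcc) from one-vs-rest confusion counts."""
    # Sensitivity (recall): fraction of true members of the class that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predictions for the class that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc

# Hypothetical counts for illustration only.
sens, prec, mcc = binary_metrics(tp=90, fp=10, tn=880, fn=20)
print(f"sensitivity={sens:.3f} precision={prec:.3f} MCC={mcc:.3f}")
```

In a multi-class setting such as subcellular localization, these values would typically be computed per compartment and then averaged; which averaging scheme the authors used is not stated in the abstract.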
|
98
|
Francis DM, Page R. Strategies to optimize protein expression in E. coli. Current Protocols in Protein Science 2010; Chapter 5:5.24.1-5.24.29. [PMID: 20814932 PMCID: PMC7162232 DOI: 10.1002/0471140864.ps0524s61] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.8] [Indexed: 01/06/2023]
Abstract
Recombinant protein expression in Escherichia coli (E. coli) is simple, fast, inexpensive, and robust, with the expressed protein comprising up to 50 percent of the total cellular protein. However, it also has disadvantages. For example, the rapidity of bacterial protein expression often results in unfolded/misfolded proteins, especially for heterologous proteins that require longer times and/or molecular chaperones to fold correctly. In addition, the highly reductive environment of the bacterial cytosol and the inability of E. coli to perform several eukaryotic post-translational modifications result in the insoluble expression of proteins that require these modifications for folding and activity. Fortunately, multiple novel reagents and techniques have been developed that allow for the efficient, soluble production of a diverse range of heterologous proteins in E. coli. This overview describes variables at each stage of a protein expression experiment that can influence solubility and offers a summary of strategies used to optimize soluble expression in E. coli.
|
99
|
Łabaj PP, Leparc GG, Bardet AF, Kreil G, Kreil DP. Single amino acid repeats in signal peptides. FEBS J 2010; 277:3147-57. [DOI: 10.1111/j.1742-4658.2010.07720.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
|
100
|
Mojsin M, Kovacevic-Grujicic N, Krstic A, Popovic J, Milivojevic M, Stevanovic M. Comparative analysis of SOX3 protein orthologs: Expansion of homopolymeric amino acid tracts during vertebrate evolution. Biochem Genet 2010; 48:612-23. [PMID: 20495863 DOI: 10.1007/s10528-010-9343-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/01/2009] [Accepted: 01/25/2010] [Indexed: 10/19/2022]
Abstract
To understand more fully the structure and evolution of the SOX3 protein, we comparatively analyzed its orthologs in vertebrates. Since complex disorders are associated with human SOX3 polyalanine expansions, our investigation focused on both compositional and evolutionary analysis of various homopolymeric amino acid tracts observed in SOX3 orthologs. Our analysis revealed that the observed homopolymeric alanine, glycine, and proline tracts are mammal-specific, except for one polyglycine tract present in birds. Since it is likely that the SOX3 protein acquired additional roles in brain development in Eutheria, we might speculate that development of novel brain functions during the course of evolution was affected, at least in part, by such structural-functional changes in the SOX3 protein.
Affiliation(s)
- Marija Mojsin
- Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
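As an illustration of what a homopolymeric amino acid tract looks like in practice, the sketch below scans a one-letter protein sequence for runs of a single residue. The function name `find_homo_repeats`, the default threshold of five residues, and the toy sequence are all assumptions made for this example; they are not the method or data used in the study above.

```python
import re

def find_homo_repeats(sequence, min_length=5):
    """Return (residue, start_index, run_length) for each single-residue run
    of at least min_length in an uppercase one-letter protein sequence."""
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # ([A-Z]) captures one residue; \1{n,} requires n or more further copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(sequence)]

# Toy sequence for illustration only; not a real SOX3 ortholog.
toy = "MK" + "A" * 6 + "GGS" + "P" * 6 + "Q"
print(find_homo_repeats(toy))
```

The length threshold is a free parameter here; an appropriate cutoff would depend on the residue and the proteome being analyzed.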
|