1
Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022 DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Affiliation(s)
- Lokdeep Teekas
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
2
White LJ, Russell AJ, Pizzey AR, Dasmahapatra KK, Pownall ME. The Presence of Two MyoD Genes in a Subset of Acanthopterygii Fish Is Associated with a Polyserine Insert in MyoD1. J Dev Biol 2023; 11:jdb11020019. [PMID: 37218813 DOI: 10.3390/jdb11020019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 04/20/2023] [Accepted: 04/26/2023] [Indexed: 05/24/2023] Open
Abstract
The MyoD gene was duplicated during the teleost whole genome duplication and, while a second MyoD gene (MyoD2) was subsequently lost from the genomes of some lineages (including zebrafish), many fish lineages (including Alcolapia species) have retained both MyoD paralogues. Here we reveal the expression patterns of the two MyoD genes in Oreochromis (Alcolapia) alcalica using in situ hybridisation. We report our analysis of MyoD1 and MyoD2 protein sequences from 54 teleost species, and show that O. alcalica, along with some other teleosts, include a polyserine repeat between the amino terminal transactivation domains (TAD) and the cysteine-histidine rich region (H/C) in MyoD1. The evolutionary history of MyoD1 and MyoD2 is compared to the presence of this polyserine region using phylogenetics, and its functional relevance is tested using overexpression in a heterologous system to investigate subcellular localisation, stability, and activity of MyoD proteins that include and do not include the polyserine region.
Affiliation(s)
- Lewis J White
- Biology Department, University of York, York YO10 5DD, UK
- Mary E Pownall
- Biology Department, University of York, York YO10 5DD, UK
3
Ntountoumi C, Vlastaridis P, Mossialos D, Stathopoulos C, Iliopoulos I, Promponas V, Oliver SG, Amoutzias GD. Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved. Nucleic Acids Res 2019; 47:9998-10009. [PMID: 31504783 PMCID: PMC6821194 DOI: 10.1093/nar/gkz730] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/15/2019] [Revised: 07/16/2019] [Accepted: 08/15/2019] [Indexed: 01/27/2023] Open
Abstract
We provide the first high-throughput analysis of the properties and functional role of Low Complexity Regions (LCRs) in more than 1500 prokaryotic and phage proteomes. We observe that, contrary to a widespread belief based on older and sparse data, LCRs actually have a significant, persistent and highly conserved presence and role in many and diverse prokaryotes. Their specific amino acid content is linked to proteins with certain molecular functions, such as the binding of RNA, DNA, metal-ions and polysaccharides. In addition, LCRs have been repeatedly identified in very ancient, and usually highly expressed proteins of the translation machinery. At last, based on the amino acid content enriched in certain categories, we have developed a neural network web server to identify LCRs and accurately predict whether they can bind nucleic acids, metal-ions or are involved in chaperone functions. An evaluation of the tool showed that it is highly accurate for eukaryotic proteins as well.
Affiliation(s)
- Chrysa Ntountoumi
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Panayotis Vlastaridis
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Dimitris Mossialos
- Microbial Biotechnology-Molecular Bacteriology-Virology Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
- Vasilios Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, New Campus, University of Cyprus, PO Box 20537, CY-1678 Nicosia, Cyprus
- Stephen G Oliver
- Cambridge Systems Biology Centre & Department of Biochemistry, University of Cambridge, CB2 1GA, UK
- Grigoris D Amoutzias
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
4
Shimada MK, Sanbonmatsu R, Yamaguchi-Kabata Y, Yamasaki C, Suzuki Y, Chakraborty R, Gojobori T, Imanishi T. Selection pressure on human STR loci and its relevance in repeat expansion disease. Mol Genet Genomics 2016; 291:1851-69. [PMID: 27290643 DOI: 10.1007/s00438-016-1219-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/07/2015] [Accepted: 05/21/2016] [Indexed: 12/30/2022]
Abstract
Short Tandem Repeats (STRs) comprise repeats of one to several base pairs. Because of the high mutability due to strand slippage during DNA synthesis, rapid evolutionary change in the number of repeating units directly shapes the range of repeat-number variation according to selection pressure. However, the remaining questions include: Why are STRs causing repeat expansion diseases maintained in the human population; and why are these limited to neurodegenerative diseases? By evaluating the genome-wide selection pressure on STRs using the database we constructed, we identified two different patterns of relationship in repeat-number polymorphisms between DNA and amino-acid sequences, although both patterns are evolutionary consequences of avoiding the formation of harmful long STRs. First, a mixture of degenerate codons is represented in poly-proline (poly-P) repeats. Second, long poly-glutamine (poly-Q) repeats are favored at the protein level; however, at the DNA level, STRs encoding long poly-Qs are frequently divided by synonymous SNPs. Furthermore, significant enrichments of apoptosis and neurodevelopment were biological processes found specifically in genes encoding poly-Qs with repeat polymorphism. This suggests the existence of a specific molecular function for polymorphic and/or long poly-Q stretches. Given that the poly-Qs causing expansion diseases were longer than other poly-Qs, even in healthy subjects, our results indicate that the evolutionary benefits of long and/or polymorphic poly-Q stretches outweigh the risks of long CAG repeats predisposing to pathological hyper-expansions. Molecular pathways in neurodevelopment requiring long and polymorphic poly-Q stretches may provide a clue to understanding why poly-Q expansion diseases are limited to neurodegenerative diseases.
Affiliation(s)
- Makoto K Shimada
- Institute for Comprehensive Medical Science, Fujita Health University, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, Aichi, 470-1192, Japan; National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Ryoko Sanbonmatsu
- Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yumi Yamaguchi-Kabata
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Tohoku Medical Megabank Organization, Tohoku University, 2-1 Seiryo-machi, Aoba-ku, Sendai, 980-8573, Japan
- Chisato Yamasaki
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Japan Biological Informatics Consortium, 10F TIME24 Building, 2-4-32 Aomi, Koto-ku, Tokyo, 135-8073, Japan
- Yoshiyuki Suzuki
- Graduate School of Natural Sciences, Nagoya City University, 1 Yamanohata, Mizuho-cho, Mizuho-ku, Nagoya, Aichi, 467-8501, Japan
- Ranajit Chakraborty
- Health Science Center, University of North Texas, 3500 Camp Bowie Blvd., Fort Worth, TX, 76107, USA
- Takashi Gojobori
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Ibn Al-Haytham Building (West), Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tadashi Imanishi
- National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi Koto-ku, Tokyo, 135-0064, Japan; Department of Molecular Life Science, Tokai University School of Medicine, 143 Shimokasuya, Isehara, Kanagawa, 259-1193, Japan
5
Wu R, Liu Q, Zhang P, Liang D. Tandem amino acid repeats in the green anole (Anolis carolinensis) and other squamates may have a role in increasing genetic variability. BMC Genomics 2016; 17:109. [PMID: 26868501 PMCID: PMC4751654 DOI: 10.1186/s12864-016-2430-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/06/2015] [Accepted: 02/02/2016] [Indexed: 01/04/2023] Open
Abstract
Background Tandem amino acid repeats are characterised by the consecutive recurrence of a single amino acid. They exhibit high rates of length mutations in addition to point mutations and have been proposed to be involved in genetic plasticity. Squamate reptiles (lizards and snakes) diversify in both morphology and physiology. The underlying mechanism is yet to be understood. In a previous phylogenomic analysis of reptiles, the density of tandem repeats in an anole lizard diverged heavily from that of the other reptiles. To gain further insight into the tandem amino acid repeats in squamates, we analysed the repeat content in the green anole (Anolis carolinensis) proteome and compared the amino acid repeats in a large orthologous protein data set from six vertebrates (the Western clawed frog, the green anole, the Chinese softshell turtle, the zebra finch, mouse and human). Results Our results revealed that the number of amino acid repeats in the green anole exceeded those found in the other five species studied. Species-only repeats were found in high proportion in the green anole but not in the other five species, suggesting that the green anole had gained many amino acid repeats in either the Anolis or the squamate lineage. Since the amino acid repeat containing genes in the green anole were highly enriched in genes related to transcription and development, an important family of developmental genes, i.e., the Hox family, was further studied in a wide collection of squamates. Abundant amino acid repeats were also observed, implying the general high tolerance of amino acid repeats in squamates. A particular enrichment of amino acid repeats was observed in the central class Hox genes that are known to be responsible for defining cervical to lumbar regions. 
Conclusions Our study suggests that the abundant amino acid repeats in the green anole, and possibly in other squamates, may play a role in increasing the genetic variability, and contribute to the evolutionary diversity of this clade.
Affiliation(s)
- Riga Wu
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Qingfeng Liu
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Peng Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Dan Liang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
6
Functional gene diversity and migration timing in reintroduced Chinook salmon. Conserv Genet 2015. [DOI: 10.1007/s10592-015-0753-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
7
Wu LZ, Xu XY, Liu YF, Ge X, Wang XJ. Expansion of polyalanine tracts in the QA domain may play a critical role in the clavicular development of cleidocranial dysplasia. J Genet 2015; 94:551-3. [PMID: 26440098 DOI: 10.1007/s12041-015-0551-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
Affiliation(s)
- Li-Zheng Wu
- State Key Laboratory of Military Stomatology, Department of Pediatric Dentistry, School of Stomatology, The Fourth Military Medical University, Xi'an, Shaanxi 710032, People's Republic of China.
8
Pelassa I, Corà D, Cesano F, Monje FJ, Montarolo PG, Fiumara F. Association of polyalanine and polyglutamine coiled coils mediates expansion disease-related protein aggregation and dysfunction. Hum Mol Genet 2014; 23:3402-20. [PMID: 24497578 PMCID: PMC4049302 DOI: 10.1093/hmg/ddu049] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Indexed: 12/21/2022] Open
Abstract
The expansion of homopolymeric glutamine (polyQ) or alanine (polyA) repeats in certain proteins owing to genetic mutations induces protein aggregation and toxicity, causing at least 18 human diseases. PolyQ and polyA repeats can also associate in the same proteins, but the general extent of their association in proteomes is unknown. Furthermore, the structural mechanisms by which their expansion causes disease are not well understood, and these repeats are generally thought to misfold upon expansion into aggregation-prone β-sheet structures like amyloids. However, recent evidence indicates a critical role for coiled-coil (CC) structures in triggering aggregation and toxicity of polyQ-expanded proteins, raising the possibility that polyA repeats may as well form these structures, by themselves or in association with polyQ. We found through bioinformatics screenings that polyA, polyQ and polyQA repeats have a phylogenetically graded association in human and non-human proteomes and associate/overlap with CC domains. Circular dichroism and cross-linking experiments revealed that polyA repeats can form—alone or with polyQ and polyQA—CC structures that increase in stability with polyA length, forming higher-order multimers and polymers in vitro. Using structure-guided mutagenesis, we studied the relevance of polyA CCs to the in vivo aggregation and toxicity of RUNX2—a polyQ/polyA protein associated with cleidocranial dysplasia upon polyA expansion—and found that the stability of its polyQ/polyA CC controls its aggregation, localization and toxicity. These findings indicate that, like polyQ, polyA repeats form CC structures that can trigger protein aggregation and toxicity upon expansion in human genetic diseases.
Affiliation(s)
- Davide Corà
- Center for Molecular Systems Biology, University of Torino, Torino 10123, Italy
- Federico Cesano
- Department of Chemistry, University of Torino, Torino 10125, Italy
- Francisco J. Monje
- Department of Neurophysiology and Neuropharmacology, Medical University of Vienna, Vienna 1090, Austria
- Pier Giorgio Montarolo
- Department of Neuroscience and National Institute of Neuroscience (INN), Torino 10125, Italy
- Ferdinando Fiumara
- Department of Neuroscience, University of Torino, Corso Raffaello 30, Torino 10125, Italy
9
O'Malley KG, Jacobson DP, Kurth R, Dill AJ, Banks MA. Adaptive genetic markers discriminate migratory runs of Chinook salmon (Oncorhynchus tshawytscha) amid continued gene flow. Evol Appl 2013; 6:1184-94. [PMID: 24478800 PMCID: PMC3901548 DOI: 10.1111/eva.12095] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 12/21/2012] [Accepted: 07/08/2013] [Indexed: 12/12/2022] Open
Abstract
Neutral genetic markers are routinely used to define distinct units within species that warrant discrete management. Human-induced changes to gene flow however may reduce the power of such an approach. We tested the efficiency of adaptive versus neutral genetic markers in differentiating temporally divergent migratory runs of Chinook salmon (Oncorhynchus tshawytscha) amid high gene flow owing to artificial propagation and habitat alteration. We compared seven putative migration timing genes to ten microsatellite loci in delineating three migratory groups of Chinook in the Feather River, CA: offspring of fall-run hatchery broodstock that returned as adults to freshwater in fall (fall run), spring-run offspring that returned in spring (spring run), and fall-run offspring that returned in spring (FRS). We found evidence for significant differentiation between the fall and federally listed threatened spring groups based on divergence at three circadian clock genes (OtsClock1b, OmyFbxw11, and Omy1009UW), but not neutral markers. We thus demonstrate the importance of genetic marker choice in resolving complex life history types. These findings directly impact conservation management strategies and add to previous evidence from Pacific and Atlantic salmon indicating that circadian clock genes influence migration timing.
Affiliation(s)
- Kathleen G O'Malley
- Department of Fisheries and Wildlife, Coastal Oregon Marine Experiment Station, Hatfield Marine Science Center, Oregon State University Newport, OR, USA
- Dave P Jacobson
- Department of Fisheries and Wildlife, Coastal Oregon Marine Experiment Station, Hatfield Marine Science Center, Oregon State University Newport, OR, USA
- Ryon Kurth
- California Department of Water Resources, Division of Environmental Services Oroville, CA, USA
- Allen J Dill
- California Department of Fish and Game, Feather River Hatchery Oroville, CA, USA
- Michael A Banks
- Department of Fisheries and Wildlife, Coastal Oregon Marine Experiment Station, Hatfield Marine Science Center, Oregon State University Newport, OR, USA
10
Ramazzotti M, Monsellier E, Kamoun C, Degl'Innocenti D, Melki R. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 2012; 7:e30824. [PMID: 22312432 PMCID: PMC3270027 DOI: 10.1371/journal.pone.0030824] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 10/06/2011] [Accepted: 12/23/2011] [Indexed: 12/20/2022] Open
Abstract
Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxia, are associated to the aggregation of proteins comprising an extended tract of consecutive glutamine residues (polyQs) once it exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during the replication process. This is in apparent contradiction with the fact that many polyQs-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQs expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed a software that scrutinize entire proteomes in search for imperfect polyQs. The nature of residues flanking the polyQs and that of residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated to polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly amino acid residues. These biases are conserved amongst unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible of the observed biases.
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biochimiche, Università degli Studi di Firenze, Florence, Italy
- Elodie Monsellier
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Choumouss Kamoun
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
- Ronald Melki
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
11
Location trumps length: polyglutamine-mediated changes in folding and aggregation of a host protein. Biophys J 2011; 100:2773-82. [PMID: 21641323 DOI: 10.1016/j.bpj.2011.04.028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/07/2011] [Revised: 04/05/2011] [Accepted: 04/08/2011] [Indexed: 11/21/2022] Open
Abstract
Expanded CAG diseases are progressive neurodegenerative disorders in which specific proteins have an unusually long polyglutamine stretch. Although these proteins share no other sequence or structural homologies, they all aggregate into intracellular inclusions that are believed to be pathological. We sought to determine what impact the position and number of glutamines have on the structure and aggregation of the host protein, apomyoglobin. Variable-length polyQ tracts were inserted either into the loop between the C- and D-helices (Q(n)CD) or at the N-terminus (Q(n)NT). The Q(n)CD mutants lost some α-helix and gained unordered and/or β-sheet in a length-dependent manner. These mutants were partially unfolded and rapidly assembled into soluble chain-like oligomers. In sharp contrast, the Q(n)NT mutants largely retained wild-type tertiary structure but associated into long, fibrillar aggregates. Control proteins with glycine-serine repeats (GS(8)CD and GS(8)NT) were produced. GS(8)CD exhibited similar structural perturbations and aggregation characteristics to an analogously sized Q(16)CD, indicating that the observed effects are independent of amino acid composition. In contrast to Q(16)NT, GS(8)NT did not form fibrillar aggregates. Thus, soluble oligomers are produced through structural perturbation and do not require polyQ, whereas classic fibrils arise from specific polyQ intermolecular interactions in the absence of misfolding.
12
Siwach P, Sengupta S, Parihar R, Ganesh S. Proline repeats, in cis- and trans-positions, confer protection against the toxicity of misfolded proteins in a mammalian cellular model. Neurosci Res 2011; 70:435-41. [PMID: 21616100 DOI: 10.1016/j.neures.2011.05.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/18/2010] [Revised: 04/14/2011] [Accepted: 05/02/2011] [Indexed: 11/28/2022]
Abstract
A broad range of neurodegenerative disorders result from the cytotoxicity conferred by aberrantly folded mutant proteins. Intriguingly, the cytotoxicity and aggregation property of a few mutant proteins are known to be modulated by the flanking sequences. One of such modulators is the proline repeat tract. Using a mammalian cellular model, we show here that proline repeat tract, both in cis- and in trans-positions, ameliorate the cytotoxicity of wide range of misfolded proteins coded by synthetic constructs. We further show that the proline repeat tract could possibly confer protection against the cytotoxicity of misfolded proteins by altering their conformation at the time of their synthesis. Thus, our study elucidates the mechanism by which the proline repeat tract might ameliorate the toxicity of misfolded proteins, and opens up new therapeutic modalities for disorders caused by cytotoxic misfolded proteins.
Affiliation(s)
- Pratibha Siwach
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kalyanpur, Kanpur, UP 208016, India
13
Haerty W, Golding GB. Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences. Genome 2011; 53:753-62. [PMID: 20962881 DOI: 10.1139/g10-063] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
For decades proteins were thought to interact in a "lock and key" system, which led to the definition of a paradigm linking stable three-dimensional structure to biological function. As a consequence, any non-structured peptide was considered to be nonfunctional and to evolve neutrally. Surprisingly, the most commonly shared peptides between eukaryotic proteomes are low-complexity sequences that in most conditions do not present a stable three-dimensional structure. However, because these sequences evolve rapidly and because the size variation of a few of them can have deleterious effects, low-complexity sequences have been suggested to be the target of selection. Here we review evidence that supports the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
Affiliation(s)
- Wilfried Haerty
- Biology Department, McMaster University, Hamilton, ON, Canada
14
Role of Everlasting Triplet Expansions in Protein Evolution. J Mol Evol 2010; 72:232-9. [DOI: 10.1007/s00239-010-9425-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/09/2010] [Accepted: 12/01/2010] [Indexed: 02/05/2023]
15
Li F, Guo S, Zhao Y, Chen D, Chong K, Xu Y. Overexpression of a homopeptide repeat-containing bHLH protein gene (OrbHLH001) from Dongxiang Wild Rice confers freezing and salt tolerance in transgenic Arabidopsis. Plant Cell Rep 2010; 29:977-86. [PMID: 20559833 DOI: 10.1007/s00299-010-0883-z] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 03/13/2010] [Revised: 05/07/2010] [Accepted: 05/24/2010] [Indexed: 05/02/2023]
Abstract
Dongxiang Wild Rice (Oryza rufipogon) is the northernmost wild rice in the world known to date and has extremely high cold tolerance and many other adversity-resistant properties. To identify the genes responsible for the high stress tolerance, we isolated and characterized a basic helix-loop-helix (bHLH) protein gene OrbHLH001 from Dongxiang Wild Rice. The gene encodes an ICE1-like protein containing multiple homopeptide repeats. Expression of OrbHLH001 is induced by salt stress and is predominant in the shoots of wild rice seedlings. Overexpression of OrbHLH001 enhanced the tolerance to freezing and salt stresses in transgenic Arabidopsis. Examination of the expression of cold-responsive genes in transgenic Arabidopsis showed that the function of OrbHLH001 differs from that of ICE1 and is independent of a CBF/DREB1 cold-response pathway.
Affiliation(s)
- Fei Li
- Graduate University of the Chinese Academy of Sciences, 100093, Beijing, P. R. China
16
Tan JC, Tan A, Checkley L, Honsa CM, Ferdig MT. Variable numbers of tandem repeats in Plasmodium falciparum genes. J Mol Evol 2010; 71:268-78. [PMID: 20730584 DOI: 10.1007/s00239-010-9381-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 12/10/2009] [Accepted: 08/09/2010] [Indexed: 11/29/2022]
Abstract
Genome variation studies in Plasmodium falciparum have focused on SNPs and, more recently, large-scale copy number polymorphisms and ectopic rearrangements. Here, we examine another source of variation: variable number tandem repeats (VNTRs). Interspersed low complexity features, including the well-studied P. falciparum microsatellite sequences, are commonly classified as VNTRs; however, this study is focused on longer coding VNTR polymorphisms, a small class of copy number variations. Selection against frameshift mutation is a main constraint on tandem repeats (TRs) in coding regions, while limited propagation of TRs longer than 975 nt total length is a minor restriction in coding regions. Comparative analysis of three P. falciparum genomes reveals that more than 9% of all P. falciparum ORFs harbor VNTRs, much more than has been reported for any other species. Moreover, genotyping of VNTR loci in a drug-selected line, progeny of a genetic cross, and 334 field isolates demonstrates broad variability in these sequences. Functional enrichment analysis of ORFs harboring VNTRs identifies stress and DNA damage responses along with chromatin modification activities, suggesting an influence on genome mutability and functional variation. Analysis of the repeat units and their flanking regions in both P. falciparum and Plasmodium reichenowi sequences implicates a replication slippage mechanism in the generation of TRs from an initially unrepeated sequence. VNTRs can contribute to rapid adaptation by localized sequence duplication. They also can confound SNP-typing microarrays or mapping short-sequence reads and therefore must be accounted for in such analyses.
Affiliation(s)
- John C Tan
- The Eck Institute for Global Health, University of Notre Dame, 100 Galvin Life Sciences, Notre Dame, IN, 46556, USA.
|
17
|
Birge LM, Pitts ML, Baker RH, Wilkinson GS. Length polymorphism and head shape association among genes with polyglutamine repeats in the stalk-eyed fly, Teleopsis dalmanni. BMC Evol Biol 2010; 10:227. [PMID: 20663190 PMCID: PMC3055267 DOI: 10.1186/1471-2148-10-227] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2010] [Accepted: 07/27/2010] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Polymorphisms of single amino acid repeats (SARPs) are a potential source of genetic variation for rapidly evolving morphological traits. Here, we characterize variation in and test for an association between SARPs and head shape, a trait under strong sexual selection, in the stalk-eyed fly, Teleopsis dalmanni. Using an annotated expressed sequence tag database developed from eye-antennal imaginal disc tissues in T. dalmanni, we identified 98 genes containing nine or more consecutive copies of a single amino acid. We then quantified variation in length and allelic diversity for 32 codon and 15 noncodon repeat regions in a large outbred population. We also assessed the frequency with which amino acid repeats are either gained or lost by identifying sequence similarities between T. dalmanni SARP loci and their orthologs in Drosophila melanogaster. Finally, to identify SARP-containing genes that may influence head development, we conducted a two-generation association study after assortatively mating for extreme relative eyespan. RESULTS We found that glutamine repeats occur more often than expected by amino acid abundance among 3,400 head development genes in T. dalmanni and D. melanogaster. Furthermore, glutamine repeats occur disproportionately in transcription factors. Loci with glutamine repeats exhibit heterozygosities and allelic diversities that do not differ from noncoding dinucleotide microsatellites, including greater variation among X-linked than autosomal regions. In the majority of cases, repeat tracts did not overlap between T. dalmanni and D. melanogaster, indicating that large glutamine repeats are gained or lost frequently during Dipteran evolution. Analysis of covariance reveals a significant effect of parental genotype on mean progeny eyespan, with body length as a covariate, at six SARP loci [CG33692, ptip, band4.1 inhibitor LRP interactor, corto, 3531953:1, and ecdysone-induced protein 75B (Eip75B)]. Mixed model analysis of covariance using the eyespan of siblings segregating for repeat length variation confirms that significant genotype-phenotype associations exist for at least one sex at five of these loci and that, for one gene, CG33692, longer repeats were associated with longer relative eyespan in both sexes. CONCLUSION Among genes expressed during head development in stalk-eyed flies, long codon repeats typically contain glutamine, occur in transcription factors and exhibit high levels of heterozygosity. Furthermore, the presence of significant associations within families between repeat length and head shape indicates that six genes, or genes linked to them, contribute genetic variation to the development of this extremely sexually dimorphic trait.
Affiliation(s)
- Leanna M Birge
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- University College London, Research Department of Genetics, Evolution and Environment, Wolfson House, 4 Stephenson Way, London, NW1 2HE, UK
| | - Marie L Pitts
- Department of Biology, The College of William and Mary, Williamsburg, VA 23187 USA
| | - Richard H Baker
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024 USA
| | - Gerald S Wilkinson
- Department of Biology, University of Maryland, College Park, MD 20742 USA
|
18
|
Łabaj PP, Leparc GG, Bardet AF, Kreil G, Kreil DP. Single amino acid repeats in signal peptides. FEBS J 2010; 277:3147-57. [DOI: 10.1111/j.1742-4658.2010.07720.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
19
|
Chaley MB, Nazipova NN, Kutyrkin VA. Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples. Pattern Recognition and Image Analysis 2009. [DOI: 10.1134/s1054661809020217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
20
|
Salichs E, Ledda A, Mularoni L, Albà MM, de la Luna S. Genome-wide analysis of histidine repeats reveals their role in the localization of human proteins to the nuclear speckles compartment. PLoS Genet 2009; 5:e1000397. [PMID: 19266028 PMCID: PMC2644819 DOI: 10.1371/journal.pgen.1000397] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2008] [Accepted: 01/30/2009] [Indexed: 12/20/2022] Open
Abstract
Single amino acid repeats are prevalent in eukaryote organisms, although the role of many such sequences is still poorly understood. We have performed a comprehensive analysis of the proteins containing homopolymeric histidine tracts in the human genome and identified 86 human proteins that contain stretches of five or more histidines. Most of them are endowed with DNA- and RNA-related functions, and, in addition, there is an overrepresentation of proteins expressed in the brain and/or nervous system development. An analysis of their subcellular localization shows that 15 of the 22 nuclear proteins identified accumulate in the nuclear subcompartment known as nuclear speckles. This localization is lost when the histidine repeat is deleted, and significantly, closely related paralogous proteins without histidine repeats also fail to localize to nuclear speckles. Hence, the histidine tract appears to be directly involved in targeting proteins to this compartment. The removal of DNA-binding domains or treatment with RNA polymerase II inhibitors induces the re-localization of several polyhistidine-containing proteins from the nucleoplasm to nuclear speckles. These findings highlight the dynamic relationship between sites of transcription and nuclear speckles. Therefore, we define the histidine repeats as a novel targeting signal for nuclear speckles, and we suggest that these repeats are a way of generating evolutionary diversification in gene duplicates. These data contribute to our better understanding of the physiological role of single amino acid repeats in proteins.
Single amino acid repeats are common in eukaryotic proteins. Some of them are associated with developmental and neurodegenerative disorders in humans, suggesting that they play important functions. However, the role of many of these repeats is unknown. Here, we have studied histidine repeats from a bioinformatics as well as a functional point of view. We found that only 86 proteins in the human genome contain stretches of five or more histidines, and that most of these proteins have functions related to RNA synthesis. When studying where these proteins localize in the cell, we found that a significant proportion accumulate in a subnuclear organelle known as nuclear speckles, via the histidine repeat. This is a structure where proteins related to the synthesis and processing of RNA accumulate. In some cases, the localization is transient and depends on the transcriptional requirements of the cell. Our findings are important because they identify a common cellular function for stretches of histidine residues, and they support the notion that histidine repeats contribute to generating evolutionary diversification. Finally, and considering that some of the proteins with histidine stretches are key elements in essential developmental processes, variation in these repeats would be expected to contribute to human disease.
Affiliation(s)
- Eulàlia Salichs
- Genes and Disease Program, Centre de Regulació Genòmica (CRG), Barcelona, Spain
- El Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
| | - Alice Ledda
- Biomedical Informatics Research Program, Institut Municipal d'Investigació Mèdica-IMIM, Barcelona, Spain
| | - Loris Mularoni
- Biomedical Informatics Research Program, Institut Municipal d'Investigació Mèdica-IMIM, Barcelona, Spain
| | - M. Mar Albà
- Biomedical Informatics Research Program, Institut Municipal d'Investigació Mèdica-IMIM, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Susana de la Luna
- Genes and Disease Program, Centre de Regulació Genòmica (CRG), Barcelona, Spain
- El Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
21
|
Siwach P, Sengupta S, Parihar R, Ganesh S. Spatial positions of homopolymeric repeats in the human proteome and their effect on cellular toxicity. Biochem Biophys Res Commun 2009; 380:382-6. [PMID: 19250635 DOI: 10.1016/j.bbrc.2009.01.101] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2009] [Accepted: 01/16/2009] [Indexed: 11/30/2022]
Abstract
Proteins with homopolymeric repeat tracts are very common in the human proteome. Intriguingly, some but not all repeat tracts show length variation in the population and, in a few, the expansion of repeat tract beyond the normal length is associated with neurodegenerative and developmental disorders. In this study we have addressed questions such as why some amino acid residues are favored in longer repeat tracts and why repeat tracts show terminal bias. Using cell biological assays for repeat tracts fused to green fluorescent protein we show here that homopolymeric repeats that are beyond their naturally occurring length in the proteome are cytotoxic in nature. This toxicity is further modulated by the length of the peptide that bears the repeat and the spatial location of the repeat within the peptide. Thus, the cellular toxicity appears to be one of the selective processes that regulate the evolution of homopolymeric repeats in the proteome.
Affiliation(s)
- Pratibha Siwach
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kalyanpur, Kanpur 208016, India
|
22
|
Gibbons JG, Rokas A. Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes. Mol Biol Evol 2008; 26:591-602. [PMID: 19056904 DOI: 10.1093/molbev/msn277] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in 10 genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication, and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs were evolutionarily labile, their functional associations appeared to be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface-associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.
Affiliation(s)
- John G Gibbons
- Department of Biological Sciences, Vanderbilt University, Nashville, USA
|
23
|
Anbazhagan P, Purushottam M, Kumar HBK, Kubendran S, Mukherjee O, Brahmachari SK, Jain S, Sowdhamini R. Evolutionary analysis of PHLPP1 gene in humans and non-human primates. Bioinformation 2008; 2:471-4. [PMID: 18841245 PMCID: PMC2561169 DOI: 10.6026/97320630002471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2008] [Revised: 07/20/2008] [Accepted: 07/23/2008] [Indexed: 12/05/2022] Open
Abstract
The chromosome 18q22-23 region has been shown to be implicated in bipolar disorder (BPAD) by several studies. PHLPP1 gene, in the locus (chromosome 18q22-23), is involved in circadian pathways and bears modules like 'PH domain and leucine rich repeat protein phosphatase'. This gene also contains a polyglutamine (CAG or PolyQ) repeat motif at the carboxyl terminal end. A comparative analysis of the PolyQ repeats of the PHLPP1 gene in humans, non-human primates and other species has been attempted in order to investigate the possible significance of repeat length as seen in other triplet-repeat associated diseases. Sequencing of the CAG repeat in humans and in non-human primates revealed that the CAG repeat is not polymorphic in humans; whereas, in other species it shows an area of high variability, both in length and sequence composition. Despite the conservation of circadian clock components in different species, there is remarkable diversity in the protein structure, regulation and biochemical functions of the circadian orthologs. These can be due to specific adaptations in accordance with the physiology of the particular species providing a species-specific biological advantage.
Affiliation(s)
- Padmanabhan Anbazhagan
- Molecular Genetics Laboratory, Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bangalore, India
- National Centre for Biological Sciences, Bangalore, India
| | - Meera Purushottam
- Molecular Genetics Laboratory, Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bangalore, India
| | - H B Kiran Kumar
- Molecular Genetics Laboratory, Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bangalore, India
| | - Shobana Kubendran
- Molecular Genetics Laboratory, Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bangalore, India
| | | | | | - Sanjeev Jain
- Molecular Genetics Laboratory, Department of Psychiatry, National Institute of Mental Health and Neurosciences, Bangalore, India
|
24
|
Model of perfect tandem repeat with random pattern and empirical homogeneity testing poly-criteria for latent periodicity revelation in biological sequences. Math Biosci 2008; 211:186-204. [DOI: 10.1016/j.mbs.2007.10.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2007] [Revised: 10/19/2007] [Accepted: 10/26/2007] [Indexed: 11/23/2022]
|
25
|
O'Malley KG, Camara MD, Banks MA. Candidate loci reveal genetic differentiation between temporally divergent migratory runs of Chinook salmon (Oncorhynchus tshawytscha). Mol Ecol 2007; 16:4930-41. [PMID: 17971087 DOI: 10.1111/j.1365-294x.2007.03565.x] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Local adaptation is a dynamic process driven by selection that can vary both in space and time. One important temporal adaptation for migratory animals is the time at which individuals return to breeding sites. Chinook salmon (Oncorhynchus tshawytscha) are excellent subjects for studying the genetic basis of temporal adaptation because their high seasonal homing fidelity promotes reproductive isolation leading to the formation of local populations across diverse environments. We tested for adaptive genetic differentiation between seasonal runs of Chinook salmon using two candidate loci: the circadian rhythm gene, OtsClock1b, and Ots515NWFSC, a microsatellite locus showing sequence identity to three salmonid genes central to reproductive development. We found significant evidence for two genetically distinct migratory runs in the Feather River, California (OtsClock1b: F(ST)=0.042, P=0.02; Ots515NWFSC: F(ST)=0.058, P=0.003). In contrast, the fall and threatened spring runs are genetically homogenous based on neutral microsatellite data (F(ST)=-0.0002). Similarly, two temporally divergent migratory runs of Chinook salmon from New Zealand are genetically differentiated based on polymorphisms in the candidate loci (OtsClock1b: F(ST)=0.083, P-value=0.001; Ots515NWFSC: F(ST)=0.095, P-value=0.000). We used an individual-based assignment method to confirm that these recently diverged populations originated from a single source in California. Tests for selective neutrality indicate that OtsClock1b and Ots515NWFSC exhibit substantial departures from neutral expectations in both systems. The large F(ST) estimates could therefore be the result of directional selection. Evidence presented here suggests that OtsClock1b and Ots515NWFSC may influence migration and spawning timing of Chinook salmon in these river systems.
Affiliation(s)
- Kathleen G O'Malley
- Coastal Oregon Marine Experiment Station, Hatfield Marine Science Center, Department of Fisheries and Wildlife, Oregon State University, 2030 SE Marine Science Drive, Newport, Oregon 97365, USA.
|
26
|
Huntley MA, Clark AG. Evolutionary Analysis of Amino Acid Repeats across the Genomes of 12 Drosophila Species. Mol Biol Evol 2007; 24:2598-609. [PMID: 17602168 DOI: 10.1093/molbev/msm129] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate upon the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
Affiliation(s)
- Melanie A Huntley
- Department of Molecular Biology and Genetics Cornell University, USA.
|
27
|
Faux NG, Huttley GA, Mahmood K, Webb GI, Garcia de la Banda M, Whisstock JC. RCPdb: An evolutionary classification and codon usage database for repeat-containing proteins. Genome Res 2007; 17:1118-27. [PMID: 17567984 PMCID: PMC1899123 DOI: 10.1101/gr.6255407] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.
Affiliation(s)
- Noel G. Faux
- Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- ARC Centre for Structural and Functional Microbial Genomics, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
| | - Gavin A. Huttley
- John Curtin School of Medical Research, Australian National University, Canberra, Australian Capital Territory 0200, Australia
| | - Khalid Mahmood
- Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- ARC Centre for Structural and Functional Microbial Genomics, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
| | - Geoffrey I. Webb
- Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
| | - Maria Garcia de la Banda
- Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- Corresponding author. Fax 61 3 9905 4699.
| | - James C. Whisstock
- Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- Victorian Bioinformatics Consortium, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- ARC Centre for Structural and Functional Microbial Genomics, Monash University, Clayton Campus, Melbourne, Victoria 3800, Australia
- Corresponding author. Fax 61 3 9905 4699.
|