1
Kavouras M, Malandrakis EE, Danis T, Blom E, Anastassiadis K, Panagiotaki P, Exadactylos A. Hox Genes Polymorphism Depicts Developmental Disruption of Common Sole Eggs. Open Life Sci 2019; 14:549-563. [PMID: 33817191 PMCID: PMC7874752 DOI: 10.1515/biol-2019-0061] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/30/2019] [Accepted: 11/22/2019] [Indexed: 12/31/2022]
Abstract
In sole aquaculture production, consistency in the quality of produced eggs throughout the year is unpredictable. Hox genes have a crucial role in controlling embryonic development and their genetic variation could alter the phenotype dramatically. In teleosts genome duplication led paralog hox genes to become diverged. Direct association of polymorphism in hoxa1a, hoxa2a & hoxa2b of Solea solea with egg viability indicates hoxa2b as a potential genetic marker. High Resolution Melt (HRM) analysis was carried out in 52 viable and 61 non-viable eggs collected at 54±6 hours post fertilization (hpf). Allelic and genotypic frequencies of polymorphism were analyzed and results illustrated a significantly increased risk for non-viability for minor alleles and their homozygous genotypes. Haplotype analysis demonstrated a significant recessive effect on the risk of non-viability, by increasing the odds of disrupting embryonic development up to three-fold. Phylogenetic analysis showed that the paralog genes hoxa2a and hoxa2b, are separated distinctly in two clades and presented a significant ω variation, revealing their diverged evolutionary rate.
Affiliation(s)
- Emmanouil E. Malandrakis
- Department of Ichthyology and Aquatic Environment, School of Agricultural Sciences, University of Thessaly, Fytokou str, Volos, Greece
- Theodoros Danis
- Department of Ichthyology and Aquatic Environment, School of Agricultural Sciences, University of Thessaly, Fytokou str, Volos, Greece
- Ewout Blom
- Wageningen Marine Research, Wageningen University & Research, IJmuiden, The Netherlands
- Panagiota Panagiotaki
- Department of Ichthyology and Aquatic Environment, School of Agricultural Sciences, University of Thessaly, Fytokou str, Volos, Greece
2
Li J, Su Y, Wang T. The Repeat Sequences and Elevated Substitution Rates of the Chloroplast accD Gene in Cupressophytes. Front Plant Sci 2018; 9:533. [PMID: 29731764 PMCID: PMC5920036 DOI: 10.3389/fpls.2018.00533] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 02/06/2018] [Accepted: 04/05/2018] [Indexed: 05/23/2023]
Abstract
The plastid accD gene encodes a subunit of the acetyl-CoA carboxylase (ACCase) enzyme. The length of accD gene has been supposed to expand in Cryptomeria japonica, Taiwania cryptomerioides, Cephalotaxus, Taxus chinensis, and Podocarpus lambertii, and the main reason for this phenomenon was the existence of tandemly repeated sequences. However, it is still unknown whether the accD gene length in other cupressophytes has expanded. Here, in order to investigate how widespread this phenomenon was, 18 accD sequences and its surrounding regions of cupressophyte were sequenced and analyzed. Together with 39 GenBank sequence data, our taxon sampling covered all the extant gymnosperm orders. The repetitive elements and substitution rates of accD among 57 gymnosperm species were analyzed, the results show: (1) Reading frame length of accD gene in 18 cupressophytes species has also expanded. (2) Many repetitive elements were identified in accD gene of cupressophyte lineages. (3) The synonymous and non-synonymous substitution rates of accD were accelerated in cupressophytes. (4) accD was located in rearrangement endpoints. These results suggested that repetitive elements may mediate the chloroplast genome rearrangement and accelerated the substitution rates.
Affiliation(s)
- Jia Li
- Department of Life Sciences, Shaanxi Xueqian Normal University, Xi’an, China
- Yingjuan Su
- School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Research Institute of Sun Yat-sen University, Shenzhen, China
- Ting Wang
- College of Life Science, South China Agricultural University, Guangzhou, China
3
Chaudhry SR, Lwin N, Phelan D, Escalante AA, Battistuzzi FU. Comparative analysis of low complexity regions in Plasmodia. Sci Rep 2018; 8:335. [PMID: 29321589 PMCID: PMC5762703 DOI: 10.1038/s41598-017-18695-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/23/2017] [Accepted: 12/14/2017] [Indexed: 12/20/2022]
Abstract
Low complexity regions (LCRs) are a common feature shared by many genomes, but their evolutionary and functional significance remains mostly unknown. At the core of the uncertainty is a poor understanding of the mechanisms that regulate their retention in genomes, whether driven by natural selection or neutral evolution. Applying a comparative approach of LCRs to multiple strains and species is a powerful approach to identify patterns of conservation in these regions. Using this method, we investigate the evolutionary history of LCRs in the genus Plasmodium based on orthologous protein coding genes shared by 11 species and strains from primate and rodent-infecting pathogens. We find multiple lines of evidence in support of natural selection as a major evolutionary force shaping the composition and conservation of LCRs through time and signatures that their evolutionary paths are species specific. Our findings add a comparative analysis perspective to the debate on the evolution of LCRs and harness the power of sequence comparisons to identify potential functionally important LCR candidates.
Affiliation(s)
- S R Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- N Lwin
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
- D Phelan
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
- A A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- F U Battistuzzi
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA
4
Constraints and consequences of the emergence of amino acid repeats in eukaryotic proteins. Nat Struct Mol Biol 2017; 24:765-777. [PMID: 28805808 DOI: 10.1038/nsmb.3441] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 02/14/2017] [Accepted: 06/23/2017] [Indexed: 12/21/2022]
Abstract
Proteins with amino acid homorepeats have the potential to be detrimental to cells and are often associated with human diseases. Why, then, are homorepeats prevalent in eukaryotic proteomes? In yeast, homorepeats are enriched in proteins that are essential and pleiotropic and that buffer environmental insults. The presence of homorepeats increases the functional versatility of proteins by mediating protein interactions and facilitating spatial organization in a repeat-dependent manner. During evolution, homorepeats are preferentially retained in proteins with stringent proteostasis, which might minimize repeat-associated detrimental effects such as unregulated phase separation and protein aggregation. Their presence facilitates rapid protein divergence through accumulation of amino acid substitutions, which often affect linear motifs and post-translational-modification sites. These substitutions may result in rewiring protein interaction and signaling networks. Thus, homorepeats are distinct modules that are often retained in stringently regulated proteins. Their presence facilitates rapid exploration of the genotype-phenotype landscape of a population, thereby contributing to adaptation and fitness.
5
Sablok G, Chen TW, Lee CC, Yang C, Gan RC, Wegrzyn JL, Porta NL, Nayak KC, Huang PJ, Varotto C, Tang P. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications. DNA Res 2017; 24:327-332. [PMID: 28419256 PMCID: PMC5499650 DOI: 10.1093/dnares/dsw044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/18/2016] [Accepted: 09/14/2016] [Indexed: 01/01/2023]
Abstract
Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/
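As a rough illustration of what a "codon usage pattern" for a single CDS looks like, the following minimal Python sketch counts in-frame codons and converts the counts to relative frequencies. It is not ChloroMitoCU's implementation; the function names `codon_counts` and `codon_frequencies` are hypothetical, and the sketch assumes the input is already an in-frame coding sequence.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons in a coding sequence (hypothetical helper).

    Assumes the sequence starts in frame; trailing bases that do not
    complete a codon are ignored. RNA input (U) is mapped to DNA (T).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon among all codons in the CDS."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence: ATG GCT GCT GCC TAA
    for codon, freq in sorted(codon_frequencies("ATGGCTGCTGCCTAA").items()):
        print(codon, round(freq, 2))
```

Comparing such per-gene or per-genome frequency tables across species is the kind of analysis the catalog is described as supporting; measures such as relative synonymous codon usage would additionally require grouping codons by the amino acid they encode.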
Affiliation(s)
- Gaurav Sablok
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
- Ting-Wen Chen
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Chi-Ching Lee
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Chi Yang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Ruei-Chi Gan
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, CT 06269-3043 USA
- Nicola L Porta
- Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy; MOUNTFOR Project Centre, European Forest Institute, Via E. Mach 1, 38010 San Michele all'Adige, Trento, Italy
- Kinshuk C Nayak
- Bioinformatics Centre, Institute of Life Sciences, Department of Biotechnology, Govt. India, Nalco Square, Bhubaneswar - 751 023, India
- Po-Jung Huang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Claudio Varotto
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
- Petrus Tang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan; Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Kweishan, Taoyuan 333, Taiwan
6
Battistuzzi FU, Schneider KA, Spencer MK, Fisher D, Chaudhry S, Escalante AA. Profiles of low complexity regions in Apicomplexa. BMC Evol Biol 2016; 16:47. [PMID: 26923229 PMCID: PMC4770516 DOI: 10.1186/s12862-016-0625-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 09/12/2015] [Accepted: 02/17/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Low complexity regions (LCRs) are a ubiquitous feature in genomes and yet their evolutionary history and functional roles are unclear. Previous studies have shown contrasting evidence in favor of both neutral and selective mechanisms of evolution for different sets of LCRs suggesting that modes of identification of these regions may play a role in our ability to discern their evolutionary history. To further investigate this issue, we used a multiple threshold approach to identify species-specific profiles of proteome complexity and, by comparing properties of these sets, determine the influence that starting parameters have on evolutionary inferences.
RESULTS: We find that, although qualitatively similar, quantitatively each species has a unique LCR profile which represents the frequency of these regions within each genome. Inferences based on these profiles are more accurate in comparative analyses of genome complexity as they allow to determine the relative complexity of multiple genomes as well as the type of repetitiveness that is most common in each. Based on the multiple threshold LCR sets obtained, we identified predominant evolutionary mechanisms at different complexity levels, which show neutral mechanisms acting on highly repetitive LCRs (e.g., homopolymers) and selective forces becoming more important as heterogeneity of the LCRs increases.
CONCLUSIONS: Our results show how inferences based on LCRs are influenced by the parameters used to identify these regions. Sets of LCRs are heterogeneous aggregates of regions that include homo- and heteropolymers and, as such, evolve according to different mechanisms. LCR profiles provide a new way to investigate genome complexity across species and to determine the driving mechanism of their evolution.
Affiliation(s)
- Kristan A Schneider
- Department of MNI, University of Applied Sciences Mittweida, Mittweida, Germany.
- Matthew K Spencer
- Department of Geology and Physics, Lake Superior State University, Sault Ste. Marie, MI, USA.
- David Fisher
- David Eccles School of Business, University of Utah, Salt Lake City, UT, USA.
- Sophia Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA.
- Ananias A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.
7
Wu R, Liu Q, Zhang P, Liang D. Tandem amino acid repeats in the green anole (Anolis carolinensis) and other squamates may have a role in increasing genetic variability. BMC Genomics 2016; 17:109. [PMID: 26868501 PMCID: PMC4751654 DOI: 10.1186/s12864-016-2430-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/06/2015] [Accepted: 02/02/2016] [Indexed: 01/04/2023]
Abstract
Background: Tandem amino acid repeats are characterised by the consecutive recurrence of a single amino acid. They exhibit high rates of length mutations in addition to point mutations and have been proposed to be involved in genetic plasticity. Squamate reptiles (lizards and snakes) diversify in both morphology and physiology. The underlying mechanism is yet to be understood. In a previous phylogenomic analysis of reptiles, the density of tandem repeats in an anole lizard diverged heavily from that of the other reptiles. To gain further insight into the tandem amino acid repeats in squamates, we analysed the repeat content in the green anole (Anolis carolinensis) proteome and compared the amino acid repeats in a large orthologous protein data set from six vertebrates (the Western clawed frog, the green anole, the Chinese softshell turtle, the zebra finch, mouse and human).
Results: Our results revealed that the number of amino acid repeats in the green anole exceeded those found in the other five species studied. Species-only repeats were found in high proportion in the green anole but not in the other five species, suggesting that the green anole had gained many amino acid repeats in either the Anolis or the squamate lineage. Since the amino acid repeat containing genes in the green anole were highly enriched in genes related to transcription and development, an important family of developmental genes, i.e., the Hox family, was further studied in a wide collection of squamates. Abundant amino acid repeats were also observed, implying the general high tolerance of amino acid repeats in squamates. A particular enrichment of amino acid repeats was observed in the central class Hox genes that are known to be responsible for defining cervical to lumbar regions.
Conclusions: Our study suggests that the abundant amino acid repeats in the green anole, and possibly in other squamates, may play a role in increasing the genetic variability, and contribute to the evolutionary diversity of this clade.
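The definition given in this abstract (consecutive recurrence of a single amino acid) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function name `find_homorepeats` is hypothetical, and the default minimum run length of 5 is an assumption, since the abstract does not state the threshold used.

```python
import re


def find_homorepeats(protein: str, min_len: int = 5):
    """Return (residue, start, length) for each single-residue run.

    min_len is an assumed threshold, not a value taken from the study.
    Start positions are 0-based offsets into the protein string.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One captured letter followed by at least (min_len - 1) copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), m.end() - m.start())
        for m in pattern.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Toy sequence with a run of six Q and a run of five S.
    print(find_homorepeats("MKQQQQQQAAAPSSSSSG"))
```

Counting such runs per protein across a proteome gives the kind of repeat content that the abstract compares between species.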
Affiliation(s)
- Riga Wu
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Qingfeng Liu
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Peng Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
- Dan Liang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People's Republic of China.
8
Lu X, Murphy RM. Asparagine Repeat Peptides: Aggregation Kinetics and Comparison with Glutamine Repeats. Biochemistry 2015. [PMID: 26204228 DOI: 10.1021/acs.biochem.5b00644] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Abstract
Amino acid repeat runs are common occurrences in eukaryotic proteins, with glutamine (Q) and asparagine (N) as particularly frequent repeats. Abnormal expansion of Q-repeat domains causes at least nine neurodegenerative disorders, most likely because expansion leads to protein misfolding, aggregation, and toxicity. The linkage between Q-repeats and disease has motivated several investigations into the mechanism of aggregation and the role of Q-repeat length in aggregation. Curiously, glutamine repeats are common in vertebrates, whereas N-repeats are virtually absent in vertebrates, but common in invertebrates. One hypothesis for the lack of N-repeats in vertebrates is biophysical; that is, there is strong selective pressure in higher organisms against aggregation-prone proteins. If true, then asparagine and glutamine repeats must differ substantially in their aggregation properties despite their chemical similarities. In this work, aggregation of peptides with asparagine repeats of variable length (12-24) were characterized and compared to that of similar peptides with glutamine repeats. As with glutamine, aggregation of N-repeat peptides was strongly length-dependent. Replacement of glutamine with asparagine caused a subtle shift in the conformation of the monomer, which strongly affected the rate of aggregation. Specifically, N-repeat peptides adopted β-turn structural elements, leading to faster self-assembly into globular oligomers and much more rapid conversion into fibrillar aggregates, compared to Q-repeat peptides. These biophysical differences may account for the differing biological roles of N- versus Q-repeat domains.
Affiliation(s)
- Xiaomeng Lu
- †Biophysics Program and ‡Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Regina M Murphy
- †Biophysics Program and ‡Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
9
Schaper E, Anisimova M. The evolution and function of protein tandem repeats in plants. New Phytol 2015; 206:397-410. [PMID: 25420631 DOI: 10.1111/nph.13184] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 08/22/2014] [Accepted: 10/18/2014] [Indexed: 05/27/2023]
Abstract
Sequence tandem repeats (TRs) are abundant in proteomes across all domains of life. For plants, little is known about their distribution or contribution to protein function. We exhaustively annotated TRs and studied the evolution of TR unit variations for all Ensembl plants. Using phylogenetic patterns of TR units, we detected conserved TRs with unit number and order preserved during evolution, and those TRs that have diverged via recent TR unit gains/losses. We correlated the mode of evolution of TRs to protein function. TR number was strongly correlated with proteome size, with about one-half of all TRs recognized as common protein domains. The majority of TRs have been highly conserved over long evolutionary distances, some since the separation of red algae and green plants c. 1.6 billion yr ago. Conversely, recurrent recent TR unit mutations were rare. Our results suggest that the first TRs by far predate the first plants, and that TR appearance is an ongoing process with similar rates across the plant kingdom. Interestingly, the few detected highly mutable TRs might provide a source of variation for rapid adaptation. In particular, such TRs are enriched in leucine-rich repeats (LRRs) commonly found in R genes, where TR unit gain/loss may facilitate resistance to emerging pathogens.
Affiliation(s)
- Elke Schaper
- Department of Computer Science, ETH Zürich, Zürich, 8092, Switzerland
- Institute of Integrative Biology, ETH Zürich, Zürich, 8092, Switzerland
- Vital-IT Competency Center, Swiss Institute for Bioinformatics (SIB), Lausanne, 1015, Switzerland
- Maria Anisimova
- Institute of Applied Simulation (IAS), School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Wädenswil, 8820, Switzerland
10
Lenz C, Haerty W, Golding GB. Increased substitution rates surrounding low-complexity regions within primate proteins. Genome Biol Evol 2014; 6:655-65. [PMID: 24572016 PMCID: PMC3971593 DOI: 10.1093/gbe/evu042] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Abstract
Previous studies have found that DNA-flanking low-complexity regions (LCRs) have an increased substitution rate. Here, the substitution rate was confirmed to increase in the vicinity of LCRs in several primate species, including humans. This effect was also found among human sequences from the 1000 Genomes Project. A strong correlation was found between average substitution rate per site and distance from the LCR, as well as the proportion of genes with gaps in the alignment at each site and distance from the LCR. Along with substitution rates, dN/dS ratios were also determined for each site, and the proportion of sites undergoing negative selection was found to have a negative relationship with distance from the LCR.
Affiliation(s)
- Carolyn Lenz
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
11
Schaper E, Gascuel O, Anisimova M. Deep conservation of human protein tandem repeats within the eukaryotes. Mol Biol Evol 2014; 31:1132-48. [PMID: 24497029 PMCID: PMC3995336 DOI: 10.1093/molbev/msu062] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 01/01/2023]
Abstract
Tandem repeats (TRs) are a major element of protein sequences in all domains of life. They are particularly abundant in mammals, where by conservative estimates one in three proteins contain a TR. High generation-scale duplication and deletion rates were reported for nucleic TR units. However, it is not known whether protein TR units can also be frequently lost or gained providing a source of variation for rapid adaptation of protein function, or alternatively, tend to have conserved TR unit configurations over long evolutionary times. To obtain a systematic picture, we performed a proteome-wide analysis of the mode of evolution for human protein TRs. For this purpose, we propose a novel method for the detection of orthologous TRs based on circular profile hidden Markov models. For all detected TRs, we reconstructed bispecies TR unit phylogenies across 61 eukaryotes ranging from human to yeast. Moreover, we performed additional analyses to correlate functional and structural annotations of human TRs with their mode of evolution. Surprisingly, we find that the vast majority of human TRs are ancient, with TR unit number and order preserved intact since distant speciation events. For example, ≥61% of all human TRs have been strongly conserved at least since the root of all mammals, approximately 300 Ma. Further, we find no human protein TR that shows evidence for strong recent duplications and deletions. The results are in contrast to the high generation-scale mutability of nucleic TRs. Presumably, most protein TRs fold into stable and conserved structures that are indispensable for the function of the TR-containing protein. All of our data and results are available for download from http://www.atgc-montpellier.fr/TRE.
Affiliation(s)
- Elke Schaper
- Department of Computer Science, ETH Zürich, Zürich, Switzerland
12
Cloutier S, Miranda E, Ward K, Radovanovic N, Reimer E, Walichnowski A, Datla R, Rowland G, Duguid S, Ragupathy R. Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.). Theor Appl Genet 2012; 125:685-94. [PMID: 22484296 PMCID: PMC3405236 DOI: 10.1007/s00122-012-1860-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 02/06/2012] [Accepted: 03/21/2012] [Indexed: 05/09/2023]
Abstract
Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.
Affiliation(s)
- Sylvie Cloutier
- Cereal Research Centre, Agriculture and Agri-Food Canada, 195 Dafoe Road, Winnipeg, MB, R3T 2M9, Canada.
13
Li H, Liu J, Wu K, Chen Y. Insight into role of selection in the evolution of polyglutamine tracts in humans. PLoS One 2012; 7:e41167. [PMID: 22848438 PMCID: PMC3405088 DOI: 10.1371/journal.pone.0041167] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/16/2012] [Accepted: 06/18/2012] [Indexed: 11/21/2022]
Abstract
Glutamine tandem repeats are common in eukaryotic proteins. Although some studies have proposed that replication slippage plays an important role in shaping these repeats, the role of natural selection in glutamine tandem repeat evolution is somewhat unclear. In this study, we identified all of the glutamine tandem repeats containing four or more glutamines in human proteins and then estimated the nonsynonymous (dN) and synonymous (dS) substitution rates for the regions flanking the glutamine tandem repeats and the proteins containing them. The results indicated that most of the proteins containing polyglutamine (polyQ) tracts of four or more glutamines have undergone purifying selection, and that the purifying selection for the regions flanking the repeats is weaker. Additionally, we observed that the conserved repeats were under stronger selection constraints than the nonconserved repeats. Interestingly, we found that there was a higher level of purifying selection for the regions flanking the polyQ tracts encoded by pure CAG codons compared with those encoded by mixed codons. Based on our findings, we propose that selection has played a more important role than was previously speculated in constraining the expansion of polyQ tracts encoded by pure codons.
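The two selection criteria mentioned in this abstract (tracts of four or more glutamines, and tracts encoded by pure CAG codons versus mixed codons) can be illustrated with a short Python sketch. It is not the authors' code: the function name `polyq_tracts` and the three labels are hypothetical, the input is assumed to be an in-frame coding sequence, and labelling a CAA-only tract separately is a choice made here rather than something the abstract specifies.

```python
GLN_CODONS = {"CAG", "CAA"}  # the two glutamine codons in the standard code


def polyq_tracts(cds: str, min_len: int = 4):
    """Return (first_codon_index, length, label) for each polyQ tract.

    A tract is a run of at least min_len consecutive glutamine codons.
    Labels are illustrative: "pure CAG", "pure CAA", or "mixed".
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    tracts = []
    start = None
    # The empty-string sentinel closes a run that reaches the end.
    for idx, codon in enumerate(codons + [""]):
        if codon in GLN_CODONS:
            if start is None:
                start = idx
            continue
        if start is not None:
            run = codons[start:idx]
            if len(run) >= min_len:
                kinds = set(run)
                if kinds == {"CAG"}:
                    label = "pure CAG"
                elif kinds == {"CAA"}:
                    label = "pure CAA"
                else:
                    label = "mixed"
                tracts.append((start, len(run), label))
            start = None
    return tracts


if __name__ == "__main__":
    toy = "ATG" + "CAG" * 5 + "GCT" + "CAGCAACAGCAA" + "TAA"
    print(polyq_tracts(toy))
```

Estimating dN and dS for the flanking regions, as the study does, would require aligned orthologous sequences and a dedicated substitution-rate method, which this sketch does not attempt.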
Affiliation(s)
- Hongwei Li
- College of Veterinary Medicine, China Agricultural University, Beijing, China.
14
Sawaya SM, Lennon D, Buschiazzo E, Gemmell N, Minin VN. Measuring microsatellite conservation in mammalian evolution with a phylogenetic birth-death model. Genome Biol Evol 2012; 4:636-47. [PMID: 22593552 PMCID: PMC3516246 DOI: 10.1093/gbe/evs050] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
Microsatellites make up ∼3% of the human genome, and there is increasing evidence that some microsatellites can have important functions and can be conserved by selection. To investigate this conservation, we performed a genome-wide analysis of human microsatellites and measured their conservation using a binary character birth-death model on a mammalian phylogeny. Using a maximum likelihood method to estimate birth and death rates for different types of microsatellites, we show that the rates at which microsatellites are gained and lost in mammals depend on their sequence composition, length, and position in the genome. Additionally, we use a mixture model to account for unequal death rates among microsatellites across the human genome. We use this model to assign a probability-based conservation score to each microsatellite. We found that microsatellites near the transcription start sites of genes are often highly conserved, and that distance from a microsatellite to the nearest transcription start site is a good predictor of the microsatellite conservation score. An analysis of gene ontology terms for genes that contain microsatellites near their transcription start site reveals that regulatory genes involved in growth and development are highly enriched with conserved microsatellites.
Affiliation(s)
- Sterling M Sawaya
- Centre for Reproduction and Genomics, Department of Anatomy and Structural Biology, University of Otago, Dunedin, New Zealand
|
15
|
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, USA
| | - Thierry Baudrand
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
| | - Andrey V. Kajava
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
|
16
|
Ramazzotti M, Monsellier E, Kamoun C, Degl'Innocenti D, Melki R. Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes. PLoS One 2012; 7:e30824. [PMID: 22312432 PMCID: PMC3270027 DOI: 10.1371/journal.pone.0030824] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 10/06/2011] [Accepted: 12/23/2011] [Indexed: 12/20/2022] Open
Abstract
Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias, are associated with the aggregation of proteins comprising an extended tract of consecutive glutamine residues (polyQs) once it exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during the replication process. This is in apparent contradiction with the fact that many polyQs-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQs expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed software that scrutinizes entire proteomes in search of imperfect polyQs. The nature of residues flanking the polyQs and that of residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated with polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly amino acid residues. These biases are conserved amongst unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible for the observed biases.
Affiliation(s)
- Matteo Ramazzotti
- Dipartimento di Scienze Biochimiche, Università degli Studi di Firenze, Florence, Italy
| | - Elodie Monsellier
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
| | - Choumouss Kamoun
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
| | | | - Ronald Melki
- Laboratoire d'Enzymologie et de Biochimie Structurales, UPR 3082 CNRS, Gif sur Yvette, France
|
17
|
|
18
|
Faux N. Single amino acid and trinucleotide repeats: function and evolution. Adv Exp Med Biol 2012; 769:26-40. [PMID: 23560303 DOI: 10.1007/978-1-4614-5434-2_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
The most well known effect of single amino acid repeat expansion, beyond a certain threshold, is the development of a specific disease, depending on the protein in which the expansion has occurred. For example, the expansion of the glutamine repeat in huntingtin leads to the debilitating neurodegenerative disease, Huntington's disease. Similarly, there are a range of other disorders caused by trinucleotide repeat expansions encoding polyglutamine or polyalanine tracts. The age of onset of the polyglutamine-induced neurodegenerative diseases is usually negatively correlated with the length of expanded CAG/glutamine repeat. However, recent studies have given evidence that single amino acid repeats may also play critical roles in normal protein function and that changes in the length of single amino acid repeats is likely to play a beneficial role in evolution. This chapter will look at the prevalence, function and possible role single amino acid repeats have in evolution and other biological processes.
Affiliation(s)
- Noel Faux
- Mental Health Research Institute, The University of Melbourne, Parkville, Victoria, Australia.
|
19
|
Zhou Y, Liu J, Han L, Li ZG, Zhang Z. Comprehensive analysis of tandem amino acid repeats from ten angiosperm genomes. BMC Genomics 2011; 12:632. [PMID: 22195734 PMCID: PMC3283746 DOI: 10.1186/1471-2164-12-632] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 08/02/2011] [Accepted: 12/23/2011] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND The presence of tandem amino acid repeats (AARs) is one of the signatures of eukaryotic proteins. AARs were thought to be frequently involved in bio-molecular interactions. Comprehensive studies that primarily focused on metazoan AARs have suggested that AARs are evolving rapidly and are highly variable among species. However, there is still controversy over causal factors of this inter-species variation. In this work, we attempted to investigate this topic mainly by comparing AARs in orthologous proteins from ten angiosperm genomes. RESULTS Angiosperm AAR content is positively correlated with the GC content of the protein coding sequence. However, based on observations from fungal AARs and insect AARs, we argue that the applicability of this kind of correlation is limited by AAR residue composition and species' life history traits. Angiosperm AARs also tend to be fast evolving and structurally disordered, supporting the results of comprehensive analyses of metazoans. The functions of conserved long AARs are summarized. Finally, we propose that the rapid mRNA decay rate, alternative splicing and tissue specificity are regulatory processes that are associated with angiosperm proteins harboring AARs. CONCLUSIONS Our investigation suggests that GC content is a predictor of AAR content in the protein coding sequence under certain conditions. Although angiosperm AARs lack conservation and 3D structure, a fraction of the proteins that contain AARs may be functionally important and are under extensive regulation in plant cells.
Affiliation(s)
- Yuan Zhou
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Jing Liu
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Zhi-Gang Li
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
20
|
Luo H, Lin K, David A, Nijveen H, Leunissen JAM. ProRepeat: an integrated repository for studying amino acid tandem repeats in proteins. Nucleic Acids Res 2011; 40:D394-9. [PMID: 22102581 PMCID: PMC3245022 DOI: 10.1093/nar/gkr1019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023] Open
Abstract
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database using repeat characteristics like repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats in the N-terminus of eukaryotic protein sequences, the differences in repeat abundance among proteomes, the functional classification of repeat-containing proteins and the GC content constraints of the repeats' corresponding codons.
Affiliation(s)
- Hong Luo
- Laboratory of Bioinformatics, Wageningen University and Research Centre, PO Box 569, 6700 AN Wageningen, Netherlands
|
21
|
Haerty W, Golding GB. Increased polymorphism near low-complexity sequences across the genomes of Plasmodium falciparum isolates. Genome Biol Evol 2011; 3:539-50. [PMID: 21602572 PMCID: PMC3140889 DOI: 10.1093/gbe/evr045] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
Low-complexity regions (LCRs) within protein sequences are often considered to evolve neutrally even though recent studies reported evidence for selection acting on some of them. Because of their widespread distribution among eukaryotic genomes and the potentially deleterious effect of expansion/contraction of some of them in humans, low-complexity sequences are of major interest and numerous studies have attempted to describe their dynamics between genomes as well as the factors correlated to their variation and to assess their selective value. However, due to the scarcity of individual genomes within a species, most of the analyses so far have been performed at the species level with the implicit assumption that the variation both in composition and size within species is too small relative to the between-species divergence to affect the conclusions of the analysis. Here we used the available genomes of 14 Plasmodium falciparum isolates to assess the relationship between low-complexity sequence variation and factors such as nucleotide polymorphism across strains, sequence composition, and protein expression. We report that more than half of the 7,711 low-complexity sequences found within aligned coding sequences are variable in size among strains. Across strains, we observed an increasing density of polymorphic sites toward the LCR boundaries. This observation strongly suggests the joint effects of lowered selective constraints on low-complexity sequences and a mutagenic effect of these simple sequences.
Affiliation(s)
- Wilfried Haerty
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
|
22
|
Jorda J, Kajava AV. Protein homorepeats sequences, structures, evolution, and functions. Adv Protein Chem Struct Biol 2011; 79:59-88. [PMID: 20621281 DOI: 10.1016/s1876-1623(10)79002-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/25/2022]
Abstract
The vast majority of protein sequences are aperiodic; they do not have any strong bias in the amino acid composition, and they use a subtle mixture of all or most of the 20 amino acid residues to code a great number of various structures and functions. In this context, homorepeats, runs of a single amino acid residue, represent unusual, eye-catching motifs in proteins. Despite the sequence simplicity and relatively small size, the homorepeat runs have a strong potential for molecular interactions due to the excessively high local concentration of a certain physico-chemical property. Appearance of such runs within proteins may give them new structural and functional features. An increasing number of studies demonstrate the abundance of these motifs in proteins, their important roles in biological processes, and their link to a number of hereditary and age-related diseases. In this chapter, we summarize data on the distribution of homorepeats in proteomes and on their structural properties, evolution, and functions.
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
|
23
|
Haerty W, Golding GB. Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences. Genome 2011; 53:753-62. [PMID: 20962881 DOI: 10.1139/g10-063] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 12/21/2022]
Abstract
For decades proteins were thought to interact in a "lock and key" system, which led to the definition of a paradigm linking stable three-dimensional structure to biological function. As a consequence, any non-structured peptide was considered to be nonfunctional and to evolve neutrally. Surprisingly, the most commonly shared peptides between eukaryotic proteomes are low-complexity sequences that in most conditions do not present a stable three-dimensional structure. However, because these sequences evolve rapidly and because the size variation of a few of them can have deleterious effects, low-complexity sequences have been suggested to be the target of selection. Here we review evidence that supports the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
Affiliation(s)
- Wilfried Haerty
- Biology Department, McMaster University, Hamilton, ON, Canada
|
24
|
Role of Everlasting Triplet Expansions in Protein Evolution. J Mol Evol 2010; 72:232-9. [DOI: 10.1007/s00239-010-9425-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/09/2010] [Accepted: 12/01/2010] [Indexed: 02/05/2023]
|
25
|
Birge LM, Pitts ML, Baker RH, Wilkinson GS. Length polymorphism and head shape association among genes with polyglutamine repeats in the stalk-eyed fly, Teleopsis dalmanni. BMC Evol Biol 2010; 10:227. [PMID: 20663190 PMCID: PMC3055267 DOI: 10.1186/1471-2148-10-227] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 03/08/2010] [Accepted: 07/27/2010] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Polymorphisms of single amino acid repeats (SARPs) are a potential source of genetic variation for rapidly evolving morphological traits. Here, we characterize variation in and test for an association between SARPs and head shape, a trait under strong sexual selection, in the stalk-eyed fly, Teleopsis dalmanni. Using an annotated expressed sequence tag database developed from eye-antennal imaginal disc tissues in T. dalmanni we identified 98 genes containing nine or more consecutive copies of a single amino acid. We then quantify variation in length and allelic diversity for 32 codon and 15 noncodon repeat regions in a large outbred population. We also assessed the frequency with which amino acid repeats are either gained or lost by identifying sequence similarities between T. dalmanni SARP loci and their orthologs in Drosophila melanogaster. Finally, to identify SARP containing genes that may influence head development we conducted a two-generation association study after assortatively mating for extreme relative eyespan. RESULTS We found that glutamine repeats occur more often than expected by amino acid abundance among 3,400 head development genes in T. dalmanni and D. melanogaster. Furthermore, glutamine repeats occur disproportionately in transcription factors. Loci with glutamine repeats exhibit heterozygosities and allelic diversities that do not differ from noncoding dinucleotide microsatellites, including greater variation among X-linked than autosomal regions. In the majority of cases, repeat tracts did not overlap between T. dalmanni and D. melanogaster indicating that large glutamine repeats are gained or lost frequently during Dipteran evolution. Analysis of covariance reveals a significant effect of parental genotype on mean progeny eyespan, with body length as a covariate, at six SARP loci [CG33692, ptip, band4.1 inhibitor LRP interactor, corto, 3531953:1, and ecdysone-induced protein 75B (Eip75B)]. 
Mixed model analysis of covariance using the eyespan of siblings segregating for repeat length variation confirms that significant genotype-phenotype associations exist for at least one sex at five of these loci and for one gene, CG33692, longer repeats were associated with longer relative eyespan in both sexes. CONCLUSION Among genes expressed during head development in stalk-eyed flies, long codon repeats typically contain glutamine, occur in transcription factors and exhibit high levels of heterozygosity. Furthermore, the presence of significant associations within families between repeat length and head shape indicates that six genes, or genes linked to them, contribute genetic variation to the development of this extremely sexually dimorphic trait.
Affiliation(s)
- Leanna M Birge
- Department of Biology, University of Maryland, College Park, MD 20742 USA
- University College London, Research Department of Genetics, Evolution and Environment, Wolfson House, 4 Stephenson Way, London, NW1 2HE, UK
| | - Marie L Pitts
- Department of Biology, The College of William and Mary, Williamsburg, VA 23187 USA
| | - Richard H Baker
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY, 10024 USA
| | - Gerald S Wilkinson
- Department of Biology, University of Maryland, College Park, MD 20742 USA
|
26
|
Mularoni L, Ledda A, Toll-Riera M, Albà MM. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res 2010; 20:745-54. [PMID: 20335526 DOI: 10.1101/gr.101261.109] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 11/25/2022]
Abstract
Amino acid tandem repeats are found in a large number of eukaryotic proteins. They are often encoded by trinucleotide repeats and exhibit high intra- and interspecies size variability due to the high mutation rate associated with replication slippage. The extent to which natural selection is important in shaping amino acid repeat evolution is a matter of debate. On one hand, their high frequency may simply reflect their high probability of expansion by slippage, and they could essentially evolve in a neutral manner. On the other hand, there is experimental evidence that changes in repeat size can influence protein-protein interactions, transcriptional activity, or protein subcellular localization, indicating that repeats could be functionally relevant and thus shaped by selection. To gauge the relative contribution of neutral and selective forces in amino acid repeat evolution, we have performed a comparative analysis of amino acid repeat conservation in a large set of orthologous proteins from 12 vertebrate species. As a neutral model of repeat evolution we have used sequences with the same DNA triplet composition as the coding sequences--and thus expected to be subject to the same mutational forces--but located in syntenic noncoding genomic regions. The results strongly indicate that selection has played a more important role than previously suspected in amino acid tandem repeat evolution, by increasing the repeat retention rate and by modulating repeat size. The data obtained in this study have allowed us to identify a set of 92 repeats that are postulated to play important functional roles due to their strong selective signature, including five cases with direct experimental evidence.
Affiliation(s)
- Loris Mularoni
- Biomedical Informatics Research Programme (GRIB), Fundació Institut Municipal d'Investigació Mèdica, Barcelona 08003, Spain
|
27
|
Simon M, Hancock JM. Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins. Genome Biol 2009; 10:R59. [PMID: 19486509 PMCID: PMC2718493 DOI: 10.1186/gb-2009-10-6-r59] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Received: 03/19/2009] [Accepted: 06/01/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Amino acid repeats (AARs) are common features of protein sequences. They often evolve rapidly and are involved in a number of human diseases. They also show significant associations with particular Gene Ontology (GO) functional categories, particularly transcription, suggesting they play some role in protein function. It has been suggested recently that AARs play a significant role in the evolution of intrinsically unstructured regions (IURs) of proteins. We investigate the relationship between AAR frequency and evolution and their localization within proteins based on a set of 5,815 orthologous proteins from four mammalian (human, chimpanzee, mouse and rat) and a bird (chicken) genome. We consider two classes of AAR (tandem repeats and cryptic repeats: regions of proteins containing overrepresentations of short amino acid repeats). RESULTS Mammals show very similar repeat frequencies but chicken shows lower frequencies of many of the cryptic repeats common in mammals. Regions flanking tandem AARs evolve more rapidly than the rest of the protein containing the repeat and this phenomenon is more pronounced for non-conserved repeats than for conserved ones. GO associations are similar to those previously described for the mammals, but chicken cryptic repeats show fewer significant associations. Comparing the overlaps of AARs with IURs and protein domains showed that up to 96% of some AAR types are associated preferentially with IURs. However, no more than 15% of IURs contained an AAR. CONCLUSIONS Their location within IURs explains many of the evolutionary properties of AARs. Further study is needed on the types of IURs containing AARs.
Affiliation(s)
- Michelle Simon
- Bioinformatics Group, MRC Harwell, Mammalian Genetics Unit, Harwell Science and Innovation Campus, Harwell, Oxfordshire, OX11 0RD, UK
| | - John M Hancock
- Bioinformatics Group, MRC Harwell, Mammalian Genetics Unit, Harwell Science and Innovation Campus, Harwell, Oxfordshire, OX11 0RD, UK
|
28
|
Richard GF, Kerrest A, Dujon B. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 2008; 72:686-727. [PMID: 19052325 PMCID: PMC2593564 DOI: 10.1128/mmbr.00011-08] [Citation(s) in RCA: 323] [Impact Index Per Article: 20.2] [Indexed: 12/20/2022] Open
Abstract
Repeated elements can be widely abundant in eukaryotic genomes, composing more than 50% of the human genome, for example. It is possible to classify repeated sequences into two large families, "tandem repeats" and "dispersed repeats." Each of these two families can be itself divided into subfamilies. Dispersed repeats contain transposons, tRNA genes, and gene paralogues, whereas tandem repeats contain gene tandems, ribosomal DNA repeat arrays, and satellite DNA, itself subdivided into satellites, minisatellites, and microsatellites. Remarkably, the molecular mechanisms that create and propagate dispersed and tandem repeats are specific to each class and usually do not overlap. In the present review, we have chosen in the first section to describe the nature and distribution of dispersed and tandem repeats in eukaryotic genomes in the light of complete (or nearly complete) available genome sequences. In the second part, we focus on the molecular mechanisms responsible for the fast evolution of two specific classes of tandem repeats: minisatellites and microsatellites. Given that a growing number of human neurological disorders involve the expansion of a particular class of microsatellites, called trinucleotide repeats, a large part of the recent experimental work on microsatellites has focused on these particular repeats, and thus we also review the current knowledge in this area. Finally, we propose a unified definition for mini- and microsatellites that takes into account their biological properties and try to point out new directions that should be explored in a near future on our road to understanding the genetics of repeated sequences.
Affiliation(s)
- Guy-Franck Richard
- Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie, UFR927, 25 rue du Dr. Roux, F-75015, Paris, France.
|
29
|
Gorlov IP, Gorlova OY, Amos CI. Relative effects of mutability and selection on single nucleotide polymorphisms in transcribed regions of the human genome. BMC Genomics 2008; 9:292. [PMID: 18559102 PMCID: PMC2442617 DOI: 10.1186/1471-2164-9-292] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/02/2008] [Accepted: 06/17/2008] [Indexed: 11/10/2022] Open
Abstract
MOTIVATION Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans. However, the factors that affect SNP density are poorly understood. The goal of this study was to estimate the relative effects of mutability and selection on SNP density in transcribed regions of human genes. It is important for prediction of the regions that harbor functional polymorphisms. RESULTS We used frequency-validated SNPs resulting from single-nucleotide substitutions. SNPs were subdivided into five functional categories: (i) 5' untranslated region (UTR) SNPs, (ii) 3' UTR SNPs, (iii) synonymous SNPs, (iv) SNPs producing conservative missense mutations, and (v) SNPs producing radical missense mutations. Each of these categories was further subdivided into nine mutational categories on the basis of the single-nucleotide substitution type. Thus, 45 functional/mutational categories were analyzed. The relative mutation rate in each mutational category was estimated on the basis of published data. The proportion of segregating sites (PSSs) for each functional/mutational category was estimated by dividing the observed number of SNPs by the number of potential sites in the genome for a given functional/mutational category. By analyzing each functional group separately, we found significant positive correlations between PSSs and relative mutation rates (Spearman's correlation coefficient, at least r = 0.96, df = 9, P < 0.001). We adjusted the PSSs for the mutation rate and found that the functional category had a significant effect on SNP density (F = 5.9, df = 4, P = 0.001), suggesting that selection affects SNP density in transcribed regions of the genome. 
We used analyses of variance and covariance to estimate the relative effects of selection (functional category) and mutability (relative mutation rate) on the PSSs and found that approximately 87% of variation in PSS was due to variation in the mutation rate and approximately 13% was due to selection, suggesting that the probability that a site located in a transcribed region of a gene is polymorphic mostly depends on the mutability of the site.
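The proportion of segregating sites (PSS) defined in this abstract is the observed number of SNPs divided by the number of potential sites for a given functional/mutational category. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def proportion_segregating_sites(observed_snps, potential_sites):
    """PSS for one functional/mutational category: observed SNPs / potential sites."""
    if potential_sites <= 0:
        raise ValueError("potential_sites must be positive")
    return observed_snps / potential_sites

# Hypothetical counts for a single category, chosen only to show the calculation.
pss = proportion_segregating_sites(observed_snps=30, potential_sites=1000)
print(f"{pss:.3f}")  # 0.030
```

The abstract does not say how the PSSs were adjusted for mutation rate, so that step is left out here.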
Affiliation(s)
- Ivan P Gorlov
- Department of Epidemiology, The University of Texas M D Anderson Cancer Center, Houston, Texas 77030, USA.
|
30
|
Ruden DM, Jamison DC, Zeeberg BR, Garfinkel MD, Weinstein JN, Rasouli P, Lu X. The EDGE hypothesis: epigenetically directed genetic errors in repeat-containing proteins (RCPs) involved in evolution, neuroendocrine signaling, and cancer. Front Neuroendocrinol 2008; 29:428-44. [PMID: 18295320 PMCID: PMC2716011 DOI: 10.1016/j.yfrne.2007.12.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 08/16/2007] [Revised: 10/31/2007] [Accepted: 12/18/2007] [Indexed: 11/22/2022]
Abstract
Trans-generational epigenetic phenomena, such as contamination with endocrine-disrupting chemicals (EDCs) that decrease fertility and the global methylation status of DNA in the offspring, are of great concern because they may affect health, particularly the health of children. However, of even greater concern is the possibility that trans-generational changes in the methylation status of the DNA might lead to permanent changes in the DNA sequence itself. By contaminating the environment with EDCs, mankind might be permanently affecting the health of future generations. In this section, we present evidence from our laboratory and others that trans-generational epigenetic changes in DNA might lead to mutations directed to genes encoding amino acid repeat-containing proteins (RCPs) that are important for adaptive evolution or cancer progression. Such epigenetic changes can be induced "naturally" by hormones or "unnaturally" by EDCs or environmental stress. To illustrate the phenomenon, we present new bioinformatic evidence that the only RCP ontological categories conserved from Drosophila to humans are "regulation of splicing," "regulation of transcription," and "regulation of synaptogenesis," which are classes of genes likely to be important for evolutionary processes. Based on that and other evidence, we propose a model for evolution that we call the EDGE (Epigenetically Directed Genetic Errors) hypothesis for the mechanism by which mutations are targeted at epigenetically modified "contingency genes" encoding RCPs. In the model, "epigenetic assimilation" of metastable epialleles of RCPs over many generations can lead to mutations directed to those genes, thereby permanently stabilizing the adaptive phenotype.
Affiliation(s)
- Douglas M. Ruden
- Wayne State University, Institute for Environmental Health Sciences, 2727 2 Ave, Room 4000, Detroit, MI 48201
| | - D. Curtis Jamison
- George Mason University, Department of Bioinformatics and Computational Biology, Manassas, VA, 20110; current address: Illumina, Inc., San Diego, CA, 92121
| | - Barry R. Zeeberg
- Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
| | - Mark D. Garfinkel
- University of Alabama at Birmingham, Department of Environmental Health Sciences, Birmingham, AL 35294-0022
| | - John N. Weinstein
- Genomics & Bioinformatics Group, Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892
| | - Parsa Rasouli
- Wayne State University, Institute for Environmental Health Sciences, 2727 2 Ave, Room 4000, Detroit, MI 48201
|