1
Zafrir Z, Zur H, Tuller T. Selection for reduced translation costs at the intronic 5' end in fungi. DNA Res 2016; 23:377-94. [PMID: 27260512 PMCID: PMC4991832 DOI: 10.1093/dnares/dsw019] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/15/2015] [Accepted: 04/26/2016] [Indexed: 12/12/2022] Open
Abstract
It is generally believed that introns are not translated; therefore, the potential intronic features that may be related to the translation step (occurring after splicing) have yet to be thoroughly studied. Here, focusing on four fungi, we performed for the first time a comprehensive study aimed at characterizing how translation efficiency is encoded in introns and affects their evolution. By analysing their intronome we provide evidence of selection for STOP codons close to the intronic 5′ end, and show that the beginning of introns are selected for significantly low translation, presumably to reduce translation and metabolic costs in cases of non-spliced introns. Ribosomal profiling data analysis in Saccharomyces cerevisiae supports the conjecture that in this organism intron retention frequently occurs, introns are partially translated, and their translation efficiency affects organismal fitness. We show that the reported results are more significant in highly translated and highly spliced genes, but are not associated only with genes with a specific function. We also discuss the potential relation of the reported signals to efficient nonsense-mediated decay due to splicing errors. These new discoveries are supported by population-genetics considerations. In addition, they are contributory steps towards a broader understanding of intron evolution and the effect of silent mutations on gene expression and organismal fitness.
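As a rough illustration of the kind of signal this abstract describes, the sketch below scans an intron sequence for its first in-frame STOP codon. The function name, the `phase` parameter, and the toy sequence are assumptions made for this example and are not taken from the paper; codons that span the exon-intron junction are ignored for simplicity.

```python
from typing import Optional

# Standard STOP codons, written in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(intron: str, phase: int = 0) -> Optional[int]:
    """Return the 0-based offset of the first in-frame STOP codon, or None.

    `phase` (hypothetical parameter) is the number of intron nucleotides
    that are skipped before the first complete codon starts (0, 1 or 2).
    Codons straddling the exon-intron junction are not examined.
    """
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA input
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy intron (made up): codons in phase 0 are GTA, TGT, TAA.
toy_intron = "GTATGTTAAGC"
offset = first_inframe_stop(toy_intron, phase=0)
```

A short offset means that a ribosome reading into an unspliced intron would terminate early, which is the cost-reduction scenario the abstract proposes.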
Affiliation(s)
- Zohar Zafrir
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- Hadas Zur
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- Tamir Tuller
- Department of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel
2
Zhu A, Guo W, Jain K, Mower JP. Unprecedented Heterogeneity in the Synonymous Substitution Rate within a Plant Genome. Mol Biol Evol 2014; 31:1228-36. [DOI: 10.1093/molbev/msu079] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 12/12/2022] Open
3
Transcriptional enhancers in protein-coding exons of vertebrate developmental genes. PLoS One 2012; 7:e35202. [PMID: 22567096 PMCID: PMC3342275 DOI: 10.1371/journal.pone.0035202] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 12/22/2011] [Accepted: 03/10/2012] [Indexed: 11/19/2022] Open
Abstract
Many conserved noncoding sequences function as transcriptional enhancers that regulate gene expression. Here, we report that protein-coding DNA also frequently contains enhancers functioning at the transcriptional level. We tested the enhancer activity of 31 protein-coding exons, which we chose based on strong sequence conservation between zebrafish and human, and occurrence in developmental genes, using a Tol2 transposable GFP reporter assay in zebrafish. For each exon we measured GFP expression in hundreds of embryos in 10 anatomies via a novel system that implements the voice-recognition capabilities of a cellular phone. We find that 24/31 (77%) exons drive GFP expression compared to a minimal promoter control, and 14/24 are anatomy-specific (expression in four anatomies or less). GFP expression driven by these coding enhancers frequently overlaps the anatomies where the host gene is expressed (60%), suggesting self-regulation. Highly conserved coding sequences and highly conserved noncoding sequences do not significantly differ in enhancer activity (coding: 24/31 vs. noncoding: 105/147) or tissue-specificity (coding: 14/24 vs. noncoding: 50/105). Furthermore, coding and noncoding enhancers display similar levels of the enhancer-related histone modification H3K4me1 (coding: 9/24 vs noncoding: 34/81). Meanwhile, coding enhancers are over three times as likely to contain an H3K4me1 mark as other exons of the host gene. Our work suggests that developmental transcriptional enhancers do not discriminate between coding and noncoding DNA and reveals widespread dual functions in protein-coding DNA.
4
Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res 2011; 21:1916-28. [PMID: 21994248 DOI: 10.1101/gr.108753.110] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Indexed: 02/05/2023]
Abstract
The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.
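To make the idea of nine-codon windows with unusually few synonymous changes concrete, here is a minimal sketch that counts synonymous codon differences between two aligned coding sequences and sums them over sliding windows. It is a crude pairwise proxy and not the statistical model used in the study; the function names and the toy sequences are assumptions, and the inputs are assumed to be gap-free and in frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_diffs(ref, other):
    """Per-codon flags: 1 if the codons differ but encode the same amino acid."""
    flags = []
    for i in range(0, min(len(ref), len(other)) - 2, 3):
        a, b = ref[i:i + 3].upper(), other[i:i + 3].upper()
        aa_a, aa_b = CODON_TABLE.get(a), CODON_TABLE.get(b)
        same_aa = aa_a is not None and aa_a == aa_b
        flags.append(1 if a != b and same_aa else 0)
    return flags


def window_counts(flags, width=9):
    """Sum of flags in every window of `width` consecutive codons."""
    if len(flags) < width:
        return []
    total = sum(flags[:width])
    counts = [total]
    for i in range(width, len(flags)):
        total += flags[i] - flags[i - width]  # slide the window by one codon
        counts.append(total)
    return counts


# Toy aligned sequences (made up): ten leucine codons, two synonymous changes.
ref = "CTG" * 10
other = "CTG" * 3 + "CTA" * 2 + "CTG" * 5
counts = window_counts(synonymous_diffs(ref, other))
```

Windows whose count is far below what the rest of the gene shows would be candidates for closer inspection; a real analysis would use many species and an explicit substitution model, as the abstract indicates.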
Affiliation(s)
- Michael F Lin
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
5
Aoi MC, Rourke BC. Interspecific and intragenic differences in codon usage bias among vertebrate myosin heavy-chain genes. J Mol Evol 2011; 73:74-93. [PMID: 21915654 DOI: 10.1007/s00239-011-9457-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/24/2010] [Accepted: 08/19/2011] [Indexed: 01/13/2023]
Abstract
Synonymous codon usage bias is a broadly observed phenomenon in bacteria, plants, and invertebrates and may result from selection. However, the role of selective pressures in shaping codon bias is still controversial in vertebrates, particularly for mammals. The myosin heavy-chain (MyHC) gene family comprises multiple isoforms of the major force-producing contractile protein in cardiac and skeletal muscles. Slow and fast genes are tandemly arrayed on separate chromosomes, and have distinct patterns of functionality and expression in muscle. We analyze both full-length MyHC genes (~5400 bp) and a larger collection of partial sequences at the 3' end (~500 bp). The MyHC isoforms are an interesting system in which to study codon usage bias because of their length, expression, and critical importance to organismal mobility. Codon bias and GC content differ among MyHC genes with regard to functional type, isoform, and position within the gene. Codon bias even varies by isoform within a species. We find evidence in favor of both chromosomal influences on nucleotide composition and selection against nonsense errors (SANE) acting on codon usage in MyHC genes. Intragenic variation in codon bias and elongation rate is significant, with a strong trend for increasing codon bias and elongation rate towards the 3' end of the gene, although the trend is dependent upon the degeneracy class of the codons. Therefore, patterns of codon usage in MyHC genes are consistent with models supporting SANE as a major force shaping codon usage.
Affiliation(s)
- Mikio C Aoi
- Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA
6
Abstract
Mutation rates vary significantly within the genome and across species. Recent studies revealed a long suspected replication-timing effect on mutation rate, but the mechanisms that regulate the increase in mutation rate as the genome is replicated remain unclear. Evidence is emerging, however, that DNA repair systems, in general, are less efficient in late replicating heterochromatic regions compared to early replicating euchromatic regions of the genome. At the same time, mutation rates in both vertebrates and invertebrates have been shown to vary with generation time (GT). GT is correlated with genome size, which suggests a possible nucleotypic effect on species-specific mutation rates. These and other observations all converge on a role for DNA replication checkpoints in modulating generation times and mutation rates during the DNA synthetic phase (S phase) of the cell cycle. The following will examine the potential role of the intra-S checkpoint in regulating cell cycle times (GT) and mutation rates in eukaryotes. This article was published online on August 5, 2011. An error was subsequently identified. This notice is included in the online and print versions to indicate that both have been corrected October 4, 2011.
Affiliation(s)
- John Herrick
- Department of Physics, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia, Canada.
7
Magee AM, Aspinall S, Rice DW, Cusack BP, Sémon M, Perry AS, Stefanović S, Milbourne D, Barth S, Palmer JD, Gray JC, Kavanagh TA, Wolfe KH. Localized hypermutation and associated gene losses in legume chloroplast genomes. Genome Res 2010; 20:1700-10. [PMID: 20978141 DOI: 10.1101/gr.111955.110] [Citation(s) in RCA: 180] [Impact Index Per Article: 12.9] [Indexed: 12/31/2022]
Abstract
Point mutations result from errors made during DNA replication or repair, so they are usually expected to be homogeneous across all regions of a genome. However, we have found a region of chloroplast DNA in plants related to sweetpea (Lathyrus) whose local point mutation rate is at least 20 times higher than elsewhere in the same molecule. There are very few precedents for such heterogeneity in any genome, and we suspect that the hypermutable region may be subject to an unusual process such as repeated DNA breakage and repair. The region is 1.5 kb long and coincides with a gene, ycf4, whose rate of evolution has increased dramatically. The product of ycf4, a photosystem I assembly protein, is more divergent within the single genus Lathyrus than between cyanobacteria and other angiosperms. Moreover, ycf4 has been lost from the chloroplast genome in Lathyrus odoratus and separately in three other groups of legumes. Each of the four consecutive genes ycf4-psaI-accD-rps16 has been lost in at least one member of the legume "inverted repeat loss" clade, despite the rarity of chloroplast gene losses in angiosperms. We established that accD has relocated to the nucleus in Trifolium species, but were unable to find nuclear copies of ycf4 or psaI in Lathyrus. Our results suggest that, as well as accelerating sequence evolution, localized hypermutation has contributed to the phenomenon of gene loss or relocation to the nucleus.
Affiliation(s)
- Alan M Magee
- Smurfit Institute of Genetics, Trinity College, Dublin, Ireland
8
An Evolutionary Reduction Principle for Mutation Rates at Multiple Loci. Bull Math Biol 2010; 73:1227-70. [DOI: 10.1007/s11538-010-9557-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/13/2009] [Accepted: 06/04/2010] [Indexed: 01/07/2023]
9
Kural D, Ding Y, Wu J, Korpi AM, Chuang JH. COMIT: identification of noncoding motifs under selection in coding sequences. Genome Biol 2009; 10:R133. [PMID: 19930548 PMCID: PMC3091326 DOI: 10.1186/gb-2009-10-11-r133] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/01/2009] [Revised: 09/18/2009] [Accepted: 11/20/2009] [Indexed: 11/16/2022] Open
Abstract
COMIT is presented; an algorithm for detecting functional non-coding motifs in coding regions, separating nucleotide and amino acid effects. Coding nucleotide sequences contain myriad functions independent of their encoded protein sequences. We present the COMIT algorithm to detect functional noncoding motifs in coding regions using sequence conservation, explicitly separating nucleotide from amino acid effects. COMIT concurs with diverse experimental datasets, including splicing enhancers, silencers, replication motifs, and microRNA targets, and predicts many novel functional motifs. Intriguingly, COMIT scores are well-correlated to scores uncalibrated for amino acids, suggesting that nucleotide motifs often override peptide-level constraints.
Affiliation(s)
- Deniz Kural
- Department of Biology, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA.
10
Imamura H, Karro JE, Chuang JH. Weak preservation of local neutral substitution rates across mammalian genomes. BMC Evol Biol 2009; 9:89. [PMID: 19416516 PMCID: PMC2689173 DOI: 10.1186/1471-2148-9-89] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/20/2008] [Accepted: 05/05/2009] [Indexed: 01/06/2023] Open
Abstract
Background: The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species. Results: We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate – from human/macaque alignments, rodent – from mouse/rat alignments, and laurasiatheria – from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r² < 5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates. Conclusion: Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats. This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.
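The r² figure quoted in this abstract is the squared Pearson correlation between per-repeat substitution rates in two lineages. A minimal sketch of that calculation follows; the function name and the rate values are invented for illustration and do not come from the study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length rate vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-repeat substitution rates in two lineages (made-up numbers).
primate_rates = [1.0, 2.0, 3.0, 4.0]
rodent_rates = [1.0, 3.0, 2.0, 4.0]
shared_variance = r_squared(primate_rates, rodent_rates)
```

An r² below 0.05 means that the rate of a repeat in one lineage explains less than 5% of the variance of its rate in another, which is why the authors describe the correlations as having little predictive power.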
Affiliation(s)
- Hideo Imamura
- Boston College, Department of Biology, Chestnut Hill, MA 02467, USA.
11
Fu C, Xiong J, Miao W. Genome-wide identification and characterization of cytochrome P450 monooxygenase genes in the ciliate Tetrahymena thermophila. BMC Genomics 2009; 10:208. [PMID: 19409101 PMCID: PMC2691746 DOI: 10.1186/1471-2164-10-208] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 11/19/2008] [Accepted: 05/01/2009] [Indexed: 12/24/2022] Open
Abstract
Background: Cytochrome P450 monooxygenases play key roles in the metabolism of a wide variety of substrates and they are closely associated with endocellular physiological processes or detoxification metabolism under environmental exposure. To date, however, none has been systematically characterized in the phylum Ciliophora. T. thermophila possesses many advantages as a eukaryotic model organism and it exhibits rapid and sensitive responses to xenobiotics, making it an ideal model system to study the evolutionary and functional diversity of the P450 monooxygenase gene family. Results: A total of 44 putative functional cytochrome P450 genes were identified and could be classified into 13 families and 21 sub-families according to standard nomenclature. The characteristics of both the conserved intron-exon organization and scaffold localization of tandem repeats within each P450 family clade suggested that the enlargement of T. thermophila P450 families probably resulted from recent separate small duplication events. Gene expression patterns of all T. thermophila P450s during three important cell physiological stages (vegetative growth, starvation and conjugation) were analyzed based on EST and microarray data, and three main categories of expression patterns were postulated. Evolutionary analysis including codon usage preference, site-specific selection and gene-expression evolution patterns were investigated and the results indicated remarkable divergences among the T. thermophila P450 genes. Conclusion: The characterization, expression and evolutionary analysis of T. thermophila P450 monooxygenase genes in the current study provides useful information for understanding the characteristics and diversities of the P450 genes in the Ciliophora, and provides the baseline for functional analyses of individual P450 isoforms in this model ciliate species.
Affiliation(s)
- Chengjie Fu
- Key Laboratory of Aquatic Biodiversity and Conservation, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, PR China.
12
Cooper MB, Loose M, Brookfield JFY. The evolutionary influence of binding site organisation on gene regulatory networks. Biosystems 2009; 96:185-93. [PMID: 19428984 DOI: 10.1016/j.biosystems.2009.02.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/11/2008] [Revised: 01/23/2009] [Accepted: 02/01/2009] [Indexed: 12/30/2022]
Abstract
Gene regulatory networks are shaped by selection for advantageous gene expression patterns. Can we use this fact to predict and explain the structure and properties of gene regulatory networks? Here we address this question with evolutionary simulations of small (two to four genes) transcriptional regulatory networks. Each modeled network is tested for the frequency with which it evolves to produce a bimodal spatial expression pattern of a target gene (the output), in response to a linear trigger gradient (the input). By including network features such as the organisation of binding sites that do not evolve in the model, we can compare the relative chances of evolutionary success between networks differing only in these features. Specifically, we show that networks with competitive binding sites (where binding of one transcription factor excludes another) are more likely to evolve bimodal patterns of gene expression than are networks with independent binding sites (where binding of multiple transcription factors can occur simultaneously). These predictions have implications for the likely structure of gene regulatory networks carrying out bimodal (including bistable) gene expression functions in vivo. The capacity to predict the evolution of structure-function relationships in gene regulatory networks is constrained by gaps in current understanding such as the unknown prior probabilities of the network features, and the quantitative nature of the molecular interactions involved in gene expression. Methods for the circumvention of these constraints, and the potential of the evolutionary modeling approach, are discussed.
Affiliation(s)
- Max B Cooper
- Institute of Genetics, School of Biology, University of Nottingham, Queens Medical Centre, Nottingham, NG7 2UH, United Kingdom.
13
Multilocus patterns of nucleotide diversity, population structure and linkage disequilibrium in Boechera stricta, a wild relative of Arabidopsis. Genetics 2008; 181:1021-33. [PMID: 19104077 DOI: 10.1534/genetics.108.095364] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/18/2022] Open
Abstract
Information about polymorphism, population structure, and linkage disequilibrium (LD) is crucial for association studies of complex trait variation. However, most genomewide studies have focused on model systems, with very few analyses of undisturbed natural populations. Here, we sequenced 86 mapped nuclear loci for a sample of 46 genotypes of Boechera stricta and two individuals of B. holboellii, both wild relatives of Arabidopsis. Isolation by distance was significant across the species range of B. stricta, and three geographic groups were identified by structure analysis, principal coordinates analysis, and distance-based phylogeny analyses. The allele frequency spectrum indicated a genomewide deviation from an equilibrium neutral model, with silent nucleotide diversity averaging 0.004. LD decayed rapidly, declining to background levels in approximately 10 kb or less. For tightly linked SNPs separated by <1 kb, LD was dependent on the reference population. LD was lower in the specieswide sample than within populations, suggesting that low levels of LD found in inbreeding species such as B. stricta, Arabidopsis thaliana, and barley may result from broad geographic sampling that spans heterogeneous genetic groups. Finally, analyses also showed that inbreeding B. stricta and A. thaliana have approximately 45% higher recombination per kilobase than outcrossing A. lyrata.