101
|
Jjingo D, Huda A, Gundapuneni M, Mariño-Ramírez L, Jordan IK. Effect of the transposable element environment of human genes on gene length and expression. Genome Biol Evol 2011; 3:259-71. [PMID: 21362639 PMCID: PMC3070429 DOI: 10.1093/gbe/evr015] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
Independent lines of investigation have documented effects of both transposable elements (TEs) and gene length (GL) on gene expression. However, TE gene fractions are highly correlated with GL, suggesting that they cannot be considered independently. We evaluated the TE environment of human genes and GL jointly in an attempt to tease apart their relative effects. TE gene fractions and GL were compared with the overall level of gene expression and the breadth of expression across tissues. GL is strongly correlated with overall expression level but weakly correlated with the breadth of expression, confirming the selection hypothesis that attributes the compactness of highly expressed genes to selection for economy of transcription. However, TE gene fractions overall, and for the L1 family in particular, show stronger anticorrelations with expression level than GL, indicating that GL may not be the most important target of selection for transcriptional economy. These results suggest a specific mechanism, removal of TEs, by which highly expressed genes are selectively tuned for efficiency. MIR elements are the only family of TEs with gene fractions that show a positive correlation with tissue-specific expression, suggesting that they may provide regulatory sequences that help to control human gene expression. Consistent with this notion, MIR fractions are relatively enriched close to transcription start sites and associated with coexpression in specific sets of related tissues. Our results confirm the overall relevance of the TE environment to gene expression and point to distinct mechanisms by which different TE families may contribute to gene regulation.
Affiliation(s)
- Daudi Jjingo
- School of Biology, Georgia Institute of Technology, GA, USA
|
102
|
Lee Y, Zhou T, Tartaglia GG, Vendruscolo M, Wilke CO. Translationally optimal codons associate with aggregation-prone sites in proteins. Proteomics 2011; 10:4163-71. [PMID: 21046618 DOI: 10.1002/pmic.201000229] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 01/02/2023]
Abstract
We analyze the relationship between codon usage bias and residue aggregation propensity in the genomes of four model organisms, Escherichia coli, yeast, fly, and mouse, as well as the archaeon Halobacterium species NRC-1. Using the Mantel-Haenszel procedure, we find that translationally optimal codons associate with aggregation-prone residues. Our results are qualitatively and quantitatively similar to those of an earlier study where we found an association between translationally optimal codons and buried residues. We also combine the aggregation-propensity data with solvent-accessibility data. Although the resulting data set is small, and hence statistical power low, results indicate that the association between optimal codons and aggregation-prone residues exists both at buried and at exposed sites. By comparing codon usage at different combinations of sites (exposed, aggregation-prone sites versus buried, non-aggregation-prone sites; buried, aggregation-prone sites versus exposed, non-aggregation-prone sites), we find that aggregation propensity and solvent accessibility seem to have independent effects of (on average) comparable magnitude on codon usage. Finally, in fly, we assess whether optimal codons associate with sites at which amino acid substitutions lead to an increase in aggregation propensity, and find only a very weak effect. These results suggest that optimal codons may be required to reduce the frequency of translation errors at aggregation-prone sites that coincide with certain functional sites, such as protein-protein interfaces. Alternatively, optimal codons may be required for rapid translation of aggregation-prone regions.
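The Mantel-Haenszel procedure named in this abstract pools 2x2 contingency tables across strata into one common odds ratio. The following is a minimal sketch of that estimator only, not the authors' pipeline. The choice of stratum (for example, one table per gene) and the cell layout are assumptions made for illustration, and the function name `mantel_haenszel_odds_ratio` is hypothetical.

```python
def mantel_haenszel_odds_ratio(tables):
    """Pooled (Mantel-Haenszel) odds ratio over a list of 2x2 tables.

    Each table is a tuple (a, b, c, d), assumed here to mean:
      a = optimal codon at an aggregation-prone site
      b = optimal codon at a non-aggregation-prone site
      c = non-optimal codon at an aggregation-prone site
      d = non-optimal codon at a non-aggregation-prone site
    One table per stratum (the stratum definition is an assumption).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant counts")
    return numerator / denominator


# Two illustrative strata with made-up counts.
example_tables = [(10, 5, 5, 10), (8, 4, 4, 8)]
pooled_or = mantel_haenszel_odds_ratio(example_tables)
```

A pooled odds ratio above 1 would indicate, under the cell layout assumed above, that optimal codons are enriched at aggregation-prone sites after stratification.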
Affiliation(s)
- Yaelim Lee
- Institute for Cell and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
|
103
|
Abstract
Despite their name, synonymous mutations have significant consequences for cellular processes in all taxa. As a result, an understanding of codon bias is central to fields as diverse as molecular evolution and biotechnology. Although recent advances in sequencing and synthetic biology have helped to resolve longstanding questions about codon bias, they have also uncovered striking patterns that suggest new hypotheses about protein synthesis. Ongoing work to quantify the dynamics of initiation and elongation is as important for understanding natural synonymous variation as it is for designing transgenes in applied contexts.
Affiliation(s)
- Joshua B Plotkin
- Department of Biology and Program in Applied Mathematics and Computational Science, University of Pennsylvania, 433 South University Avenue, Philadelphia, Pennsylvania 19104, USA.
|
104
|
Palidwor GA, Perkins TJ, Xia X. A general model of codon bias due to GC mutational bias. PLoS One 2010; 5:e13431. [PMID: 21048949 PMCID: PMC2965080 DOI: 10.1371/journal.pone.0013431] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Received: 05/14/2010] [Accepted: 09/10/2010] [Indexed: 12/04/2022]
Abstract
Background In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC content changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, G/C ending codons usage generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns. Principal Findings In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human respectively. 
Conclusions The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.
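The nonmonotone usage of AGG described in this abstract can be illustrated with a toy equilibrium calculation. This is not the authors' continuous-time Markov chain. It is a simplified sketch that assumes independent sites and a reversible substitution process whose equilibrium nucleotide frequencies depend on a single GC parameter, so that the equilibrium frequency of each synonymous codon is proportional to the product of its nucleotide frequencies. The function names are hypothetical.

```python
ARGININE_CODONS = ["CGA", "CGC", "CGG", "CGT", "AGA", "AGG"]


def nucleotide_freqs(gc):
    """Equilibrium nucleotide frequencies for a GC content 0 < gc < 1 (assumed symmetric)."""
    if not 0.0 < gc < 1.0:
        raise ValueError("gc must lie strictly between 0 and 1")
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def codon_usage(codons, gc):
    """Relative usage of synonymous codons under the toy equilibrium assumption."""
    pi = nucleotide_freqs(gc)
    weights = {c: pi[c[0]] * pi[c[1]] * pi[c[2]] for c in codons}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# AGG usage rises and then falls as GC content increases.
for gc in (0.1, 0.4, 0.9):
    print(gc, round(codon_usage(ARGININE_CODONS, gc)["AGG"], 4))
```

Under these assumptions the AGG fraction works out to g(1 - g)/(1 + g) for GC content g. It falls at high GC because the CGN codons, which are GC-rich at two positions, come to dominate arginine usage.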
|
105
|
Weber CC, Hurst LD. Intronic AT skew is a defendable proxy for germline transcription but does not predict crossing-over or protein evolution rates in Drosophila melanogaster. J Mol Evol 2010; 71:415-26. [PMID: 20938653 DOI: 10.1007/s00239-010-9395-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/24/2010] [Accepted: 09/17/2010] [Indexed: 01/28/2023]
Abstract
Recent evidence suggests that germline transcription may affect both protein evolutionary rates, possibly mediated by repair processes, and recombination rates, possibly mediated by chromatin and epigenetic modification. Here, we test these propositions in Drosophila melanogaster. The challenge for such analyses is to provide defendable measures of germline gene expression. Intronic AT skew is a good candidate measure as it is thought to be a consequence, at least in part, of transcription-coupled repair. Prior evidence suggests that intronic AT skew in D. melanogaster is not affected by proximity to intron extremities and differs between transcribed DNA and flanking sequence. We now also establish that intronic AT skew is a defendable proxy for germline expression as (a) it is more similar than expected by chance between introns of the same gene (which is not accounted for by physical proximity), (b) is correlated with male germline expression, and (c) is more pronounced in broadly expressed genes. Furthermore, (d) a trend for intronic skew to differ between 3' and 5' ends of genes is particular to broadly expressed genes. Finally, (e) controlling for physical distance, introns of proximate genes are most different in skew if they have different tissue specificity. We find that intronic AT skew, employed as a proxy for germline transcription, correlates neither with recombination rates nor with the rate of protein evolution. We conclude that there is no prima facie evidence that germline expression modulates recombination rates or monotonically affects protein evolution rates in D. melanogaster.
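The abstract does not spell out how AT skew is computed. A commonly used definition is (A - T)/(A + T), counted on one strand. The sketch below assumes that definition and is not taken from the paper; the function name `at_skew` is hypothetical.

```python
def at_skew(sequence):
    """AT skew of a DNA sequence, assuming the definition (A - T) / (A + T).

    Counts are taken on the strand as given; other characters are ignored.
    """
    seq = sequence.upper()
    a_count = seq.count("A")
    t_count = seq.count("T")
    if a_count + t_count == 0:
        raise ValueError("sequence contains no A or T")
    return (a_count - t_count) / (a_count + t_count)


# A made-up intron fragment with an excess of A over T on this strand.
skew = at_skew("AAATGCGCAAT")
```

Positive values mean an excess of A over T on the given strand, and negative values mean the reverse.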
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, UK
|
106
|
Jiang R, Xie Z, Chen X, Geng Z. A single nucleotide polymorphism in the parathyroid hormone gene and effects on eggshell quality in chickens. Poult Sci 2010; 89:2101-5. [DOI: 10.3382/ps.2010-00888] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/23/2023]
|
107
|
Stover DA, Verrelli BC. Comparative Vertebrate Evolutionary Analyses of Type I Collagen: Potential of COL1a1 Gene Structure and Intron Variation for Common Bone-Related Diseases. Mol Biol Evol 2010; 28:533-42. [DOI: 10.1093/molbev/msq221] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 01/27/2023]
|
108
|
Abstract
Transcribed regions in the human genome differ from adjacent intergenic regions in transposable element density, crossover rates, and asymmetric substitution and sequence composition patterns. We tested whether these differences reflect selection or are instead a byproduct of germline transcription, using publicly available gene expression data from a variety of germline and somatic tissues. Crossover rate shows a strong negative correlation with gene expression in meiotic tissues, suggesting that crossover is inhibited by transcription. Strand-biased composition (G+T content) and A → G versus T → C substitution asymmetry are both positively correlated with germline gene expression. We find no evidence for a strand bias in allele frequency data, implying that the substitution asymmetry reflects a mutation rather than a fixation bias. The density of transposable elements is positively correlated with germline expression, suggesting that such elements preferentially insert into regions that are actively transcribed. For each of the features examined, our analyses favor a nonselective explanation for the observed trends and point to the role of germline gene expression in shaping the mammalian genome.
Affiliation(s)
- Graham McVicker
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
109
|
O'Fallon BD. A method to correct for the effects of purifying selection on genealogical inference. Mol Biol Evol 2010; 27:2406-16. [PMID: 20513741 DOI: 10.1093/molbev/msq132] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/20/2022]
Abstract
Accurate reconstruction of the divergence times among individuals is an essential step toward inferring population parameters from genetic data. However, our ability to reconstruct accurate genealogies is often thwarted by the evolutionary forces we hope to detect, most prominently natural selection. Here, I demonstrate that purifying selection acting at many linked sites can systematically bias current methods of genealogical reconstruction, and I present a new method that corrects for this bias by allowing a class of sites to have a time-dependent rate. The parameters influencing the time dependency can be estimated from the data, allowing for a general method to detect the presence of selected sites and correcting for their distortion of the apparent mutation rate. The method works well under a variety of scenarios, including gamma-distributed selection coefficients as well as entirely neutral evolution. I also compare the performance of the new method to relaxed clock models, and I demonstrate the method on a data set from the mitochondrion of the North Atlantic whale-"louse" Cyamus ovalis.
|
110
|
Rao YS, Wang ZF, Chai XW, Wu GZ, Zhou M, Nie QH, Zhang XQ. Selection for the compactness of highly expressed genes in Gallus gallus. Biol Direct 2010; 5:35. [PMID: 20465857 PMCID: PMC2883972 DOI: 10.1186/1745-6150-5-35] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/26/2009] [Accepted: 05/14/2010] [Indexed: 11/10/2022]
Abstract
Background Coding sequence (CDS) length, gene size, and intron length vary within a genome and among genomes. Previous studies in diverse organisms, including human, D. melanogaster, C. elegans, S. cerevisiae, and Arabidopsis thaliana, indicated that expression level is negatively related to gene size, CDS length, and intron length. Different models, such as the selection for economy model, the genomic design model, and mutational bias hypotheses, have been proposed to explain this observation. The debate over which model best explains the observation has not been settled. The chicken (Gallus gallus) is an important model organism that bridges the evolutionary gap between mammals and other vertebrates. Like D. melanogaster, the chicken has a larger effective population size, so selection on the chicken genome is expected to be more effective in increasing protein synthesis efficiency. Therefore, in this study the chicken was used as a model organism to elucidate the interaction between gene features and expression pattern under selection pressure. Results Based on different technologies, we gathered expression data for nuclear protein-coding, single-splicing genes from the Gallus gallus genome and compared them with gene parameters. We found that gene size, CDS length, first intron length, average intron length, and total intron length are significantly negatively correlated with expression level and expression breadth. Tissue specificity is positively correlated with first intron length but negatively correlated with average intron length, and is not correlated with CDS length or the number of protein domains. Comparison analyses showed that ubiquitously expressed genes and narrowly expressed genes with similar expression levels do not differ in compactness. Our data provided evidence that the genomic design model cannot, at least in part, explain our observations.
We grouped all somatic-tissue-specific genes (n = 1105) and compared the first intron length and the average intron length between highly expressed genes (top 5% expressed genes) and weakly expressed genes (bottom 5% expressed genes). We found that the first intron length and the average intron length in highly expressed genes do not differ from those in weakly expressed genes. We also compared ubiquitously expressed genes and narrowly expressed somatic genes with similar expression levels. Our data demonstrated that ubiquitously expressed genes are less compact than narrowly expressed genes with similar expression levels. These observations cannot be explained by mutational bias hypotheses either. We also found that the significant trend between gene compactness and expression level was not affected by local mutational biases. We argue that the selection for economy model is the most likely explanation for the relationship between gene expression and gene characteristics in the chicken genome. Conclusion Natural selection appears to favor the compactness of highly expressed genes in the chicken genome. This observation can be explained by the selection for economy model. Reviewers This article was reviewed by Dr. Gavin Huttley, Dr. Liran Carmel (nominated by Dr. Eugene V. Koonin) and Dr. Araxi Urrutia (nominated by Dr. Laurence D. Hurst).
Affiliation(s)
- You S Rao
- Department of Biological Technology, Jiangxi Educational Institute, Nanchang, Jiangxi, China
|
111
|
Waldman YY, Tuller T, Shlomi T, Sharan R, Ruppin E. Translation efficiency in humans: tissue specificity, global optimization and differences between developmental stages. Nucleic Acids Res 2010; 38:2964-74. [PMID: 20097653 PMCID: PMC2875035 DOI: 10.1093/nar/gkq009] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 10/21/2009] [Revised: 01/01/2010] [Accepted: 01/05/2010] [Indexed: 01/22/2023]
Abstract
Various studies in unicellular and multicellular organisms have shown that codon bias plays a significant role in translation efficiency (TE) by co-adaptation to the tRNA pool. Yet, in humans and other mammals the role of codon bias is still an open question, with contradictory results from different studies. Here we address this question, performing a large-scale tissue-specific analysis of TE in humans, using the tRNA Adaptation Index (tAI) as a direct measure for TE. We find tAI to significantly correlate with expression levels both in tissue-specific and in global expression measures, testifying to the TE of human tissues. Interestingly, we find significantly higher correlations in adult tissues as opposed to fetal tissues, suggesting that the tRNA pool is more adjusted to the adult period. Optimization-based analysis suggests that the co-adaptation between the tRNA pool and codon bias is driven globally rather than in a tissue-specific manner. Additionally, we find that tAI correlates with several measures related to the functional importance of the protein, including gene essentiality. Using inferred tissue-specific tRNA pools leads to similar results and shows that tissue-specific genes are more adapted to their tRNA pool than other genes and that related sets of functional gene groups are translated efficiently in each tissue. Similar results are obtained for other mammals. Taken together, these results demonstrate the role of codon bias in TE in humans, and pave the way for future studies of tissue-specific TE in multicellular organisms.
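The tAI of a gene is the geometric mean of the relative adaptiveness weights of its codons. The sketch below shows only that final averaging step. The per-codon weights are taken as given and the values used here are invented for illustration, since deriving real weights requires tRNA gene copy numbers and wobble-pairing parameters that the abstract does not list. The function name `trna_adaptation_index` is hypothetical.

```python
import math


def trna_adaptation_index(cds, codon_weights):
    """Geometric mean of per-codon weights over a coding sequence.

    cds: coding DNA string read in frame from its first base.
    codon_weights: dict mapping codon -> relative adaptiveness in (0, 1].
    Codons without a weight (e.g. stop codons) are skipped, which is an
    assumption of this sketch.
    """
    usable_length = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable_length, 3)]
    log_weights = [math.log(codon_weights[c]) for c in codons if c in codon_weights]
    if not log_weights:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(log_weights) / len(log_weights))


# Hypothetical weights for two alanine codons, for illustration only.
toy_weights = {"GCT": 1.0, "GCC": 0.25}
toy_tai = trna_adaptation_index("GCTGCC", toy_weights)
```

Averaging in log space avoids numerical underflow for long genes, where a direct product of many weights below 1 would approach zero.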
Affiliation(s)
- Yedael Y. Waldman, Tamir Tuller, Tomer Shlomi, Roded Sharan, Eytan Ruppin
- Blavatnik School of Computer Science, Department of Molecular Microbiology and Biotechnology and School of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel; Computer Science Department, Technion – Israel Institute of Technology, Haifa 32000, Israel
|
112
|
Najafabadi HS, Goodarzi H, Salavati R. Universal function-specificity of codon usage. Nucleic Acids Res 2010; 37:7014-23. [PMID: 19773421 PMCID: PMC2790905 DOI: 10.1093/nar/gkp792] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 11/25/2022]
Abstract
Synonymous codon usage has long been known as a factor that affects average expression level of proteins in fast-growing microorganisms, but neither its role in dynamic changes of expression in response to environmental changes nor selective factors shaping it in the genomes of higher eukaryotes have been fully understood. Here, we propose that codon usage is ubiquitously selected to synchronize the translation efficiency with the dynamic alteration of protein expression in response to environmental and physiological changes. Our analysis reveals that codon usage is universally correlated with gene function, suggesting its potential contribution to synchronized regulation of genes with similar functions. We directly show that coexpressed genes have similar synonymous codon usages within the genomes of human, yeast, Caenorhabditis elegans and Escherichia coli. We also demonstrate that perturbing the codon usage directly affects the level or even direction of changes in protein expression in response to environmental stimuli. Perturbing tRNA composition also has tangible phenotypic effects on the cell. By showing that codon usage is universally function-specific, our results expand, to almost all organisms, the notion that cells may need to dynamically alter their intracellular tRNA composition in order to adapt to their new environment or physiological role.
Affiliation(s)
- Hamed Shateri Najafabadi
- Institute of Parasitology, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue, Montreal, Quebec, H9X3V9, Canada
|
113
|
Ogino K, Tsuneki K, Furuya H. Unique genome of dicyemid mesozoan: Highly shortened spliceosomal introns in conservative exon/intron structure. Gene 2010; 449:70-6. [DOI: 10.1016/j.gene.2009.09.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/17/2008] [Revised: 08/31/2009] [Accepted: 09/01/2009] [Indexed: 01/08/2023]
|
114
|
Abstract
Recent large-scale cancer sequencing studies have focused primarily on identifying cancer-associated genes, but as an important byproduct provide "passenger mutation" data that can potentially illuminate the mutational mechanisms at work in cancer cells. Here, we explore patterns of nucleotide substitution in several cancer types using published data. We first show that selection (negative or positive) has affected only a small fraction of mutations, allowing us to attribute observed trends to underlying mutational processes rather than selection. We then show that the increased CpG mutation frequency observed in some cancers primarily occurs outside of CpG islands and CpG island shores, thus rejecting the hypothesis that the increase is a byproduct of island or shore methylation followed by deamination. We observe an A → G vs. T → C mutational asymmetry in some cancers similar to one that has been observed in germline mutations in transcribed regions, suggesting that the mutation process may be influenced by gene expression. We also demonstrate that the relative frequency of mutations at dinucleotide "hotspots" can be used as a tool to detect likely technical artifacts in large-scale studies.
|
115
|
Carmel L, Koonin EV. A universal nonmonotonic relationship between gene compactness and expression levels in multicellular eukaryotes. Genome Biol Evol 2009; 1:382-90. [PMID: 20333206 PMCID: PMC2817431 DOI: 10.1093/gbe/evp038] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Accepted: 09/19/2009] [Indexed: 01/21/2023]
Abstract
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximate bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression, whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genome design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular, in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing and for increased fidelity of transcription, splicing, and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least, in animals, coding and noncoding parts of genes show similar architectonic trends.
Affiliation(s)
- Liran Carmel
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
116
|
Mutational biases and selective forces shaping the structure of Arabidopsis genes. PLoS One 2009; 4:e6356. [PMID: 19633720 PMCID: PMC2712092 DOI: 10.1371/journal.pone.0006356] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 05/12/2009] [Accepted: 06/01/2009] [Indexed: 01/08/2023]
Abstract
Recently, features of gene expression profiles have been associated with structural parameters of gene sequences in organisms representing a diverse set of taxa. The emerging picture indicates that natural selection, mediated by gene expression profiles, has a significant role in determining genic structures. However, the current situation is less clear in plants, as the available data indicate that the effect of natural selection mediated by gene expression is very weak. Moreover, the direction of the patterns in plants appears to contradict those observed in animal genomes. In the present work we analyzed expression data for >18000 Arabidopsis genes retrieved from public datasets obtained with different technologies (MPSS and high density chip arrays) and compared them with gene parameters. Our results show that the impact of natural selection mediated by expression on gene sequences is significant and distinguishable from the effects of regional mutational biases. In addition, we provide evidence that the level and the breadth of gene expression are related in opposite ways to many structural parameters of gene sequences. Higher levels of expression abundance are associated with smaller transcripts, consistent with the need to reduce costs of both transcription and translation. Expression breadth, however, shows a contrasting pattern, i.e. longer genes have higher breadth of expression, possibly to ensure those structural features associated with gene plasticity. Based on these results, we propose that the specific balance between these two selective forces plays a significant role in shaping the structure of Arabidopsis genes.
|
117
|
Clustering of codons with rare cognate tRNAs in human genes suggests an extra level of expression regulation. PLoS Genet 2009; 5:e1000548. [PMID: 19578405 PMCID: PMC2697378 DOI: 10.1371/journal.pgen.1000548] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 12/03/2008] [Accepted: 06/03/2009] [Indexed: 12/31/2022]
Abstract
In species with large effective population sizes, highly expressed genes tend to be encoded by codons with highly abundant cognate tRNAs to maximize translation rate. However, there has been little evidence for a similar bias of synonymous codons in highly expressed human genes. Here, we ask instead whether there is evidence for the selection for codons associated with low abundance tRNAs. Rather than averaging the codon usage of complete genes, we scan the genes for windows with deviating codon usage. We show that there is a significant over representation of human genes that contain clusters of codons with low abundance cognate tRNAs. We name these regions, which on average have a 50% reduction in the amount of cognate tRNA available compared to the remainder of the gene, RTS (rare tRNA score) clusters. We observed a significant reduction in the substitution rate between the human RTS clusters and their orthologous chimp sequence, when compared to non-RTS cluster sequences. Overall, the genes with an RTS cluster have higher tissue specificity than the non-RTS cluster genes. Furthermore, these genes are functionally enriched for transcription regulation. As genes that regulate transcription in lower eukaryotes are known to be involved in translation on demand, this suggests that the mechanism of translation level expression regulation also exists within the human genome.
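The window scan described in this abstract can be sketched as a sliding-window comparison between the mean cognate tRNA abundance inside a window and the mean over the rest of the gene. The abstract does not give the window size, the scoring formula, or the significance test, so the `window` and `max_ratio` parameters, the per-codon abundance values, and the function name `low_trna_windows` below are all assumptions made for illustration.

```python
def low_trna_windows(abundances, window=3, max_ratio=0.5):
    """Start indices of windows whose mean abundance is low relative to the rest.

    abundances: one cognate tRNA abundance value per codon (hypothetical input).
    A window is reported when its mean is at most max_ratio times the mean
    of all codons outside the window.
    """
    n = len(abundances)
    if window <= 0 or window >= n:
        raise ValueError("window must be positive and shorter than the gene")
    total = sum(abundances)
    window_sum = sum(abundances[:window])
    hits = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one codon: add the entering value, drop the leaving one.
            window_sum += abundances[start + window - 1] - abundances[start - 1]
        rest_mean = (total - window_sum) / (n - window)
        window_mean = window_sum / window
        if rest_mean > 0 and window_mean <= max_ratio * rest_mean:
            hits.append(start)
    return hits


# Made-up abundances with a low-abundance stretch at codons 3 to 5.
toy_gene = [10, 10, 10, 2, 2, 2, 10, 10, 10]
clusters = low_trna_windows(toy_gene, window=3, max_ratio=0.5)
```

The default `max_ratio` of 0.5 mirrors the 50% average reduction reported in the abstract, but the study reports that figure as an observed property of the clusters, not necessarily as the detection threshold.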
|
118
|
Zhou T, Weems M, Wilke CO. Translationally optimal codons associate with structurally sensitive sites in proteins. Mol Biol Evol 2009; 26:1571-80. [PMID: 19349643 DOI: 10.1093/molbev/msp070] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Indexed: 12/20/2022]
Abstract
The mistranslation-induced protein misfolding hypothesis predicts that selection should prefer high-fidelity codons at sites at which translation errors are structurally disruptive and lead to protein misfolding and aggregation. To test this hypothesis, we analyzed the relationship between codon usage bias and protein structure in the genomes of four model organisms, Escherichia coli, yeast, fly, and mouse. Using both the Mantel-Haenszel procedure, which applies to categorical data, and a newly developed association test for continuous variables, we find that translationally optimal codons associate with buried residues and also with residues at sites where mutations lead to large changes in free energy (ΔΔG). In each species, only a subset of all amino acids show this signal, but most amino acids show the signal in at least one species. By repeating the analysis on a reduced data set that excludes interdomain linkers, we show that our results are not caused by an association of rare codons with solvent-accessible linker regions. Finally, we find that our results depend weakly on expression level; the association between optimal codons and buried sites exists at all expression levels, but increases in strength as expression level increases.
Affiliation(s)
- Tong Zhou
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, TX, USA
119
Abstract
Natural selection on codon usage is a pervasive force that acts on a large variety of prokaryotic and eukaryotic genomes. Despite this, obtaining reliable estimates of selection on codon usage has proved complicated, perhaps due to the fact that the selection coefficients involved are very small. In this work, a population genetics model is used to measure the strength of selected codon usage bias, S, in 10 eukaryotic genomes. It is shown that the strength of selection is closely linked to expression and that reliable estimates of selection coefficients can only be obtained for genes with very similar expression levels. We compare the strength of selected codon usage for orthologous genes across all 10 genomes classified according to expression categories. Fungi genomes present the largest S values (2.24-2.56), whereas multicellular invertebrate and plant genomes present more moderate values (0.61-1.91). The large mammalian genomes (human and mouse) show low S values (0.22-0.51) for the most highly expressed genes. This might not be evidence for selection in these organisms as the technique used here to estimate S does not properly account for nucleotide composition heterogeneity along such genomes. The relationship between estimated S values and empirical estimates of population size is presented here for the first time. It is shown, as theoretically expected, that population size has an important role in the operativity of translational selection.
Affiliation(s)
- Mario dos Reis
- School of Crystallography, Birkbeck College, London, UK.
120
Niu DK. Exon definition as a potential negative force against intron losses in evolution. Biol Direct 2008; 3:46. [PMID: 19014515 PMCID: PMC2614967 DOI: 10.1186/1745-6150-3-46] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 11/10/2008] [Accepted: 11/13/2008] [Indexed: 12/03/2022] Open
Abstract
Background: Previous studies have indicated that the wide variation in intron density (the number of introns per gene) among different eukaryotes largely reflects varying degrees of intron loss during evolution. The most popular model, which suggests that organisms lose introns through a mechanism in which reverse-transcribed cDNA recombines with the genomic DNA, concerns only one mutational force.
Hypothesis: Using exons as the units of splicing-site recognition, exon definition constrains the length of exons. An intron-loss event results in fusion of flanking exons and thus a larger exon. The large size of the newborn exon may cause splicing errors, i.e., exon skipping, if the splicing of pre-mRNAs is initiated by exon definition. By contrast, if the splicing of pre-mRNAs is initiated by intron definition, intron loss does not matter. Exon definition may thus be a selective force against intron loss. An organism with a high frequency of exon definition is expected to experience a low rate of intron loss throughout evolution and have a high density of spliceosomal introns.
Conclusion: The majority of spliceosomal introns in vertebrates may be maintained during evolution not because of potential functions, but because of their splicing mechanism (i.e., exon definition). Further research is required to determine whether exon definition is a negative force in maintaining the high intron density of vertebrates.
Reviewers: This article was reviewed by Dr. Scott W. Roy (nominated by Dr. John Logsdon), Dr. Eugene V. Koonin, and Dr. Igor B. Rogozin (nominated by Dr. Mikhail Gelfand). For the full reviews, please go to the Reviewers' comments section.
Affiliation(s)
- Deng-Ke Niu
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, PR China.
121
Mugal CF, von Grünberg HH, Peifer M. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol 2008; 26:131-42. [PMID: 18974087 DOI: 10.1093/molbev/msn245] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022] Open
Abstract
If substitution rates are not the same on the two complementary DNA strands, a substitution is considered strand asymmetric. Such substitutional strand asymmetries are determined here for the three most frequent types of substitution on the human genome (C→T, A→G, and G→T). Substitution rate differences between both strands are estimated for 4,590 human genes by aligning all repeats occurring within the introns with their ancestral consensus sequences. For 1,630 of these genes, both coding strand and noncoding strand rates could be compared with rates in gene-flanking regions. All three rates considered are found to be on average higher on the coding strand and lower on the transcribed strand in comparison to their values in the gene-flanking regions. This finding points to the simultaneous action of rate-increasing effects on the coding strand (such as increased adenine and cytosine deamination) and transcription-coupled repair as a rate-reducing effect on the transcribed strand. The common behavior of the three rates leads to strong correlations of the rate asymmetries: Whenever one rate is strand biased, the other two rates are likely to show the same bias. Furthermore, we determine all three rate asymmetries as a function of time: the A→G and G→T rate asymmetries are both found to be constant in time, whereas the C→T rate asymmetry shows a pronounced time dependence, an observation that explains the difference between our results and those of an earlier work by Green et al. (2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33:514-517.). Finally, we show that, in addition to transcription, the replication process also biases the substitution rates in genes.
Affiliation(s)
- Carina F Mugal
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria
122
Atambayeva SA, Khailenko VA, Ivashchenko AT. Intron and exon length variation in Arabidopsis, rice, nematode, and human. Mol Biol 2008. [DOI: 10.1134/s0026893308020180] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
123
Huang YF, Niu DK. Evidence against the energetic cost hypothesis for the short introns in highly expressed genes. BMC Evol Biol 2008; 8:154. [PMID: 18492248 PMCID: PMC2424036 DOI: 10.1186/1471-2148-8-154] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 12/10/2007] [Accepted: 05/20/2008] [Indexed: 12/25/2022] Open
Abstract
Background: In animals, the moss Physcomitrella patens and the pollen of Arabidopsis thaliana, highly expressed genes have shorter introns than weakly expressed genes. A popular explanation for this is selection for transcription efficiency, which includes two sub-hypotheses: to minimize the energetic cost or to minimize the time cost.
Results: In an individual human, different organs may differ up to hundreds of times in cell number (for example, a liver versus a hypothalamus). Considered at the individual level, a gene specifically expressed in a large organ is actually transcribed tens or hundreds of times more than a gene with a similar expression level (a measure of mRNA abundance per cell) specifically expressed in a small organ. According to the energetic cost hypothesis, the former should have shorter introns than the latter. However, in humans and mice we have not found significant differences in intron length between large-tissue/organ-specific genes and small-tissue/organ-specific genes with similar expression levels. Qualitative estimation shows that the deleterious effect (that is, the energetic burden) of long introns in highly expressed genes is too negligible to be efficiently selected against in mammals.
Conclusion: The short introns in highly expressed genes should not be attributed to energy constraint. We evaluated evidence for the time cost hypothesis and other alternatives.
Affiliation(s)
- Yi-Fei Huang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, P R China.
124
Lanier W, Moustafa A, Bhattacharya D, Comeron JM. EST analysis of Ostreococcus lucimarinus, the most compact eukaryotic genome, shows an excess of introns in highly expressed genes. PLoS One 2008; 3:e2171. [PMID: 18478122 PMCID: PMC2367439 DOI: 10.1371/journal.pone.0002171] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 01/15/2008] [Accepted: 03/25/2008] [Indexed: 11/19/2022] Open
Abstract
Background: The genome of the pico-eukaryotic (bacterial-sized) prasinophyte green alga Ostreococcus lucimarinus has one of the highest gene densities known in eukaryotes, yet it contains many introns. Phylogenetic studies suggest this unusually compact genome (13.2 Mb) is an evolutionarily derived state among prasinophytes. The presence of introns in the highly reduced O. lucimarinus genome appears to be in opposition to simple explanations of genome evolution based on unidirectional tendencies, either neutral or selective. Therefore, patterns of intron retention in this species can potentially provide insights into the forces governing intron evolution.
Methodology/Principal Findings: Here we studied intron features and levels of expression in O. lucimarinus using expressed sequence tags (ESTs) to annotate the current genome assembly. ESTs were assembled into unigene clusters that were mapped back to the O. lucimarinus Build 2.0 assembly using BLAST and the level of gene expression was inferred from the number of ESTs in each cluster. We find a positive correlation between expression levels and both intron number (R = +0.0893, p < 0.0005) and intron density (number of introns/kb of CDS; R = +0.0753, p < 0.005).
Conclusions/Significance: In a species with a genome that has been recently subjected to a great reduction of non-coding DNA, these results imply the existence of selective/functional roles for introns that are principally detectable in highly expressed genes. In these cases, introns are likely maintained by balancing the selective forces favoring their maintenance with other mutational and/or selective forces acting on genome size.
Affiliation(s)
- William Lanier
- Interdisciplinary Program in Genetics, University of Iowa, Iowa, United States of America
- Ahmed Moustafa
- Interdisciplinary Program in Genetics, University of Iowa, Iowa, United States of America
- Debashish Bhattacharya
- Interdisciplinary Program in Genetics, University of Iowa, Iowa, United States of America
- Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa, Iowa, United States of America
- Josep M. Comeron
- Interdisciplinary Program in Genetics, University of Iowa, Iowa, United States of America
- Department of Biological Sciences and Roy J. Carver Center for Comparative Genomics, University of Iowa, Iowa, United States of America
125
Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill D, Zhang Y, Oliver B, Clark AG. Evolution of protein-coding genes in Drosophila. Trends Genet 2008; 24:114-23. [DOI: 10.1016/j.tig.2007.12.001] [Citation(s) in RCA: 225] [Impact Index Per Article: 14.1] [Received: 09/20/2007] [Revised: 12/06/2007] [Accepted: 12/10/2007] [Indexed: 11/27/2022]
126
Almuly R, Skopal T, Funkenstein B. Regulatory regions in the promoter and first intron of Sparus aurata growth hormone gene: Repression of gene activity by a polymorphic minisatellite. COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY D-GENOMICS & PROTEOMICS 2008; 3:43-50. [DOI: 10.1016/j.cbd.2006.12.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 07/28/2006] [Revised: 12/05/2006] [Accepted: 12/07/2006] [Indexed: 10/23/2022]
127
Dynamic covariation between gene expression and genome characteristics. Gene 2008; 410:53-66. [PMID: 18191345 DOI: 10.1016/j.gene.2007.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2007] [Revised: 11/13/2007] [Accepted: 11/29/2007] [Indexed: 11/21/2022]
Abstract
Gene and protein expression is controlled so that cells can react to changing intra- and extracellular signals by modulating biochemical networks and pathways. We have previously shown that gene expression and the properties of expressed proteins are dynamically correlated. Here we investigated correlations between gene related parameters and gene expression patterns, and found statistically significant correlations in microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, cell cycle in HeLa cells, infection in intestinal epithelial cells, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. Our method was applied to time course datasets individually for each time point. We derived from sequence information numerous parameters for nucleotide composition, two-base composition, codon usage, skew parameters, and codon bias. In addition to coding regions, we also investigated correlations for complete genes and introns. Significant dynamic correlations were identified for each of the analyses. Our method also proved useful for detecting dynamic shifts in gene expression profiles, such as in the D. melanogaster dataset. Detection of changes in the properties of expressed genes and proteins might be useful for predicting or following biological processes, responses, growth, differentiation and possibly in related disorders.
128
Arhondakis S, Clay O, Bernardi G. GC level and expression of human coding sequences. Biochem Biophys Res Commun 2008; 367:542-5. [PMID: 18177737 DOI: 10.1016/j.bbrc.2007.12.155] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 12/14/2007] [Accepted: 12/21/2007] [Indexed: 11/29/2022]
Abstract
Several groups have addressed the issue of the influence of GC on expression levels in mammalian genes. In general, GC-rich genes appeared to be more expressed than GC-poor ones. Recently, expression levels of GC(3)-rich and GC(3)-poor versions of genes (GC(3) is the third codon position GC), inserted in vector plasmids, were compared in order to eliminate differences associated with their genomic context. Transfection experiments showed that GC(3)-rich genes were expressed more efficiently than their GC(3)-poor counterparts, indicating that GC(3) dramatically and intrinsically boosts expression efficiency. Here we show that, while the protocols used eliminated the original genomic context, they replaced it with the plasmid contexts whose compositional properties affected the results.
Affiliation(s)
- Stilianos Arhondakis
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
129
Fahey ME, Higgins DG. Gene expression, intron density, and splice site strength in Drosophila and Caenorhabditis. J Mol Evol 2007; 65:349-57. [PMID: 17763878 DOI: 10.1007/s00239-007-9015-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 03/25/2007] [Accepted: 07/06/2007] [Indexed: 10/22/2022]
Abstract
In this paper we investigate the relationships among intron density (number of introns per kilobase of coding sequence), gene expression level, and strength of splicing signals in two species: Drosophila melanogaster and Caenorhabditis elegans. We report a negative correlation between intron density and gene expression levels, opposite to the effect previously observed in human. An increase in splice site strength has been observed in long introns in D. melanogaster. We show this is also true of C. elegans. We also examine the relationship between intron density and splice site strength. There is an increase in splice site strength as the intron structure becomes less dense. This could suggest that introns are not recognized in isolation but could function in a cooperative manner to ensure proper splicing. This effect remains if we control for the effects of alternative splicing on splice site strength.
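The intron density definition used in this abstract (introns per kilobase of coding sequence) and its correlation with expression can be sketched as follows. This is an illustration only: the abstract does not state which correlation statistic was used, so the rank-based (Spearman) coefficient here is an assumption, and the function names and toy numbers are hypothetical.

```python
def intron_density(n_introns, cds_length_bp):
    """Number of introns per kilobase of coding sequence."""
    return n_introns / (cds_length_bp / 1000.0)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (made up): (number of introns, CDS length in bp) and expression level.
genes = [(2, 2000), (4, 2000), (6, 2000), (8, 2000)]
expression = [80.0, 60.0, 40.0, 10.0]
densities = [intron_density(n, length) for n, length in genes]
print(densities, spearman(densities, expression))
```

With these toy values density rises while expression falls, so the coefficient is negative, the direction the abstract reports for D. melanogaster and C. elegans.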
Affiliation(s)
- Marie E Fahey
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland.
130
Abstract
While it has often been assumed that, in humans, synonymous mutations would have no effect on fitness, let alone cause disease, this position has been questioned over the last decade. There is now considerable evidence that such mutations can, for example, disrupt splicing and interfere with miRNA binding. Two recent publications suggest involvement of additional mechanisms: modification of protein abundance most probably mediated by alteration in mRNA stability and modification of protein structure and activity, probably mediated by induction of translational pausing. These case histories put a further nail into the coffin of the assumption that synonymous mutations must be neutral.
Affiliation(s)
- Joanna L Parmley
- Department of Biology and Biochemistry, University of Bath, Bath, UK
131
Li SW, Feng L, Niu DK. Selection for the miniaturization of highly expressed genes. Biochem Biophys Res Commun 2007; 360:586-92. [PMID: 17610841 DOI: 10.1016/j.bbrc.2007.06.085] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 05/22/2007] [Accepted: 06/18/2007] [Indexed: 11/29/2022]
Abstract
Most widely expressed genes are also highly expressed. Based on high or wide expression, different models were proposed to explain the small sizes of highly/widely expressed genes. We found that housekeeping genes are not more compact than narrowly expressed genes with similar expression levels, but compactness and expression level are correlated in housekeeping genes (except that highly expressed Arabidopsis HK genes have longer intron length). Meanwhile, we found evidence that genes with high functional/regulatory complexity do not have longer introns and longer proteins. The genome design hypothesis is thus not supported. Furthermore, we found that housekeeping genes are not more compact than the narrowly expressed somatic genes with similar average expression levels. Because housekeeping genes are expected to have much higher germline expression levels than narrowly expressed somatic genes, transcription-associated deletion bias is not supported. Selection of the compactness of highly expressed genes for economy is supported.
Affiliation(s)
- Shu-Wei Li
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China
132
Mutational pattern and frequency of induced nucleotide changes in mouse ENU mutagenesis. BMC Mol Biol 2007; 8:52. [PMID: 17584492 PMCID: PMC1914352 DOI: 10.1186/1471-2199-8-52] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Received: 11/01/2006] [Accepted: 06/20/2007] [Indexed: 11/16/2022] Open
Abstract
Background: With the advent of sequence-based approaches in mutagenesis studies, it is now possible to directly evaluate the genome-wide pattern of experimentally induced DNA sequence changes for a diverse array of organisms. To gain a more comprehensive understanding of the mutational bias inherent in mouse ENU mutagenesis, this study describes a detailed evaluation of the induced mutational pattern obtained from a sequence-based screen of ENU-mutagenized mice.
Results: Based on large-scale screening data, we derive the sequence-based estimates of the nucleotide-specific pattern and frequency of ENU-induced base replacement mutation in the mouse germline, which are then combined with the pattern of codon usage in the mouse coding sequences to infer the spectrum of amino acid changes obtained by ENU mutagenesis. We detect a statistically significant difference between the mutational patterns in phenotype- versus sequence-based screens, which presumably reflects differential phenotypic effects caused by different amino acid replacements. We also demonstrate that the mutations exhibit strong strand asymmetry, and that this imbalance is generated by transcription, most likely as a by-product of transcription-coupled DNA repair in the germline.
Conclusion: The results clearly illustrate the biased nature of ENU-induced mutations. We expect that a precise understanding of the mutational pattern and frequency of induced nucleotide changes would be of practical importance when designing sequence-based screening strategies to generate mutant mouse strains harboring amino acid variants at specific loci. More generally, by enhancing the collection of experimentally induced mutations in unambiguously defined genomic regions, sequence-based mutagenesis studies will further illuminate the molecular basis of mutagenic and repair mechanisms that preferentially produce a certain class of mutational changes over others.
133
Ren L, Gao G, Zhao D, Ding M, Luo J, Deng H. Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation. Genome Biol 2007; 8:R35. [PMID: 17349061 PMCID: PMC1868930 DOI: 10.1186/gb-2007-8-3-r35] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 09/12/2006] [Revised: 01/08/2007] [Accepted: 03/12/2007] [Indexed: 11/26/2022] Open
Abstract
Developmental-stage-related patterns of gene expression correlate with codon usage and genomic GC content in stem cell hierarchies.
Background: The usage of synonymous codons shows considerable variation among mammalian genes. How and why this usage is non-random are fundamental biological questions and remain controversial. It is also important to explore whether mammalian genes that are selectively expressed at different developmental stages bear different molecular features.
Results: In two models of mouse stem cell differentiation, we established correlations between codon usage and the patterns of gene expression. We found that the optimal codons exhibited variation (AT- or GC-ending codons) in different cell types within the developmental hierarchy. We also found that genes that were enriched (developmental-pivotal genes) or specifically expressed (developmental-specific genes) at different developmental stages had different patterns of codon usage and local genomic GC (GCg) content. Moreover, at the same developmental stage, developmental-specific genes generally used more GC-ending codons and had higher GCg content compared with developmental-pivotal genes. Further analyses suggest that the model of translational selection might be consistent with the developmental stage-related patterns of codon usage, especially for the AT-ending optimal codons. In addition, our data show that after human-mouse divergence, the influence of selective constraints is still detectable.
Conclusion: Our findings suggest that developmental stage-related patterns of gene expression are correlated with codon usage (GC3) and GCg content in stem cell hierarchies. Moreover, this paper provides evidence for the influence of natural selection at synonymous sites in the mouse genome and novel clues for linking the molecular features of genes to their patterns of expression during mammalian ontogenesis.
Affiliation(s)
- Lichen Ren
- College of Life Sciences, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Ge Gao
- Center for Bioinformatics, College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetics Engineering, Peking University, Beijing, 100871, PR China
- Dongxin Zhao
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
- Mingxiao Ding
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
- Jingchu Luo
- Center for Bioinformatics, College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetics Engineering, Peking University, Beijing, 100871, PR China
- Hongkui Deng
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
134
Parmley JL, Hurst LD. Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals. Mol Biol Evol 2007; 24:1600-3. [PMID: 17525472 DOI: 10.1093/molbev/msm104] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 12/19/2022] Open
Abstract
In mammals there is a bias in amino acid usage near splice sites that is explained, in large part, by the high density of exonic splicing enhancers (ESEs) in these regions. Is there a similar bias for the relative use of synonymous codons, and can any such bias be predicted by their abundance in ESEs? Prior reports suggested that such trends may exist. From analysis of human exons, we find that 47 of the 59 codons with at least one synonym show differential usage in the proximity of exon ends, of which 42 remain significant after correction for multiple testing. Within sets of synonymous codons those more preferred near splice sites are generally those that are relatively more abundant within the ESEs. However, the examples given previously appear exceptionally good fits and there exist many exceptions, the usage of lysine's codons being a case in point. Similar results are observed in mouse exons. We conclude that splice regulation impacts on the choice of synonymous codons in mammals, but the magnitude of this effect is less than might at first have been supposed.
Affiliation(s)
- Joanna L Parmley
- Department of Biology and Biochemistry, University of Bath, Bath, UK.
135
Abstract
Compact genes contain short and few introns, and they are highly expressed in different animal genomes. Recently, it has been shown that in Oryza sativa and Arabidopsis thaliana, highly expressed genes tend to be least compact, containing long and many introns. It has been suggested that selection on genome organization may have acted differently in plants compared with animals. Gene expression can be estimated as the number of hits when comparing a gene sequence with publicly available expressed sequence tags. Here it is shown that in the haploid moss Physcomitrella patens, highly expressed genes contain shorter introns than genes with low expression levels. This study therefore supports the hypothesis that selection may strongly favour transcriptional efficiency at least in the haploid phase of plant life cycles. It is concluded that plants do not necessarily respond to other selection pressures than animals regarding genome structuring.
Affiliation(s)
- H K Stenøien
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.
136
Niu DK. Protecting exons from deleterious R-loops: a potential advantage of having introns. Biol Direct 2007; 2:11. [PMID: 17459149 PMCID: PMC1863416 DOI: 10.1186/1745-6150-2-11] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 03/26/2007] [Accepted: 04/25/2007] [Indexed: 02/02/2023] Open
Abstract
Background: Accumulating evidence indicates that the nascent RNA can invade and pair with one strand of DNA, forming an R-loop structure that threatens the stability of the genome. In addition, the cost and benefit of introns are still in debate.
Results: At least three factors are likely required for the R-loop formation: 1) sequence complementarity between the nascent RNA and the target DNA, 2) spatial juxtaposition between the nascent RNA and the template DNA, and 3) accessibility of the template DNA and the nascent RNA. The removal of introns from pre-mRNA reduces the complementarity between RNA and the template DNA and avoids the spatial juxtaposition between the nascent RNA and the template DNA. In addition, the secondary structures of group I and group II introns may act as spatial obstacles for the formation of R-loops between nearby exons and the genomic DNA.
Conclusion: Organisms may benefit from introns by avoiding deleterious R-loops. The potential contribution of this benefit in driving intron evolution is discussed. I propose that additional RNA polymerases may inhibit R-loop formation between preceding nascent RNA and the template DNA. This idea leads to a testable prediction: intermittently transcribed genes and genes with frequently prolonged transcription should have higher intron density.
Reviewers: This article was reviewed by Dr. Eugene V. Koonin, Dr. Alexei Fedorov (nominated by Dr. Laura F Landweber), and Dr. Scott W. Roy (nominated by Dr. Arcady Mushegian).
Affiliation(s)
- Deng-Ke Niu
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China.
|
137
|
Parmley JL, Urrutia AO, Potrzebowski L, Kaessmann H, Hurst LD. Splicing and the evolution of proteins in mammals. PLoS Biol 2007; 5:e14. [PMID: 17298171 PMCID: PMC1790955 DOI: 10.1371/journal.pbio.0050014] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Received: 08/09/2006] [Accepted: 11/13/2006] [Indexed: 12/25/2022] Open
Abstract
It is often supposed that a protein's rate of evolution and its amino acid content are determined by the function and anatomy of the protein. Here we examine an alternative possibility, namely that the requirement to specify in the unprocessed RNA, in the vicinity of intron–exon boundaries, information necessary for removal of introns (e.g., exonic splice enhancers) affects both amino acid usage and rates of protein evolution. We find that the majority of amino acids show skewed usage near intron–exon boundaries, and that differences in the trends for the 2-fold and 4-fold blocks of both arginine and leucine show this to be owing to effects mediated at the nucleotide level. More specifically, there is a robust relationship between the extent to which an amino acid is preferred/avoided near boundaries and its enrichment/paucity in splice enhancers. As might then be expected, the rate of evolution is lowest near intron–exon boundaries, at least in part owing to splice enhancers, such that domains flanking intron–exon junctions evolve on average at under half the rate of exon centres from the same gene. In contrast, the rate of evolution of intronless retrogenes is highest near the domains where intron–exon junctions previously resided. The proportion of sequence near intron–exon boundaries is one of the stronger predictors of a protein's rate of evolution in mammals yet described. We conclude that after intron insertion selection favours modification of amino acid content near intron–exon junctions, so as to enable efficient intron removal, these changes then being subject to strong purifying selection even if nonoptimal for protein function. Thus there exists a strong force operating on protein evolution in mammals that is not explained directly in terms of the biology of the protein. Intron-exon boundaries, once fixed in proteins, are found to be subject to purifying selection, even if they are not optimal for protein function. 
Most of the DNA in our genes is actually not involved in the specification of proteins. Rather, the bits with the protein-coding information (exons) are separated from each other by noncoding bits, introns. Before a gene can be translated into protein these introns are removed and the exons are spliced back together to be translated into protein. While information about which DNA to remove is largely in the introns themselves, parts of the exons near the intron–exon boundary can, for example, function as splice enhancer elements. In principle, then, these parts of exons have two functions: to specify the amino acids of the resulting protein and to enable the correct removal of introns. What impact might this have on a gene's evolution? We show that near intron–exon boundaries, amino acid usage is biased towards nucleotides involved in splice control. Moreover, these parts of genes evolve especially slowly. Indeed, we estimate that a gene with many exons would evolve at under half the rate of the same gene with no introns, simply owing to the need to specify where to remove introns. Likewise, genes that have lost their introns evolve especially fast near the former intron's location. Thus, human proteins may not be as optimised as they could be, as their sequence is serving two conflicting roles.
Affiliation(s)
- Joanna L Parmley
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Araxi O Urrutia
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Lukasz Potrzebowski
- Center for Integrative Genomics, Genopode, University of Lausanne, Lausanne, Switzerland
- Henrik Kaessmann
- Center for Integrative Genomics, Genopode, University of Lausanne, Lausanne, Switzerland
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
138
|
Abstract
Research into the origins of introns is at a critical juncture in the resolution of theories on the evolution of early life (which came first, RNA or DNA?), the identity of LUCA (the last universal common ancestor, was it prokaryotic- or eukaryotic-like?), and the significance of noncoding nucleotide variation. One early notion was that introns would have evolved as a component of an efficient mechanism for the origin of genes. But alternative theories emerged as well. From the debate between the "introns-early" and "introns-late" theories came the proposal that introns arose before the origin of genetically encoded proteins and DNA, and the more recent "introns-first" theory, which postulates the presence of introns at that early evolutionary stage from a reconstruction of the "RNA world." Here we review seminal and recent ideas about intron origins. Recent discoveries about the patterns and causes of intron evolution make this one of the most hotly debated and exciting topics in molecular evolutionary biology today.
Affiliation(s)
- Francisco Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.
|
139
|
Abstract
Human tissue-specific genes were reported to be longer than housekeeping genes (both in coding and intronic parts). Competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in the human genome, the longest genes are those with an intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be the most complex. In the genomewide context, they are found here to have a higher informational load on all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories, through transcription factor regulatory sets and protein functional domains, to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have a higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the 'genome design' model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced on the modular level than on the molecular level. There are far fewer intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acting on the intermediately expressed genes.
|
140
|
Arhondakis S, Clay O, Bernardi G. Compositional properties of human cDNA libraries: practical implications. FEBS Lett 2006; 580:5772-8. [PMID: 17022979 DOI: 10.1016/j.febslet.2006.09.034] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/30/2006] [Revised: 09/12/2006] [Accepted: 09/19/2006] [Indexed: 01/28/2023]
Abstract
The strikingly wide and bimodal gene distribution exhibited by the human genome has prompted us to study the correlations between EST-counts (expression levels) and base composition of genes, especially since existing data are contradictory. Here we investigate how cDNA library preparation affects the GC distributions of ESTs and/or genes found in the library, and address consequences for expression studies. We observe that strongly anomalous GC distributions often indicate experimental biases or deficits during their preparation. We propose the use of compositional distributions of raw ESTs from a cDNA library, and/or of the genes they represent, as a simple and effective tool for quality control.
Affiliation(s)
- Stilianos Arhondakis
- Laboratory of Molecular Evolution, Stazione Zoologica Anton Dohrn, 80121 Naples, Italy
|
141
|
Qu HQ, Lawrence SG, Guo F, Majewski J, Polychronakos C. Strand bias in complementary single-nucleotide polymorphisms of transcribed human sequences: evidence for functional effects of synonymous polymorphisms. BMC Genomics 2006; 7:213. [PMID: 16916449 PMCID: PMC1559705 DOI: 10.1186/1471-2164-7-213] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 03/09/2006] [Accepted: 08/17/2006] [Indexed: 11/25/2022] Open
Abstract
Background: Complementary single-nucleotide polymorphisms (SNPs) may not be distributed equally between two DNA strands if the strands are functionally distinct, such as in transcribed genes. In introns, an excess of A↔G over the complementary C↔T substitutions had previously been found and attributed to transcription-coupled repair (TCR), demonstrating the valuable functional clues that can be obtained by studying such asymmetry. Here we studied asymmetry of human synonymous SNPs (sSNPs) in the fourfold degenerate (FFD) sites as compared to intronic SNPs (iSNPs).
Results: The identities of the ancestral bases and the direction of mutations were inferred from human-chimpanzee genomic alignment. After correction for background nucleotide composition, excess of A→G over the complementary T→C polymorphisms, which was observed previously and can be explained by TCR, was confirmed in FFD SNPs and iSNPs. However, when SNPs were separately examined according to whether they mapped to a CpG dinucleotide or not, an excess of C→T over G→A polymorphisms was found in non-CpG site FFD SNPs but was absent from iSNPs and CpG site FFD SNPs.
Conclusion: The genome-wide discrepancy of human FFD SNPs provides novel evidence for widespread selective pressure due to functional effects of sSNPs. The similar asymmetry pattern of FFD SNPs and iSNPs that map to a CpG can be explained by transcription-coupled mechanisms, including TCR and transcription-coupled mutation. Because of the hypermutability of CpG sites, more CpG site FFD SNPs are relatively younger and have confronted less selection effect than non-CpG FFD SNPs, which can explain the asymmetric discrepancy of CpG site FFD SNPs vs. non-CpG site FFD SNPs.
Affiliation(s)
- Hui-Qi Qu
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Steve G Lawrence
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Fan Guo
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Jacek Majewski
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Constantin Polychronakos
- Endocrine Genetics Laboratory, The McGill University Health Center (Montreal Children's Hospital), Montréal, Québec, Canada
- Department of Pediatrics, The McGill University Health Center (Montreal Children's Hospital), 2300 Tupper, Montréal, Québec H3H 1P3, Canada
|
142
|
Diatchenko L, Anderson AD, Slade GD, Fillingim RB, Shabalina SA, Higgins TJ, Sama S, Belfer I, Goldman D, Max MB, Weir BS, Maixner W. Three major haplotypes of the beta2 adrenergic receptor define psychological profile, blood pressure, and the risk for development of a common musculoskeletal pain disorder. Am J Med Genet B Neuropsychiatr Genet 2006; 141B:449-62. [PMID: 16741943 PMCID: PMC2570772 DOI: 10.1002/ajmg.b.30324] [Citation(s) in RCA: 138] [Impact Index Per Article: 7.7] [Indexed: 11/08/2022]
Abstract
Adrenergic receptor beta2 (ADRB2) is a primary target for epinephrine. It plays a critical role in mediating physiological and psychological responses to environmental stressors. Thus, functional genetic variants of ADRB2 will be associated with a complex array of psychological and physiological phenotypes. These genetic variants should also interact with environmental factors such as physical or emotional stress to produce a phenotype vulnerable to pathological states. In this study, we determined whether common genetic variants of ADRB2 contribute to the development of a common chronic pain condition that is associated with increased levels of psychological distress and low blood pressure, factors which are strongly influenced by the adrenergic system. We genotyped 202 female subjects and examined the relationships between three major ADRB2 haplotypes and psychological factors, resting blood pressure, and the risk of developing a chronic musculoskeletal pain condition, Temporomandibular Joint Disorder (TMD). We propose that the first haplotype codes for lower levels of ADRB2 expression, the second haplotype codes for higher ADRB2 expression, and the third haplotype codes for higher receptor expression and rapid agonist-induced internalization. Individuals who carried one haplotype coding for high and one coding for low ADRB2 expression displayed the highest positive psychological traits, had higher levels of resting arterial pressure, and were about 10 times less likely to develop TMD. Thus, our data suggest that either positive or negative imbalances in ADRB2 function increase the vulnerability to chronic pain conditions such as TMD through different etiological pathways that imply the need for tailored treatment options.
Affiliation(s)
- Luda Diatchenko
- University of North Carolina, Center for Neurosensory Disorders, North Carolina, USA.
|
143
|
Hurst LD. Preliminary assessment of the impact of microRNA-mediated regulation on coding sequence evolution in mammals. J Mol Evol 2006; 63:174-82. [PMID: 16786435 DOI: 10.1007/s00239-005-0273-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 11/14/2005] [Accepted: 03/02/2006] [Indexed: 10/24/2022]
Abstract
Despite prior claims to the contrary, several lines of evidence suggest that selection acts on synonymous mutations in mammals. What might be the mechanisms for such selection? Here I attempt to quantify the constraints on the evolution of the coding sequence resulting from regulation of mRNA by microRNAs (miRNAs) that antisense-bind to the coding region of mRNAs. I employ a set of genes recently experimentally verified to be the target of a miRNA, all with putative antisense pairing domains within the coding sequence. Although very small (approximately 22 nucleotides), 2 of 13 pairing domains show evidence of significantly slow sequence evolution. This, along with evidence that these genes are regulated by the miRNA under consideration, provides the first good candidate domains for intra-CDS pairing of a miRNA in mammals. When analyzed en masse, the putative pairing domains have a significantly reduced rate of synonymous evolution (approximately 35% lower than null). However, given the size and rarity of pairing domains within the coding sequence, the effects that such constraint has on estimates of the mutation rate are small enough to be ignored (probably less than 1% reduction). The pairing sites also have low Ka values and the selection on the synonymous sites is unlikely to lead to misleading reports of localized high Ka/Ks ratios.
Affiliation(s)
- Laurence D Hurst
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK.
|
144
|
Shabalina SA, Ogurtsov AY, Spiridonov NA. A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res 2006; 34:2428-37. [PMID: 16682450 PMCID: PMC1458515 DOI: 10.1093/nar/gkl287] [Citation(s) in RCA: 151] [Impact Index Per Article: 8.4] [Indexed: 12/16/2022] Open
Abstract
Single-stranded mRNA molecules form secondary structures through complementary self-interactions. Several hypotheses have been proposed on the relationship between the nucleotide sequence, encoded amino acid sequence and mRNA secondary structure. We performed the first transcriptome-wide in silico analysis of the human and mouse mRNA foldings and found a pronounced periodic pattern of nucleotide involvement in mRNA secondary structure. We show that this pattern is created by the structure of the genetic code, and the dinucleotide relative abundances are important for the maintenance of mRNA secondary structure. Although synonymous codon usage contributes to this pattern, it is intrinsic to the structure of the genetic code and manifests itself even in the absence of synonymous codon usage bias at the 4-fold degenerate sites. While all codon sites are important for the maintenance of mRNA secondary structure, degeneracy of the code allows regulation of stability and periodicity of mRNA secondary structure. We demonstrate that the third degenerate codon sites contribute most strongly to mRNA stability. These results convincingly support the hypothesis that redundancies in the genetic code allow transcripts to satisfy requirements for both protein structure and RNA structure. Our data show that selection may be operating on synonymous codons to maintain a more stable and ordered mRNA secondary structure, which is likely to be important for transcript stability and translation. We also demonstrate that functional domains of the mRNA [5′-untranslated region (5′-UTR), CDS and 3′-UTR] preferentially fold onto themselves, while the start codon and stop codon regions are characterized by relaxed secondary structures, which may facilitate initiation and termination of translation.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
145
|
Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 2006; 7:98-108. [PMID: 16418745 DOI: 10.1038/nrg1770] [Citation(s) in RCA: 590] [Impact Index Per Article: 32.8] [Indexed: 11/08/2022]
Abstract
Although the assumption of the neutral theory of molecular evolution - that some classes of mutation have too small an effect on fitness to be affected by natural selection - seems intuitively reasonable, over the past few decades the theory has been in retreat. At least in species with large populations, even synonymous mutations in exons are not neutral. By contrast, in mammals, neutrality of these mutations is still commonly assumed. However, new evidence indicates that even some synonymous mutations are subject to constraint, often because they affect splicing and/or mRNA stability. This has implications for understanding disease, optimizing transgene design, detecting positive selection and estimating the mutation rate.
Affiliation(s)
- J V Chamary
- Center for Integrative Genomics, University of Lausanne, Switzerland.
|
146
|
Comeron JM. Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans. Proc Natl Acad Sci U S A 2006; 103:6940-5. [PMID: 16632609 PMCID: PMC1458998 DOI: 10.1073/pnas.0510638103] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022] Open
Abstract
Recent large-scale genomic and evolutionary studies have revealed the small but detectable signature of weak selection on synonymous mutations during mammalian evolution, likely acting at the level of translational efficacy (i.e., translational selection). To investigate whether weak selection, and translational selection in particular, plays any role in shaping the fate of synonymous mutations that are present today in human populations, we studied genetic variation at the polymorphic level and patterns of evolution in the human lineage after human-chimpanzee separation. We find evidence that neutral mechanisms are influencing the frequency of polymorphic mutations in humans. Our results suggest a recent increase in mutational tendencies toward AT, observed in all isochores, that is responsible for AT mutations segregating at lower frequencies than GC mutations. In all, however, changes in mutational tendencies and other neutral scenarios are not sufficient to explain a difference between synonymous and noncoding mutations or a difference between synonymous mutations potentially advantageous or deleterious under a translational selection model. Furthermore, several estimates of selection intensity on synonymous mutations all suggest a detectable influence of weak selection acting at the level of translational selection. Thus, random genetic drift, recent changes in mutational tendencies, and weak selection influence the fate of synonymous mutations that are present today as polymorphisms. All of these features, neutral and selective, should be taken into account in evolutionary analyses that often assume constancy of mutational tendencies and complete neutrality of synonymous mutations.
Affiliation(s)
- Josep M Comeron
- Department of Biological Sciences, University of Iowa, 212 Biology Building, Iowa City, IA 52242, USA.
|
147
|
Kotlar D, Lavner Y. The action of selection on codon bias in the human genome is related to frequency, complexity, and chronology of amino acids. BMC Genomics 2006; 7:67. [PMID: 16584540 PMCID: PMC1456966 DOI: 10.1186/1471-2164-7-67] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 10/20/2005] [Accepted: 04/03/2006] [Indexed: 11/29/2022] Open
Abstract
Background: The question of whether synonymous codon choice is affected by cellular tRNA abundance has been positively answered in many organisms. In some recent works, concerning the human genome, this relation has been studied, but no conclusive answers have been found. In the human genome, the variation in base composition and the absence of cellular tRNA count data makes the study of the question more complicated. In this work we study the relation between codon choice and tRNA abundance in the human genome by correcting relative codon usage for background base composition and using a measure based on tRNA-gene copy numbers as a rough estimate of tRNA abundance.
Results: We term major codons to be those codons with a relatively large tRNA-gene copy number for their corresponding amino acid. We use two measures of expression: breadth of expression (the number of tissues in which a gene was expressed) and maximum expression level among tissues (the highest value of expression of a gene among tissues). We show that for half the amino acids in the study (8 of 16) the relative major codon usage rises with breadth of expression. We show that these amino acids are significantly more frequent, are smaller and simpler, and are more ancient than the rest of the amino acids. Similar, although weaker, results were obtained for maximum expression level.
Conclusion: There is evidence that codon bias in the human genome is related to selection, although the selection forces acting on codon bias may not be straightforward and may be different for different amino acids. We suggest that, in the first group of amino acids, selection acts to enhance translation efficiency in highly expressed genes by preferring major codons, and acts to reduce translation rate in lowly expressed genes by preferring non-major ones. In the second group of amino acids other selection forces, such as reducing misincorporation rate of expensive amino acids, in terms of their size/complexity, may be in action. The fact that codon usage is more strongly related to breadth of expression than to maximum expression level supports the notion, presented in a recent study, that codon choice may be related to the tRNA abundance in the tissue in which a gene is expressed.
Affiliation(s)
- Daniel Kotlar
- Department of Computer Science, Tel-Hai Academic College, Upper Galilee, 12210, Israel
- Yizhar Lavner
- Department of Computer Science, Tel-Hai Academic College, Upper Galilee, 12210, Israel
|
148
|
Scaiewicz V, Sabbía V, Piovani R, Musto H. CpG islands are the second main factor shaping codon usage in human genes. Biochem Biophys Res Commun 2006; 343:1257-61. [PMID: 16581018 DOI: 10.1016/j.bbrc.2006.03.108] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/03/2006] [Accepted: 03/15/2006] [Indexed: 01/22/2023]
Abstract
A correspondence analysis of codon usage in human genes revealed, as expected, that the first axis is strongly correlated with the base composition at synonymous third codon positions. At one extreme of the second axis were localized genes with a high frequency of NCG and CGN codons. The great majority of these sequences were embedded in CpG islands, while the opposite is true for the genes placed at the other extreme. The two main conclusions of this paper are: (1) the influence of CpG islands on codon usage, and (2) since the second axis is orthogonal (and therefore independent) of the first, GC3-rich genes are not necessarily associated with CpG islands.
Affiliation(s)
- Viviana Scaiewicz
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Iguá 4225, Montevideo 11400, Uruguay
|
149
|
Kondrashov FA, Ogurtsov AY, Kondrashov AS. Selection in favor of nucleotides G and C diversifies evolution rates and levels of polymorphism at mammalian synonymous sites. J Theor Biol 2005; 240:616-26. [PMID: 16343547 DOI: 10.1016/j.jtbi.2005.10.020] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Received: 06/20/2005] [Revised: 10/26/2005] [Accepted: 10/27/2005] [Indexed: 11/24/2022]
Abstract
The impact of synonymous nucleotide substitutions on fitness in mammals remains controversial. Despite some indications of selective constraint, synonymous sites are often assumed to be neutral, and the rate of their evolution is used as a proxy for mutation rate. We subdivide all sites into four classes in terms of the mutable CpG context, nonCpG, postC, preG, and postCpreG, and compare four-fold synonymous sites and intron sites residing outside transposable elements. The distribution of the rate of evolution across all synonymous sites is trimodal. Rate of evolution at nonCpG synonymous sites, not preceded by C and not followed by G, is approximately 10% below that at such intron sites. In contrast, rate of evolution at postCpreG synonymous sites is approximately 30% above that at such intron sites. Finally, synonymous and intron postC and preG sites evolve at similar rates. The relationship between the levels of polymorphism at the corresponding synonymous and intron sites is very similar to that between their rates of evolution. Within every class, synonymous sites are occupied by G or C much more often than intron sites, whose nucleotide composition is consistent with neutral mutation-drift equilibrium. These patterns suggest that synonymous sites are under weak selection in favor of G and C, with the average coefficient s ≈ 0.25/Ne ≈ 10^-5, where Ne is the effective population size. Such selection decelerates evolution and reduces variability at sites with symmetric mutation, but has the opposite effects at sites where the favored nucleotides are more mutable. The amino-acid composition of proteins dictates that many synonymous sites are CpG-prone, which causes them, on average, to evolve faster and to be more polymorphic than intron sites. An average genotype carries approximately 10^7 suboptimal nucleotides at synonymous sites, implying synergistic epistasis in selection against them.
Affiliation(s)
- Fyodor A Kondrashov
- Section of Ecology, Behavior and Evolution, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0346, USA.
|
150
|
Sémon M, Lobry JR, Duret L. No Evidence for Tissue-Specific Adaptation of Synonymous Codon Usage in Humans. Mol Biol Evol 2005; 23:523-9. [PMID: 16280544 DOI: 10.1093/molbev/msj053] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022] Open
Abstract
It has been proposed that the synonymous codon usage of human tissue-specific genes was under selective pressure to modulate the expression of proteins by codon-mediated translational control (Plotkin, J. B., H. Robins, and A. J. Levine. 2004. Tissue-specific codon usage and the expression of human genes. Proc. Natl. Acad. Sci. USA 101:12588-12591.) To test this model, we analyzed by internal correspondence analysis the codon usage of 2,126 human tissue-specific genes expressed in 18 different tissues. We confirm that synonymous codon usage differs significantly between the tissues. However, the effect is very weak: the variability of synonymous codon usage between tissues represents only 2.3% of the total codon usage variability. Moreover, this variability is directly linked to isochore-scale (>100 kb) variability of GC-content that affect both coding and introns or intergenic regions. This demonstrates that variations of synonymous codon usage between tissue-specific genes expressed in different tissues are due to regional variations of substitution patterns and not to translational selection.
Affiliation(s)
- Marie Sémon
- Laboratoire de Biométrie et Biologie Evolutive (UMR 5558), Centre National de la Recherche Scientifique, Université Claude Bernard Lyon 1, Villeurbanne, France
|