1
Cope AL, Shah P. Intragenomic variation in non-adaptive nucleotide biases causes underestimation of selection on synonymous codon usage. PLoS Genet 2022; 18:e1010256. [PMID: 35714134 PMCID: PMC9246145 DOI: 10.1371/journal.pgen.1010256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Revised: 06/30/2022] [Accepted: 05/13/2022] [Indexed: 11/20/2022] Open
Abstract
Patterns of non-uniform usage of synonymous codons vary across genes in an organism and between species across all domains of life. This codon usage bias (CUB) is due to a combination of non-adaptive (e.g. mutation biases) and adaptive (e.g. natural selection for translation efficiency/accuracy) evolutionary forces. Most models quantify the effects of mutation bias and selection on CUB assuming uniform mutational and other non-adaptive forces across the genome. However, non-adaptive nucleotide biases can vary within a genome due to processes such as biased gene conversion (BGC), potentially obfuscating signals of selection on codon usage. Moreover, genome-wide estimates of non-adaptive nucleotide biases are lacking for non-model organisms. We combine an unsupervised learning method with a population genetics model of synonymous coding sequence evolution to assess the impact of intragenomic variation in non-adaptive nucleotide bias on quantification of natural selection on synonymous codon usage across 49 Saccharomycotina yeasts. We find that in the absence of a priori information, unsupervised learning can be used to identify genes evolving under different non-adaptive nucleotide biases. We find that the impact of intragenomic variation in non-adaptive nucleotide bias varies widely, even among closely-related species. We show that the overall strength and direction of translational selection can be underestimated by failing to account for intragenomic variation in non-adaptive nucleotide biases. Interestingly, genes falling into clusters identified by machine learning are also physically clustered across chromosomes. Our results indicate the need for more nuanced models of sequence evolution that systematically incorporate the effects of variable non-adaptive nucleotide biases on codon frequencies.
Affiliation(s)
- Alexander L. Cope
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
  - Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey, United States of America
- Premal Shah
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
2
Cope AL, Gilchrist MA. Quantifying shifts in natural selection on codon usage between protein regions: a population genetics approach. BMC Genomics 2022; 23:408. [PMID: 35637464 PMCID: PMC9153123 DOI: 10.1186/s12864-022-08635-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/08/2021] [Accepted: 05/03/2022] [Indexed: 11/28/2022] Open
Abstract
Background: Codon usage bias (CUB), the non-uniform usage of synonymous codons, occurs across all domains of life. Adaptive CUB is hypothesized to result from various selective pressures, including selection for efficient ribosome elongation, accurate translation, mRNA secondary structure, and/or protein folding. Given the critical link between protein folding and protein function, numerous studies have analyzed the relationship between codon usage and protein structure. The results from these studies have often been contradictory, likely reflecting the differing methods used for measuring codon usage and the failure to appropriately control for confounding factors, such as differences in amino acid usage between protein structures and changes in the frequency of different structures with gene expression. Results: Here we take an explicit population genetics approach to quantify codon-specific shifts in natural selection related to protein structure in S. cerevisiae and E. coli. Unlike other metrics of codon usage, our approach explicitly separates the effects of natural selection, scaled by gene expression, and mutation bias while naturally accounting for a region’s amino acid usage. Bayesian model comparisons suggest selection on codon usage varies only slightly between helix, sheet, and coil secondary structures and, similarly, between structured and intrinsically-disordered regions. Similarly, in contrast to previous findings, we find selection on codon usage only varies slightly at the termini of helices in E. coli. Using simulated data, we show that previous work indicating “non-optimal” codons are enriched at the beginning of helices in S. cerevisiae was due to a failure to control for various confounding factors (e.g. amino acid biases, gene expression, etc.), rather than selection to modulate cotranslational folding.
Conclusions: Our results reveal a weak relationship between codon usage and protein structure, indicating that differences in selection on codon usage between structures are slight. In addition to the magnitude of differences in selection between protein structures being slight, the observed shifts appear to be idiosyncratic and largely codon-specific rather than systematic reversals in the nature of selection. Overall, our work demonstrates the statistical power and benefits of studying selective shifts on codon usage or other genomic features from an explicitly evolutionary approach. Limitations of this approach and future potential research avenues are discussed. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08635-0.
Affiliation(s)
- Alexander L Cope
  - Genome Science and Technology, University of Tennessee, Knoxville, United States
  - Current Address: Department of Genetics, Rutgers University, Piscataway, United States
- Michael A Gilchrist
  - Genome Science and Technology, University of Tennessee, Knoxville, United States
  - National Institute for Mathematical and Biological Synthesis, Knoxville, TN, United States
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, United States
3
Liu Y, Yang Q, Zhao F. Synonymous but Not Silent: The Codon Usage Code for Gene Expression and Protein Folding. Annu Rev Biochem 2021; 90:375-401. [PMID: 33441035 DOI: 10.1146/annurev-biochem-071320-112701] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Indexed: 12/11/2022]
Abstract
Codon usage bias, the preference for certain synonymous codons, is found in all genomes. Although synonymous mutations were previously thought to be silent, a large body of evidence has demonstrated that codon usage can play major roles in determining gene expression levels and protein structures. Codon usage influences translation elongation speed and regulates translation efficiency and accuracy. Adaptation of codon usage to tRNA expression determines the proteome landscape. In addition, codon usage biases result in nonuniform ribosome decoding rates on mRNAs, which in turn influence the cotranslational protein folding process that is critical for protein function in diverse biological processes. Conserved genome-wide correlations have also been found between codon usage and protein structures. Furthermore, codon usage is a major determinant of mRNA levels through translation-dependent effects on mRNA decay and translation-independent effects on transcriptional and posttranscriptional processes. Here, we discuss the multifaceted roles and mechanisms of codon usage in different gene regulatory processes.
Affiliation(s)
- Yi Liu
  - Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9040, USA
- Qian Yang
  - Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9040, USA
- Fangzhou Zhao
  - Department of Physiology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9040, USA
4
Beaulieu JM, O’Meara BC, Zaretzki R, Landerer C, Chai J, Gilchrist MA. Population Genetics Based Phylogenetics Under Stabilizing Selection for an Optimal Amino Acid Sequence: A Nested Modeling Approach. Mol Biol Evol 2019; 36:834-851. [PMID: 30521036 PMCID: PMC6445302 DOI: 10.1093/molbev/msy222] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022] Open
Abstract
We present a new phylogenetic approach, selection on amino acids and codons (SelAC), whose substitution rates are based on a nested model linking protein expression to population genetics. Unlike simpler codon models that assume a single substitution matrix for all sites, our model more realistically represents the evolution of protein-coding DNA under the assumption of consistent, stabilizing selection using a cost-benefit approach. This cost-benefit approach allows us to generate a set of 20 optimal amino acid-specific matrix families using just a handful of parameters and naturally links the strength of stabilizing selection to protein synthesis levels, which we can estimate. Using a yeast data set of 100 orthologs for 6 taxa, we find SelAC fits the data much better than popular models by 10^4-10^5 Akaike information criterion units adjusted for small sample bias. Our results also indicated that nested, mechanistic models better predict observed data patterns, highlighting the improvement in biological realism in amino acid sequence evolution that our model provides. Additional parameters estimated by SelAC indicate that a large amount of nonphylogenetic, but biologically meaningful, information can be inferred from existing data. For example, SelAC prediction of gene-specific protein synthesis rates correlates well with both empirical (r=0.33-0.48) and other theoretical predictions (r=0.45-0.64) for multiple yeast species. SelAC also provides estimates of the optimal amino acid at each site. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, future modifications, such as including shifts in the optimal amino acid sequence within or across lineages, are possible.
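The model comparison above is reported in Akaike information criterion units adjusted for small sample bias, a quantity usually written AICc. The following is a minimal sketch of the standard AICc formula, not the SelAC implementation; the log-likelihoods, parameter counts, and sample size are made-up illustrative values.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC plus the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits of two models to the same data (all values are invented).
model_a = aicc(log_likelihood=-1200.0, k=10, n=500)
model_b = aicc(log_likelihood=-1250.0, k=4, n=500)
delta = model_b - model_a  # positive means model_a has the lower (better) AICc
```

A lower AICc indicates the preferred model, so a difference of thousands of units, as reported in the abstract, would represent very strong support.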
Affiliation(s)
- Jeremy M Beaulieu
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
  - National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Brian C O’Meara
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
  - National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Cedric Landerer
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
  - National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Juanjuan Chai
  - National Institute for Mathematical and Biological Synthesis, Knoxville, TN
  - Suite 1039, White Plains, NY
- Michael A Gilchrist
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
  - National Institute for Mathematical and Biological Synthesis, Knoxville, TN
5
Abstract
Populations evolve as mutations arise in individual organisms and, through hereditary transmission, may become "fixed" (shared by all individuals) in the population. Most mutations are lethal or have negative fitness consequences for the organism. Others have essentially no effect on organismal fitness and can become fixed through the neutral stochastic process known as random drift. However, mutations may also produce a selective advantage that boosts their chances of reaching fixation. Regions of genomes where new mutations are beneficial, rather than neutral or deleterious, tend to evolve more rapidly due to positive selection. Genes involved in immunity and defense are a well-known example; rapid evolution in these genes presumably occurs because new mutations help organisms to prevail in evolutionary "arms races" with pathogens. In recent years genome-wide scans for selection have enlarged our understanding of the genome evolution of various species. In this chapter, we will focus on methods to detect selection on the genome. In particular, we will discuss probabilistic models and how they have changed with the advent of new genome-wide data now available.
Affiliation(s)
- Carolin Kosiol
  - Centre of Biological Diversity, School of Biology, University of St Andrews, Fife, UK
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Maria Anisimova
  - Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
6
Abrahams L, Hurst LD. Adenine Enrichment at the Fourth CDS Residue in Bacterial Genes Is Consistent with Error Proofing for +1 Frameshifts. Mol Biol Evol 2018; 34:3064-3080. [PMID: 28961919 PMCID: PMC5850271 DOI: 10.1093/molbev/msx223] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open
Abstract
Beyond selection for optimal protein functioning, coding sequences (CDSs) are under selection at the RNA and DNA levels. Here, we identify a possible signature of “dual-coding,” namely extensive adenine (A) enrichment at bacterial CDS fourth sites. In 99.07% of studied bacterial genomes, fourth site A use is greater than expected given genomic A-starting codon use. Arguing for nucleotide level selection, A-starting serine and arginine second codons are heavily utilized when compared with their non-A starting synonyms. Several models can explain some of this trend. In part, A-enrichment likely reduces 5′ mRNA stability, promoting translation initiation. However, T/U, which may also reduce stability, is avoided. Further, +1 frameshifts on the initiating ATG encode a stop codon (TGA) provided A is the fourth residue, acting either as a frameshift “catch and destroy” or a frameshift stop and adjust mechanism and hence implicated in translation initiation. Consistent with both, genomes lacking TGA stop codons exhibit weaker fourth site A-enrichment. Sequences lacking a Shine–Dalgarno sequence and those without upstream leader genes, which may be more error prone during initiation, have greater utilization of A, again suggesting a role in initiation. The frameshift correction model is consistent with the notion that many genomic features are error-mitigation factors and provides the first evidence for site-specific out of frame stop codon selection. We conjecture that the NTG universal start codon may have evolved as a consequence of TGA being a stop codon and the ability of NTGA to rapidly terminate or adjust a ribosome.
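The frameshift argument above rests on a simple sequence property: when an NTG start codon is followed by A, the codon read after a +1 slip is TGA. The sketch below illustrates that check. The function names and example sequences are hypothetical and are not taken from the paper.

```python
def fourth_site_is_a(cds: str) -> bool:
    """True if the fourth CDS nucleotide (first base of codon 2) is A."""
    return len(cds) >= 4 and cds[3].upper() == "A"


def plus_one_codon(cds: str) -> str:
    """Codon read after a +1 slip on the start codon (CDS positions 2-4)."""
    return cds[1:4].upper()


def plus_one_hits_tga(cds: str) -> bool:
    """True if a +1 frameshift at the start codon immediately reads TGA."""
    return plus_one_codon(cds) == "TGA"


# Hypothetical sequences: a start codon followed by an A-starting
# or a T-starting serine codon.
a_start = "ATGAGCAAA"  # second codon AGC (Ser)
t_start = "ATGTCGAAA"  # second codon TCG (Ser)
gtg_start = "GTGAGCAAA"  # alternative NTG start codon followed by A
```

Here `plus_one_hits_tga(a_start)` and `plus_one_hits_tga(gtg_start)` are true, while `plus_one_hits_tga(t_start)` is false because the shifted codon is TGT.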
Affiliation(s)
- Liam Abrahams
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
7
Sablok G, Chen TW, Lee CC, Yang C, Gan RC, Wegrzyn JL, Porta NL, Nayak KC, Huang PJ, Varotto C, Tang P. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications. DNA Res 2017; 24:327-332. [PMID: 28419256 PMCID: PMC5499650 DOI: 10.1093/dnares/dsw044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/18/2016] [Accepted: 09/14/2016] [Indexed: 01/01/2023] Open
Abstract
Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/
Affiliation(s)
- Gaurav Sablok
  - Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
- Ting-Wen Chen
  - Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Chi-Ching Lee
  - Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Chi Yang
  - Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Ruei-Chi Gan
  - Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Jill L Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, CT 06269-3043 USA
- Nicola L Porta
  - Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
  - MOUNTFOR Project Centre, European Forest Institute, Via E. Mach 1, 38010 San Michele all'Adige, Trento, Italy
- Kinshuk C Nayak
  - Bioinformatics Centre, Institute of Life Sciences, Department of Biotechnology, Govt. India, Nalco Square, Bhubaneswar - 751 023, India
- Po-Jung Huang
  - Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
- Claudio Varotto
  - Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
- Petrus Tang
  - Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
  - Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Kweishan, Taoyuan 333, Taiwan
8
Genome-wide comparative analysis of codon usage bias and codon context patterns among cyanobacterial genomes. Mar Genomics 2016; 32:31-39. [PMID: 27733306 DOI: 10.1016/j.margen.2016.10.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 06/14/2016] [Revised: 09/11/2016] [Accepted: 10/03/2016] [Indexed: 11/20/2022]
Abstract
With the increasing accumulation of genomic sequence information of prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection patterns within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed a detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis shows that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly, whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria as per their GC content. Analysis of codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that the majority of genes are associated with low codon bias. Codon selection patterns in cyanobacterial genomes reflected compositional constraints as the major influencing factor. Although mutational constraint may play some role in shaping codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition, coupled with environmental factors, affected codon selection patterns in cyanobacterial genomes.
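A preference for A- or T-ending over G- or C-ending codons is often summarized by the GC content at third codon positions (GC3). The sketch below is a generic illustration of that statistic and is not the pipeline used in the study. It assumes an in-frame coding sequence with unambiguous bases, and it does not exclude stop codons or single-codon amino acids as some GC3 variants do.

```python
def gc3(cds: str) -> float:
    """Fraction of codons whose third base is G or C."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third_bases = seq[2:usable:3]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical in-frame sequence with codons ATG, AAA, TTT, GCC.
example = "ATGAAATTTGCC"
example_gc3 = gc3(example)
```

A GC3 value well below 0.5 across most genes would be consistent with the A/T-ending preference the abstract describes.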
9
Satapathy SS, Powdel BR, Buragohain AK, Ray SK. Discrepancy among the synonymous codons with respect to their selection as optimal codon in bacteria. DNA Res 2016; 23:441-449. [PMID: 27426467 PMCID: PMC5066170 DOI: 10.1093/dnares/dsw027] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/06/2016] [Accepted: 05/19/2016] [Indexed: 01/05/2023] Open
Abstract
The different triplets encoding the same amino acid, termed synonymous codons, are not equally abundant in a genome. Factors such as G + C% and tRNA are known to influence their abundance in a genome. However, the order of the nucleotides in each codon per se might be another factor affecting its abundance. Of the synonymous codons for specific amino acids, some are preferentially used in highly expressed genes; these are referred to as the 'optimal codons' (OCs). In this study, we compared OCs of the 18 amino acids in 221 species of bacteria. We observe an amino acid-specific influence on the selection of OCs. Phylogeny also influences the choice of OCs for some amino acids such as Glu, Gln, Lys and Leu. The phenomenon of codon bias is also supported by comparative studies of the abundance values of synonymous codons with the same G + C. It is likely that the order of the nucleotides in the triplet codon is also involved in codon usage bias in organisms.
Affiliation(s)
- Bhesh Raj Powdel
  - Department of Statistics, Darrang College, Tezpur 784001, Assam, India
- Alak Kumar Buragohain
  - Department of Molecular Biology and Biotechnology, Tezpur University, Napaam, Tezpur 784028, Assam, India
  - Office of the Vice-Chancellor, Dibrugarh University, Dibrugarh 786004, Assam, India
- Suvendra Kumar Ray
  - Department of Molecular Biology and Biotechnology, Tezpur University, Napaam, Tezpur 784028, Assam, India
10
Kubatko L, Shah P, Herbei R, Gilchrist MA. A codon model of nucleotide substitution with selection on synonymous codon usage. Mol Phylogenet Evol 2015; 94:290-7. [PMID: 26358614 DOI: 10.1016/j.ympev.2015.08.026] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/04/2014] [Revised: 08/22/2015] [Accepted: 08/30/2015] [Indexed: 10/23/2022]
Abstract
The quality of phylogenetic inference made from protein-coding genes depends, in part, on the realism with which the codon substitution process is modeled. Here we propose a new mechanistic model that combines the standard M0 substitution model of Yang (1997) with a simplified model from Gilchrist (2007) that includes selection on synonymous substitutions as a function of codon-specific nonsense error rates. We tested the newly proposed model by applying it to 104 protein-coding genes in brewer's yeast, and compared the fit of the new model to the standard M0 model and to the mutation-selection model of Yang and Nielsen (2008) using the AIC. Our new model provided significantly better fit in approximately 85% of the cases considered for the basic M0 model and in approximately 25% of the cases for the M0 model with estimated codon frequencies, but only in a few cases when the mutation-selection model was considered. However, our model includes a parameter that can be interpreted as a measure of the rate of protein production, and the estimates of this parameter were highly correlated with an independent measure of protein production for the yeast genes considered here. Finally, we found that in some cases the new model led to the preference of a different phylogeny for a subset of the genes considered, indicating that substitution model choice may have an impact on the estimated phylogeny.
Affiliation(s)
- Laura Kubatko
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, United States
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, United States
- Premal Shah
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, United States
- Radu Herbei
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, United States
- Michael A Gilchrist
  - Department of Ecology and Evolutionary Biology, University of Tennessee - Knoxville, Knoxville, TN 37996-1610, United States
11
Gilchrist MA, Chen WC, Shah P, Landerer CL, Zaretzki R. Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone. Genome Biol Evol 2015; 7:1559-79. [PMID: 25977456 PMCID: PMC4494061 DOI: 10.1093/gbe/evv087] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/27/2023] Open
Abstract
Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome scale measurements of gene expression. Here, we demonstrate that it is possible to both extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). 
Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome scale patterns of codon usage; accessing this information does not require gene expression measurements, but instead requires carefully formulated, biologically interpretable models.
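The separation of mutation bias from expression-scaled selection described above can be illustrated with a small numerical sketch. It assumes a multinomial-logit form in which the probability of synonymous codon i in a gene with expression level φ is proportional to exp(-ΔM_i - Δη_i φ). That functional form, the symbol names, and all parameter values are assumptions made for illustration; they are not stated in the abstract, and this is not the ROC SEMPPR fitting code.

```python
import math


def codon_probabilities(delta_m, delta_eta, phi):
    """Softmax over synonymous codons: p_i is proportional to exp(-dM_i - dEta_i * phi)."""
    weights = [math.exp(-dm - de * phi) for dm, de in zip(delta_m, delta_eta)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-codon amino acid; codon 0 is the reference (dM = dEta = 0).
delta_m = [0.0, -0.5]   # assumed mutation bias term: favours codon 1
delta_eta = [0.0, 0.4]  # assumed selection term: disfavours codon 1
low = codon_probabilities(delta_m, delta_eta, phi=0.1)   # lowly expressed gene
high = codon_probabilities(delta_m, delta_eta, phi=5.0)  # highly expressed gene
```

With these invented values, the mutationally favoured codon 1 is the more frequent codon at low expression, while the selectively favoured codon 0 dominates at high expression. Making the mutation term larger relative to the selection term would keep codon 1 most frequent even at high expression, which is the kind of case the abstract warns can lead to misidentifying the "optimal" codon.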
Affiliation(s)
- Michael A Gilchrist
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville
  - National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee
- Wei-Chen Chen
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville
  - Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland
- Premal Shah
  - Department of Biology, University of Pennsylvania
- Cedric L Landerer
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville
- Russell Zaretzki
  - National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee
  - Department of Business Analytics and Statistics, University of Tennessee, Knoxville
12
Tuller T, Zur H. Multiple roles of the coding sequence 5' end in gene expression regulation. Nucleic Acids Res 2014; 43:13-28. [PMID: 25505165 PMCID: PMC4288200 DOI: 10.1093/nar/gku1313] [Citation(s) in RCA: 133] [Impact Index Per Article: 13.3] [Indexed: 12/31/2022] Open
Abstract
The codon composition of the first few dozen codons at the 5′ end of the coding sequence (ORF) is known to be distinct from that of the rest of the ORF. Various explanations for the unusual codon distribution in this region have been proposed in recent years, and include, among others, novel regulatory mechanisms of translation initiation and elongation. However, because many overlapping regulatory signals are suggested to be associated with this relatively short region, studying it is challenging. Here, we review the currently known signals that appear in this region, the theories related to the way they regulate translation and affect organismal fitness, and the debates they provoke.
Affiliation(s)
- Tamir Tuller
  - Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel Aviv, Israel
  - The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel
- Hadas Zur
  - Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Tel Aviv, Israel
13
McCandlish DM, Stoltzfus A. Modeling evolution using the probability of fixation: history and implications. Q Rev Biol 2014; 89:225-52. [PMID: 25195318 DOI: 10.1086/677571] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Indexed: 11/03/2022]
Abstract
Many models of evolution calculate the rate of evolution by multiplying the rate at which new mutations originate within a population by a probability of fixation. Here we review the historical origins, contemporary applications, and evolutionary implications of these "origin-fixation" models, which are widely used in evolutionary genetics, molecular evolution, and phylogenetics. Origin-fixation models were first introduced in 1969, in association with an emerging view of "molecular" evolution. Early origin-fixation models were used to calculate an instantaneous rate of evolution across a large number of independently evolving loci; in the 1980s and 1990s, a second wave of origin-fixation models emerged to address a sequence of fixation events at a single locus. Although origin fixation models have been applied to a broad array of problems in contemporary evolutionary research, their rise in popularity has not been accompanied by an increased appreciation of their restrictive assumptions or their distinctive implications. We argue that origin-fixation models constitute a coherent theory of mutation-limited evolution that contrasts sharply with theories of evolution that rely on the presence of standing genetic variation. A major unsolved question in evolutionary biology is the degree to which these models provide an accurate approximation of evolution in natural populations.
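As a rough illustration of the origin-fixation calculation described in this abstract, the sketch below multiplies the rate at which new mutations arise (2Nμ in a diploid population) by a probability of fixation. The abstract does not name a specific fixation formula; Kimura's diffusion approximation, the function names, and the parameters `mu`, `s`, and `N` are assumptions made here for illustration, with effective population size taken equal to census size.

```python
import math


def fixation_probability(s: float, N: int) -> float:
    """Kimura's diffusion approximation for a new mutation that starts
    at frequency 1/(2N) in a diploid population (assumes Ne == N)."""
    if abs(s) < 1e-12:
        # Neutral limit: fixation probability equals the initial frequency.
        return 1.0 / (2 * N)
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))


def origin_fixation_rate(mu: float, s: float, N: int) -> float:
    """Substitution rate = (rate of origination) x (probability of fixation)."""
    origination = 2 * N * mu  # new mutations entering the population per generation
    return origination * fixation_probability(s, N)


if __name__ == "__main__":
    for s in (-0.001, 0.0, 0.001):
        print(s, origin_fixation_rate(mu=1e-8, s=s, N=1000))
```

In the neutral case the population size cancels and the rate reduces to `mu`. The sketch does not guard against floating-point overflow for very large negative values of `4 * N * s`.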
|
14
|
Zhou JH, Zhang J, Sun DJ, Ma Q, Chen HT, Ma LN, Ding YZ, Liu YS. The distribution of synonymous codon choice in the translation initiation region of dengue virus. PLoS One 2013; 8:e77239. [PMID: 24204777 PMCID: PMC3808402 DOI: 10.1371/journal.pone.0077239] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 04/24/2013] [Accepted: 08/30/2013] [Indexed: 11/18/2022] Open
Abstract
Dengue is the most common arthropod-borne viral (arboviral) illness in humans. The genetic features of codon usage in dengue virus (DENV) were analyzed using the relative synonymous codon usage, the effective number of codons and the codon adaptation index. The evolutionary distance between DENV and its natural hosts (Homo sapiens, Pan troglodytes, Aedes albopictus and Aedes aegypti) was estimated by a novel formula. Finally, the synonymous codon usage preference in the translation initiation region of this virus was also analyzed. The results indicate that the general trends in usage of the 59 synonymous codons are similar across the four genotypes of DENV, and that this pattern has no link with the geographic distribution of the virus. The codon usage patterns of Aedes albopictus and Aedes aegypti have a stronger effect on the codon usage of DENV than do those of the two primates. Turning to the codon usage preference of the translation initiation region of this virus, some codons that pair with tRNAs of low copy number in the two primates have a stronger tendency to occur in the translation initiation region than in the open reading frame of DENV. Although DENV, like other RNA viruses, has a high mutation rate that helps it adapt to its hosts, the regulatory features of synonymous codon usage have been 'branded' on the translation initiation region of this virus in order to hijack the translational mechanisms of the hosts.
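The relative synonymous codon usage (RSCU) named in this abstract is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch, assuming the standard genetic code (under which 59 codons have at least one synonym); the names `rscu`, `FAMILIES`, and `SYNONYMOUS_CODONS` are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons into synonymous families keyed by amino acid.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)

# Codons with at least one synonym (excludes stops, ATG and TGG).
SYNONYMOUS_CODONS = [
    c for aa, fam in FAMILIES.items() if aa != "*" and len(fam) > 1 for c in fam
]


def rscu(seq):
    """Return RSCU per codon for an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa, family in FAMILIES.items():
        if aa == "*" or len(family) == 1:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so omit it
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    print(len(SYNONYMOUS_CODONS))
    print(rscu("GCTGCTGCTGCC"))
```

A value of 1 means a codon is used as often as expected under equal usage; values above 1 indicate over-representation within its synonymous family.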
Affiliation(s)
- Jian-hua Zhou
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Jie Zhang
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Dong-jie Sun
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Qi Ma
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Hao-tai Chen
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Li-na Ma
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Yao-zhong Ding
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
- Yong-sheng Liu
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, P.R. China
|
15
|
O'Neill PK, Or M, Erill I. scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes. PLoS One 2013; 8:e76177. [PMID: 24116094 PMCID: PMC3792112 DOI: 10.1371/journal.pone.0076177] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/21/2013] [Accepted: 08/23/2013] [Indexed: 12/04/2022] Open
Abstract
Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and we apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate and fast growing bacteria and we provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate and fast growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in conserved patterns of codon usage bias.
Affiliation(s)
- Patrick K. O'Neill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- Mindy Or
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), Baltimore, Maryland, United States of America
|
16
|
Sablok G, Wu X, Kuo J, Nayak KC, Baev V, Varotto C, Zhou F. Combinational effect of mutational bias and translational selection for translation efficiency in tomato (Solanum lycopersicum) cv. Micro-Tom. Genomics 2013; 101:290-5. [PMID: 23474140 DOI: 10.1016/j.ygeno.2013.02.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/04/2012] [Revised: 01/21/2013] [Accepted: 02/21/2013] [Indexed: 11/24/2022]
Abstract
We conducted a comprehensive analysis of codon usage bias (CUB) based on the available non-redundant full-length cDNA (nrFLcDNA) and expressed sequence tags (ESTs) data of cultivar Micro-Tom and evaluated the associations between observed CUB and measurements of transcriptional and translational effectiveness. Our analysis shows a strong negative correlation between Axis 1 and GC3s (r=-0.827, P<0.01), indicating that mutational bias has a significant and dominant repressive role in the choice of GC3. We also observed a strong positive correlation between codon adaptation index (CAI) and translational adaptation index (tAIg) (0.407, P<0.01), which demonstrates the facilitation of efficient translation by the optimal codon usage patterns of the highly expressed genes. We believe that the complete set of optimal codon usage patterns detected in this study will serve as a model to enhance transgenesis in the studied cultivar of Solanum lycopersicum.
Affiliation(s)
- Gaurav Sablok
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E Mach 1, 38010 S. Michele all'Adige (TN), Italy.
|
17
|
Piovesan A, Vitale L, Pelleri MC, Strippoli P. Universal tight correlation of codon bias and pool of RNA codons (codonome): The genome is optimized to allow any distribution of gene expression values in the transcriptome from bacteria to humans. Genomics 2013; 101:282-9. [PMID: 23466472 DOI: 10.1016/j.ygeno.2013.02.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 12/04/2012] [Revised: 02/18/2013] [Accepted: 02/21/2013] [Indexed: 10/27/2022]
Abstract
Codon bias is the phenomenon in which distinct synonymous codons are used with different frequencies. We define here the "codonome value" as the total number of codons present across all the expressed mRNAs in a given biological condition. We have developed the "CODONOME" software, which calculates the codon bias and, following integration with a gene expression profile, estimates the actual frequency of each codon at the transcriptome level (codonome bias) of a given tissue. Systematic analysis across different human tissues and multiple species shows a surprisingly tight correlation between the codon bias and the codonome bias. An aneuploidy and cancer condition such as that of Down Syndrome-related acute megakaryoblastic leukemia (DS-AMKL), does not appear to alter this relationship. The law of correlation between codon bias and codonome emerges as a property of the distribution and range of the number, sequence and expression level of the genes in a genome.
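To make the distinction in this abstract concrete, the sketch below computes a "codonome value" (total codons across all expressed mRNAs) and expression-weighted codon frequencies from a set of coding sequences and an expression profile. It is only an illustration of the definitions given above, not the CODONOME software; the function names, the input dictionaries, and the use of transcript counts as expression levels are all assumptions.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in one coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codonome(cds_by_gene, expression):
    """Return (codonome value, codon frequencies at the transcriptome level).

    cds_by_gene: hypothetical mapping gene id -> coding sequence
    expression:  hypothetical mapping gene id -> transcript abundance
    """
    weighted = Counter()
    for gene, cds in cds_by_gene.items():
        level = expression.get(gene, 0)  # unexpressed genes contribute nothing
        for codon, n in codon_counts(cds).items():
            weighted[codon] += n * level
    total = sum(weighted.values())
    freqs = {c: w / total for c, w in weighted.items()} if total else {}
    return total, freqs


if __name__ == "__main__":
    genes = {"g1": "ATGGCTGCT", "g2": "ATGGCC"}
    levels = {"g1": 2, "g2": 1}
    print(codonome(genes, levels))
```

Setting every expression level to 1 recovers the plain genomic codon frequencies, so the two quantities the abstract compares can be obtained from the same function.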
Affiliation(s)
- Allison Piovesan
- Department of Experimental, Diagnostic and Specialty Medicine, Activity of Histology, Embryology and Applied Biology, University of Bologna, via Belmeloro 8, 40126 Bologna (BO), Italy.
|
18
|
Wald N, Alroy M, Botzman M, Margalit H. Codon usage bias in prokaryotic pyrimidine-ending codons is associated with the degeneracy of the encoded amino acids. Nucleic Acids Res 2012; 40:7074-83. [PMID: 22581775 PMCID: PMC3424539 DOI: 10.1093/nar/gks348] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
Synonymous codons are unevenly distributed among genes, a phenomenon termed codon usage bias. Understanding the patterns of codon bias and the forces shaping them is a major step towards elucidating the adaptive advantage codon choice can confer at the level of individual genes and organisms. Here, we perform a large-scale analysis to assess codon usage bias pattern of pyrimidine-ending codons in highly expressed genes in prokaryotes. We find a bias pattern linked to the degeneracy of the encoded amino acid. Specifically, we show that codon-pairs that encode two- and three-fold degenerate amino acids are biased towards the C-ending codon while codons encoding four-fold degenerate amino acids are biased towards the U-ending codon. This codon usage pattern is widespread in prokaryotes, and its strength is correlated with translational selection both within and between organisms. We show that this bias is associated with an improved correspondence with the tRNA pool, avoidance of mis-incorporation errors during translation and moderate stability of codon–anticodon interaction, all consistent with more efficient translation.
Affiliation(s)
- Naama Wald
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91120, Israel
|
19
|
Hilterbrand A, Saelens J, Putonti C. CBDB: the codon bias database. BMC Bioinformatics 2012; 13:62. [PMID: 22536831 PMCID: PMC3463423 DOI: 10.1186/1471-2105-13-62] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 01/09/2012] [Accepted: 03/26/2012] [Indexed: 02/01/2023] Open
Abstract
Background: In many genomes, a clear preference in the usage of particular codons exists. The mechanisms that induce codon biases remain an open question; studies have attributed codon usage to translational selection, mutational bias and drift. Furthermore, correlations between codon usage within host genomes and their viral pathogens have been observed for a myriad of host-virus systems. As such, numerous studies have investigated codon usage and codon bias in an effort to better understand how species evolve. Numerous metrics have been developed to identify biases in codon usage. In addition, a few data repositories of codon bias data are available, differing in the metrics reported as well as the number and taxonomy of strains examined. Description: We have created a new web resource called the Codon Bias Database (CBDB) which provides information regarding the codon bias within the set of highly expressed genes for 300+ bacterial genomes. CBDB was developed to provide a resource for researchers investigating codon bias in bacteria, facilitating comparisons between strains and species. Furthermore, the site was created to serve those studying adaptation in phage; the genera selected for this first release of CBDB all have sequenced, annotated bacteriophages. The annotations and sequences for the highly expressed gene set are available for each strain in addition to the strain’s codon bias measurements. Conclusions: Comparing species and strains provides a comprehensive look at how codon usage has been shaped over evolutionary time and can elucidate the putative mechanisms behind it. The Codon Bias Database provides a centralized repository of look-up tables and codon usage bias measures for a wide variety of genera, species and strains. Through our analysis of the variation in codon usage within the strains presently available, we find that most members of a genus have a codon composition most similar to other members of its genus, although not necessarily other members of its species.
Affiliation(s)
- Adam Hilterbrand
- Department of Biology, Loyola University Chicago, 1032 W Sheridan Road, Chicago, IL 60660, USA
|
20
|
Aoi MC, Rourke BC. Interspecific and intragenic differences in codon usage bias among vertebrate myosin heavy-chain genes. J Mol Evol 2011; 73:74-93. [PMID: 21915654 DOI: 10.1007/s00239-011-9457-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/24/2010] [Accepted: 08/19/2011] [Indexed: 01/13/2023]
Abstract
Synonymous codon usage bias is a broadly observed phenomenon in bacteria, plants, and invertebrates and may result from selection. However, the role of selective pressures in shaping codon bias is still controversial in vertebrates, particularly for mammals. The myosin heavy-chain (MyHC) gene family comprises multiple isoforms of the major force-producing contractile protein in cardiac and skeletal muscles. Slow and fast genes are tandemly arrayed on separate chromosomes, and have distinct patterns of functionality and expression in muscle. We analyze both full-length MyHC genes (~5400 bp) and a larger collection of partial sequences at the 3' end (~500 bp). The MyHC isoforms are an interesting system in which to study codon usage bias because of their length, expression, and critical importance to organismal mobility. Codon bias and GC content differ among MyHC genes with regard to functional type, isoform, and position within the gene. Codon bias even varies by isoform within a species. We find evidence in favor of both chromosomal influences on nucleotide composition and selection against nonsense errors (SANE) acting on codon usage in MyHC genes. Intragenic variation in codon bias and elongation rate is significant, with a strong trend for increasing codon bias and elongation rate towards the 3' end of the gene, although the trend is dependent upon the degeneracy class of the codons. Therefore, patterns of codon usage in MyHC genes are consistent with models supporting SANE as a major force shaping codon usage.
Affiliation(s)
- Mikio C Aoi
- Department of Mathematics, North Carolina State University, Raleigh, NC 27695, USA
|
21
|
Determinants of translation efficiency and accuracy. Mol Syst Biol 2011; 7:481. [PMID: 21487400 PMCID: PMC3101949 DOI: 10.1038/msb.2011.14] [Citation(s) in RCA: 325] [Impact Index Per Article: 25.0] [Received: 10/29/2010] [Accepted: 02/15/2011] [Indexed: 12/17/2022] Open
Abstract
A given protein sequence can be encoded by an astronomical number of alternative nucleotide sequences. Recent research has revealed that this flexibility provides evolution with multiple ways to tune the efficiency and fidelity of protein translation and folding. Proper functioning of biological cells requires that the process of protein expression be carried out with high efficiency and fidelity. Given an amino-acid sequence of a protein, multiple degrees of freedom still remain that may allow evolution to tune efficiency and fidelity for each gene under various conditions and cell types. Particularly, the redundancy of the genetic code allows the choice between alternative codons for the same amino acid, which, although ‘synonymous,' may exert dramatic effects on the process of translation. Here we review modern developments in genomics and systems biology that have revolutionized our understanding of the multiple means by which translation is regulated. We suggest new means to model the process of translation in a richer framework that will incorporate information about gene sequences, the tRNA pool of the organism and the thermodynamic stability of the mRNA transcripts. A practical demonstration of a better understanding of the process would be a more accurate prediction of the proteome, given the transcriptome at a diversity of biological conditions.
|
22
|
Schmid P, Flegel WA. Codon usage in vertebrates is associated with a low risk of acquiring nonsense mutations. J Transl Med 2011; 9:87. [PMID: 21651781 PMCID: PMC3123582 DOI: 10.1186/1479-5876-9-87] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/31/2011] [Accepted: 06/08/2011] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: Codon usage in genomes is biased towards specific subsets of codons. Codon usage bias affects translational speed and accuracy, and it is associated with the tRNA levels and the GC content of the genome. Spontaneous mutations drive genomes to a low GC content. Active cellular processes are needed to maintain a high GC content, which influences the codon usage of a species. Loss-of-function mutations, such as nonsense mutations, are the molecular basis of many recessive alleles, which can greatly affect the genome of an organism and are the cause of many genetic diseases in humans. METHODS: We developed an event based model to calculate the risk of acquiring nonsense mutations in coding sequences. Complete coding sequences and genomes of 40 eukaryotes were analyzed for GC and CpG content, codon usage, and the associated risk of acquiring nonsense mutations. We included one species per genus for all eukaryotes with available reference sequence. RESULTS: We discovered that the codon usage bias detected in genomes of high GC content decreases the risk of acquiring nonsense mutations (Pearson's r = -0.95; P < 0.0001). In the genomes of all examined vertebrates, including humans, this risk was lower than expected (0.93 ± 0.02; mean ± SD) and lower than the risk in genomes of non-vertebrates (1.02 ± 0.13; P = 0.019). CONCLUSIONS: While the maintenance of a high GC content is energetically costly, it is associated with a codon usage bias harboring a low risk of acquiring nonsense mutations. The reduced exposure to this risk may contribute to the fitness of vertebrates.
Affiliation(s)
- Pirmin Schmid
- National Institutes of Health, Clinical Center, Bethesda, MD, USA
|
23
|
Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift. Proc Natl Acad Sci U S A 2011; 108:10231-6. [PMID: 21646514 DOI: 10.1073/pnas.1016719108] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Indexed: 11/18/2022] Open
Abstract
The genetic code is redundant with most amino acids using multiple codons. In many organisms, codon usage is biased toward particular codons. Understanding the adaptive and nonadaptive forces driving the evolution of codon usage bias (CUB) has been an area of intense focus and debate in the fields of molecular and evolutionary biology. However, their relative importance in shaping genomic patterns of CUB remains unsolved. Using a nested model of protein translation and population genetics, we show that observed gene level variation of CUB in Saccharomyces cerevisiae can be explained almost entirely by selection for efficient ribosomal usage, genetic drift, and biased mutation. The correlation between observed codon counts within individual genes and our model predictions is 0.96. Although a variety of factors shape patterns of CUB at the level of individual sites within genes, our results suggest that selection for efficient ribosome usage is a central force in shaping codon usage at the genomic scale. In addition, our model allows direct estimation of codon-specific mutation rates and elongation times and can be readily applied to any organism with high-throughput expression datasets. More generally, we have developed a natural framework for integrating models of molecular processes with population genetics models to quantitatively estimate parameters underlying fundamental biological processes, such as protein translation.
|
24
|
Shah P, Gilchrist MA. Effect of correlated tRNA abundances on translation errors and evolution of codon usage bias. PLoS Genet 2010; 6:e1001128. [PMID: 20862306 PMCID: PMC2940732 DOI: 10.1371/journal.pgen.1001128] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Received: 04/19/2010] [Accepted: 08/18/2010] [Indexed: 11/19/2022] Open
Abstract
Despite the fact that tRNA abundances are thought to play a major role in determining translation error rates, their distribution across the genetic code and the resulting implications have received little attention. In general, studies of codon usage bias (CUB) assume that codons with higher tRNA abundance have lower missense error rates. Using a model of protein translation based on tRNA competition and intra-ribosomal kinetics, we show that this assumption can be violated when tRNA abundances are positively correlated across the genetic code. Examining the distribution of tRNA abundances across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. This work challenges one of the fundamental assumptions made in over 30 years of research on CUB that codons with higher tRNA abundances have lower missense error rates and that missense errors are the primary selective force responsible for CUB. Codon usage bias (CUB) is a ubiquitous and important phenomenon. CUB is thought to be driven primarily by selection against missense errors. For over 30 years, the standard model of translation errors has implicitly assumed that translation errors and tRNA abundances are inversely related. This is based on an implicit and unstated assumption that tRNA abundances across the genetic code are uncorrelated. Examining these abundance distributions across 73 bacterial genomes from 20 different genera, we find a consistent positive correlation between tRNA abundances across the genetic code. We further show that codons with higher tRNA abundances are not always “optimal” with respect to reducing the missense error rate and hence cannot explain the observed patterns of CUB.
Affiliation(s)
- Premal Shah
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, Tennessee, USA.
|
25
|
Entropy and Information Approaches to Genetic Diversity and its Expression: Genomic Geography. Entropy 2010. [DOI: 10.3390/e12071765] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 01/03/2023]
|