51
Serohijos AWR, Shakhnovich EI. Contribution of selection for protein folding stability in shaping the patterns of polymorphisms in coding regions. Mol Biol Evol 2013; 31:165-76. [PMID: 24124208 PMCID: PMC3879451 DOI: 10.1093/molbev/mst189] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Abstract
The patterns of polymorphisms in genomes are imprints of the evolutionary forces at play in nature. In particular, polymorphisms have been extensively used to infer the fitness effects of mutations and their dynamics of fixation. However, the role and contribution of molecular biophysics to these observations remain unclear. Here, we couple robust findings from protein biophysics, enzymatic flux theory, the selection against the cytotoxic effects of protein misfolding, and explicit population dynamics simulations in the polyclonal regime. First, we recapitulate results on the dynamics of clonal interference and on the shape of the DFE, thus providing them with a molecular and mechanistic foundation. Second, we predict that if evolution is indeed under the dynamic equilibrium of mutation-selection balance, the fraction of stabilizing and destabilizing mutations is almost equal among single-nucleotide polymorphisms segregating at high allele frequencies. This prediction is proven true for polymorphisms in the human coding region. Overall, our results show how selection for protein folding stability predominantly shapes the patterns of polymorphisms in coding regions.
52
Jovelin R. Pleiotropic constraints, expression level, and the evolution of miRNA sequences. J Mol Evol 2013; 77:206-20. [PMID: 24100521 DOI: 10.1007/s00239-013-9588-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/20/2013] [Accepted: 09/24/2013] [Indexed: 12/22/2022]
Abstract
Post-transcriptional gene regulation mediated by microRNAs (miRNAs) plays critical roles during development by modulating gene expression and conferring robustness to stochastic errors. Phylogenetic analyses suggest that miRNA acquisition could play a role in phenotypic innovation. Moreover, miRNA-induced regulation strongly impacts genome evolution, increasing selective constraints on 3'UTRs, protein sequences, and expression level divergence. Thus, it is essential to understand the factors governing sequence evolution for this important class of regulatory molecules. Investigation of the patterns of molecular evolution at miRNA loci has been limited in Caenorhabditis elegans because of the lack of a close outgroup. Instead, I used Caenorhabditis briggsae as the focus point of this study because of its close relationship to Caenorhabditis sp. 9. I also corroborated the patterns of sequence evolution in Caenorhabditis using published orthologous relationships among miRNAs in Drosophila. In nematodes and in flies, miRNA sequence divergence is not influenced by the genomic neighborhood (i.e., intronic or intergenic) but is nevertheless affected by the genomic context because X-linked miRNAs evolve faster than autosomal miRNAs. However, this effect of chromosomal linkage can be explained by differential expression levels rather than a fast-X effect. The results presented here support a universal negative relationship between rates of molecular evolution and expression level, and suggest that mutations in highly expressed miRNAs are more likely to be deleterious because they potentially affect a larger number of target genes. Finally, I show that many single family member miRNAs evolve faster than miRNAs from multigene families and have limited functional scope, suggesting that they are not strongly integrated in gene regulatory networks.
Affiliation(s)
- Richard Jovelin
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, M5S 3B2, Canada
53
Serohijos AWR, Lee SYR, Shakhnovich EI. Highly abundant proteins favor more stable 3D structures in yeast. Biophys J 2013; 104:L1-3. [PMID: 23442924 DOI: 10.1016/j.bpj.2012.11.3838] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 09/26/2012] [Revised: 11/19/2012] [Accepted: 11/29/2012] [Indexed: 11/26/2022]
Abstract
To understand the variation of protein sequences in nature, we need to reckon with evolutionary constraints that are biophysical, cellular, and ecological. Here, we show that under the global selection against protein misfolding, there exists a scaling among protein folding stability, protein cellular abundance, and effective population size. The specific scaling implies that the several-orders-of-magnitude range of protein abundances in the cell should leave imprints on extant protein structures, a prediction that is supported by our structural analysis of the yeast proteome.
54
Dixit PD, Maslov S. Evolutionary capacitance and control of protein stability in protein-protein interaction networks. PLoS Comput Biol 2013; 9:e1003023. [PMID: 23592969 PMCID: PMC3617028 DOI: 10.1371/journal.pcbi.1003023] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 10/19/2012] [Accepted: 02/20/2013] [Indexed: 11/19/2022]
Abstract
In addition to their biological function, protein complexes reduce the exposure of the constituent proteins to the risk of undesired oligomerization by reducing the concentration of the free monomeric state. We interpret this reduced risk as a stabilization of the functional state of the protein. We estimate that protein-protein interactions can account for ~2-4 k_BT of additional stabilization, a substantial contribution to intrinsic stability. We hypothesize that proteins in the interaction network act as evolutionary capacitors, allowing their binding partners to explore regions of sequence space that correspond to less stable proteins. In the interaction network of baker's yeast, we find that, statistically, proteins that receive higher energetic benefits from the interaction network are more likely to misfold. A simplified fitness landscape wherein the fitness of an organism is inversely proportional to the total concentration of unfolded proteins provides an evolutionary justification for the proposed trends. We conclude by outlining clear biophysical experiments to test our predictions.
Affiliation(s)
- Purushottam D. Dixit
- Biology, Brookhaven National Laboratory, Upton, New York, United States of America
- Sergei Maslov
- Biology, Brookhaven National Laboratory, Upton, New York, United States of America
- Physics and Astronomy, Stony Brook University, Stony Brook, New York, United States of America
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
55
Dasmeh P, Serohijos AWR, Kepp KP, Shakhnovich EI. Positively selected sites in cetacean myoglobins contribute to protein stability. PLoS Comput Biol 2013; 9:e1002929. [PMID: 23505347 PMCID: PMC3591298 DOI: 10.1371/journal.pcbi.1002929] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 09/07/2012] [Accepted: 01/05/2013] [Indexed: 12/03/2022]
Abstract
Since divergence ∼50 Ma ago from their terrestrial ancestors, cetaceans underwent a series of adaptations such as a ∼10-20-fold increase in myoglobin (Mb) concentration in skeletal muscle, critical for increasing oxygen storage capacity and prolonging dive time. Whereas the O2-binding affinity of Mbs is not significantly different among mammals (with typical oxygenation constants of ∼0.8-1.2 µM⁻¹), folding stabilities of cetacean Mbs are ∼2-4 kcal/mol higher than for terrestrial Mbs. Using ancestral sequence reconstruction, maximum likelihood and Bayesian tests to describe the evolution of cetacean Mbs, and experimentally calibrated computation of stability effects of mutations, we observe accelerated evolution in cetaceans and identify seven positively selected sites in Mb. Overall, these sites contribute to Mb stabilization with a conditional probability of 0.8. We observe a correlation between Mb folding stability and protein abundance, suggesting that a selection pressure for stability acts proportionally to higher expression. We also identify a major divergence event leading to the common ancestor of whales, during which major stabilization occurred. Most of the positively selected sites that occur later act against other destabilizing mutations to maintain stability across the clade, except for the shallow divers, where late stability relaxation occurs, probably due to the shorter aerobic dive limits of these species. The three main positively selected sites 66, 5, and 35 undergo changes that favor hydrophobic folding, structural integrity, and intra-helical hydrogen bonds.
Affiliation(s)
- Pouria Dasmeh
- Technical University of Denmark, DTU Chemistry, Kongens Lyngby, Denmark
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Adrian W. R. Serohijos
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Kasper P. Kepp
- Technical University of Denmark, DTU Chemistry, Kongens Lyngby, Denmark
- Eugene I. Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
56
Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 2013; 110:E678-86. [PMID: 23382244 DOI: 10.1073/pnas.1218066110] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Indexed: 11/18/2022]
Abstract
The cause of the tremendous among-protein variation in the rate of sequence evolution is a central subject of molecular evolution. Expression level has been identified as a leading determinant of this variation among genes encoded in the same genome, but the underlying mechanisms are not fully understood. We here propose and demonstrate that a requirement for stronger folding of more abundant mRNAs results in slower evolution of more highly expressed genes and proteins. Specifically, we show that: (i) the higher the expression level of a gene, the greater the selective pressure for its mRNA to fold; (ii) random mutations are more likely to decrease mRNA folding when occurring in highly expressed genes than in lowly expressed genes; and (iii) amino acid substitution rate is negatively correlated with mRNA folding strength, with or without the control of expression level. Furthermore, synonymous (dS) and nonsynonymous (dN) nucleotide substitution rates are both negatively correlated with mRNA folding strength. However, counterintuitively, dS and dN are differentially constrained by selection for mRNA folding, resulting in a significant correlation between mRNA folding strength and dN/dS, even when gene expression level is controlled. The direction and magnitude of this correlation are determined primarily by the G+C frequency at third codon positions. Together, these findings explain why highly expressed genes evolve slowly, demonstrate a major role of natural selection at the mRNA level in constraining protein evolution, and reveal a previously unrecognized and unexpected form of nonprotein-level selection that impacts dN/dS.
57
Levy ED, De S, Teichmann SA. Cellular crowding imposes global constraints on the chemistry and evolution of proteomes. Proc Natl Acad Sci U S A 2012; 109:20461-6. [PMID: 23184996 PMCID: PMC3528536 DOI: 10.1073/pnas.1209312109] [Citation(s) in RCA: 124] [Impact Index Per Article: 10.3] [Indexed: 11/18/2022]
Abstract
In living cells, functional protein-protein interactions compete with a much larger number of nonfunctional, or promiscuous, interactions. Several cellular properties contribute to avoiding unwanted protein interactions, including regulation of gene expression, cellular compartmentalization, and high specificity and affinity of functional interactions. Here we investigate whether other mechanisms exist that shape the sequence and structure of proteins to favor their correct assembly into functional protein complexes. To examine this question, we project evolutionary and cellular abundance information onto 397, 196, and 631 proteins of known 3D structure from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, respectively. On the basis of amino acid frequencies in interface patches versus the solvent-accessible protein surface, we define a propensity or "stickiness" scale for each of the 20 amino acids. We find that the propensity to interact in a nonspecific manner is inversely correlated with abundance. In other words, high abundance proteins have less sticky surfaces. We also find that stickiness constrains protein evolution, whereby residues in sticky surface patches are more conserved than those found in nonsticky patches. Finally, we find that the constraint imposed by stickiness on protein divergence is proportional to protein abundance, which provides mechanistic insights into the correlation between protein conservation and protein abundance. Overall, the avoidance of nonfunctional interactions significantly influences the physico-chemical and evolutionary properties of proteins. Remarkably, the effects observed are consistently larger in E. coli and S. cerevisiae than in H. sapiens, suggesting that promiscuous protein-protein interactions may be freer to accumulate in the human lineage.
Affiliation(s)
- Emmanuel D. Levy
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Département de Biochimie, Université de Montréal, Montréal, QC, Canada H3T 1J4
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Subhajyoti De
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045
- Molecular Oncology Program, University of Colorado Cancer Center, Aurora, CO 80045
- Sarah A. Teichmann
- Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, United Kingdom
58
The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data. PLoS Comput Biol 2012; 8:e1002784. [PMID: 23209392 PMCID: PMC3510086 DOI: 10.1371/journal.pcbi.1002784] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 05/14/2012] [Accepted: 10/02/2012] [Indexed: 11/19/2022]
Abstract
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.
59
Park C, Qian W, Zhang J. Genomic evidence for elevated mutation rates in highly expressed genes. EMBO Rep 2012; 13:1123-9. [PMID: 23146897 DOI: 10.1038/embor.2012.165] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 07/05/2012] [Revised: 09/10/2012] [Accepted: 10/05/2012] [Indexed: 11/09/2022]
Abstract
Reporter gene assays have demonstrated both transcription-associated mutagenesis (TAM) and transcription-coupled repair, but the net impact of transcription on mutation rate remains unclear, especially at the genomic scale. Using comparative genomics of related species as well as mutation accumulation lines, we show in yeast that the rate of point mutation in a gene increases with the expression level of the gene. Transcription induces mutagenesis on both DNA strands, indicating simultaneous actions of several TAM mechanisms. A significant positive correlation is also detected between the human germline mutation rate and expression level. These results indicate that transcription is overall mutagenic.
Affiliation(s)
- Chungoo Park
- Department of Ecology and Evolutionary Biology, University of Michigan, 1075 Natural Science Building, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
60
Nabholz B, Ellegren H, Wolf JBW. High Levels of Gene Expression Explain the Strong Evolutionary Constraint of Mitochondrial Protein-Coding Genes. Mol Biol Evol 2012; 30:272-84. [DOI: 10.1093/molbev/mss238] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 01/03/2023]
61
Independent effects of protein core size and expression on residue-level structure-evolution relationships. PLoS One 2012; 7:e46602. [PMID: 23056364 PMCID: PMC3463513 DOI: 10.1371/journal.pone.0046602] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/07/2012] [Accepted: 09/03/2012] [Indexed: 11/25/2022]
Abstract
Recently, we demonstrated that yeast protein evolutionary rate at the level of individual amino acid residues scales linearly with degree of solvent accessibility. This residue-level structure-evolution relationship is sensitive to protein core size: surface residues from large-core proteins evolve much faster than those from small-core proteins, while buried residues are equally constrained independent of protein core size. In this work, we investigate the joint effects of protein core size and expression on the residue-level structure-evolution relationship. At the whole-protein level, protein expression is a much more dominant determinant of protein evolutionary rate than protein core size. In contrast, at the residue level, protein core size and expression both have major impacts on protein structure-evolution relationships. In addition, protein core size and expression influence residue-level structure-evolution relationships in qualitatively different ways. Protein core size preferentially affects the non-synonymous substitution rates of surface residues compared to buried residues, and has little influence on synonymous substitution rates. In comparison, protein expression uniformly affects all residues independent of degree of solvent accessibility, and affects both non-synonymous and synonymous substitution rates. Protein core size and expression exert largely independent effects on protein evolution at the residue level, and can combine to produce dramatic changes in the slope of the linear relationship between residue evolutionary rate and solvent accessibility. Our residue-level findings demonstrate that protein core size and expression are both important, yet qualitatively different, determinants of protein evolution. These results underscore the complementary nature of residue-level and whole-protein analysis of protein evolution.
62
Babbitt GA, Schulze KV. Codons support the maintenance of intrinsic DNA polymer flexibility over evolutionary timescales. Genome Biol Evol 2012; 4:954-65. [PMID: 22936074 PMCID: PMC3468960 DOI: 10.1093/gbe/evs073] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Accepted: 08/21/2012] [Indexed: 01/02/2023]
Abstract
Despite our long familiarity with how the genetic code specifies the amino acid sequence, we still know little about why it is organized in the way that it is. Contrary to the view that the organization of the genetic code is a "frozen accident" of evolution, recent studies have demonstrated that it is highly nonrandom, with implications for both codon assignment and usage. We hypothesize that this inherent nonrandomness may facilitate the coexistence of both sequence and structural information in DNA. Here, we take advantage of a simple metric of intrinsic DNA flexibility to analyze mutational effects on the four phosphate linkages present in any given codon. Application of a simple evolutionary neutral model of substitution to random sequences, translated with alternative genetic codes, reveals that the standard code is highly optimized to favor synonymous substitutions that maximize DNA polymer flexibility, potentially counteracting neutral evolutionary drift toward stiffer DNA caused by spontaneous deamination. Comparison to existing mutational patterns in yeast also demonstrates evidence of strong selective constraint on DNA flexibility, especially at so-called "silent" sites. We also report a fundamental relationship between DNA flexibility, codon usage bias, and several important evolutionary descriptors of comparative genomics (e.g., base composition, transition/transversion ratio, and nonsynonymous vs. synonymous substitution rate). Recent advances in structural genomics have emphasized the role of the DNA polymer's flexibility in both gene function and whole genome folding, thereby implicating possible reasons for codons to facilitate the multiplexing of both genetic and structural information within the same molecular context.
Affiliation(s)
- G A Babbitt
- TH Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, NY, USA.
63
Protein biophysics explains why highly abundant proteins evolve slowly. Cell Rep 2012; 2:249-56. [PMID: 22938865 DOI: 10.1016/j.celrep.2012.06.022] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 03/20/2012] [Revised: 05/03/2012] [Accepted: 06/21/2012] [Indexed: 12/26/2022]
Abstract
The consistent observation across all kingdoms of life that highly abundant proteins evolve slowly demonstrates that cellular abundance is a key determinant of protein evolutionary rate. However, other empirical findings, such as the broad distribution of evolutionary rates, suggest that additional variables determine the rate of protein evolution. Here, we report that under the global selection against the cytotoxic effects of misfolded proteins, folding stability (ΔG), simultaneous with abundance, is a causal variable of evolutionary rate. Using both theoretical analysis and multiscale simulations, we demonstrate that the anticorrelation between the premutation ΔG and the arising mutational effect (ΔΔG), purely biophysical in origin, is a necessary requirement for abundance-evolutionary rate covariation. Additionally, we predict and demonstrate in bacteria that the strength of abundance-evolutionary rate correlation depends on the divergence time separating reference genomes. Altogether, these results highlight the intrinsic role of protein biophysics in the emerging universal patterns of molecular evolution.
64
Abstract
Much molecular-evolution research is concerned with sequence analysis. Yet these sequences represent real, three-dimensional molecules with complex structure and function. Here I highlight a growing trend in the field to incorporate molecular structure and function into computational molecular-evolution work. I consider three focus areas: reconstruction and analysis of past evolutionary events, such as phylogenetic inference or methods to infer selection pressures; development of toy models and simulations to identify fundamental principles of molecular evolution; and atom-level, highly realistic computational modeling of molecular structure and function aimed at making predictions about possible future evolutionary events.
Affiliation(s)
- Claus O Wilke
- Institute of Cell and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America.
65
Qian W, Yang JR, Pearson NM, Maclean C, Zhang J. Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet 2012; 8:e1002603. [PMID: 22479199 PMCID: PMC3315465 DOI: 10.1371/journal.pgen.1002603] [Citation(s) in RCA: 220] [Impact Index Per Article: 18.3] [Received: 09/29/2011] [Accepted: 02/05/2012] [Indexed: 11/18/2022]
Abstract
Cellular efficiency in protein translation is an important fitness determinant in rapidly growing organisms. It is widely believed that synonymous codons are translated with unequal speeds and that translational efficiency is maximized by the exclusive use of rapidly translated codons. Here we estimate the in vivo translational speeds of all sense codons from the budding yeast Saccharomyces cerevisiae. Surprisingly, preferentially used codons are not translated faster than unpreferred ones. We hypothesize that this phenomenon is a result of codon usage in proportion to cognate tRNA concentrations, the optimal strategy in enhancing translational efficiency under tRNA shortage. Our predicted codon–tRNA balance is indeed observed from all model eukaryotes examined, and its impact on translational efficiency is further validated experimentally. Our study reveals a previously unsuspected mechanism by which unequal codon usage increases translational efficiency, demonstrates widespread natural selection for translational efficiency, and offers new strategies to improve synthetic biology.
Although an amino acid can be encoded by multiple synonymous codons, these codons are not used equally frequently in a genome. Biased codon usage is believed to improve translational efficiency because it is thought that preferentially used codons are translated faster than unpreferred ones. Surprisingly, we find similar translational speeds among synonymous codons. We show that translational efficiency is optimized by a previously unknown mechanism that relies on proportional use of codons according to their cognate tRNA concentrations. Our results provide important molecular details of protein translation, answer why codon usage is unequal, demonstrate widespread natural selection for translational efficiency, and can guide designs of synthetic genomes and cells with efficient translation systems.
Affiliation(s)
- Wenfeng Qian
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Jian-Rong Yang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Nathaniel M. Pearson
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Calum Maclean
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
66
Abstract
Horizontal gene transfer (HGT), the movement of genetic material from one species to another, is a common phenomenon in prokaryotic evolution. Although the rate of HGT is known to vary among genes, our understanding of the cause of this variation, currently summarized by two rules, is far from complete. The first rule states that informational genes, which are involved in DNA replication, transcription, and translation, have lower transferabilities than operational genes. The second rule asserts that protein interactivity negatively impacts gene transferability. Here, we hypothesize that high expression hampers HGT, because the fitness cost of an HGT to the recipient, arising from the 1) energy expenditure in transcription and translation, 2) cytotoxic protein misfolding, 3) reduction in cellular translational efficiency, 4) detrimental protein misinteraction, and 5) disturbance of the optimal protein concentration or cell physiology, increases with the expression level of the transferred gene. To test this hypothesis, we examined laboratory and natural HGTs to Escherichia coli. We observed lower transferabilities of more highly expressed genes, even after controlling the confounding factors from the two established rules and the genic GC content. Furthermore, expression level predicts gene transferability better than all other factors examined. We also confirmed the significant negative impact of gene expression on the rate of HGTs to 127 of 133 genomes of eubacteria and archaebacteria. Together, these findings establish the gene expression level as a major determinant of horizontal gene transferability. They also suggest that most successful HGTs are initially slightly deleterious, fixed because of their negligibly low costs rather than high benefits to the recipient.
Affiliation(s)
- Chungoo Park
- Department of Ecology and Evolutionary Biology, University of Michigan, MI, USA
|
67
|
Bogumil D, Landan G, Ilhan J, Dagan T. Chaperones divide yeast proteins into classes of expression level and evolutionary rate. Genome Biol Evol 2012; 4:618-25. [PMID: 22417914 PMCID: PMC3381671 DOI: 10.1093/gbe/evs025] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
It has long been known that many proteins require folding via molecular chaperones for their function. Although it has become apparent that folding imposes constraints on protein sequence evolution, the effects exerted by different chaperone classes are so far unknown. We have analyzed data of protein interaction with the chaperones in Saccharomyces cerevisiae using network methods. The results reveal a distinct community structure within the network that was hitherto undetectable with standard statistical tools. Sixty-four yeast chaperones comprise ten distinct modules that are defined by interaction specificity for their 2,691 interacting proteins. The classes of interacting proteins that are in turn defined by their dedicated chaperone modules are distinguished by various physicochemical protein properties and are characterized by significantly different protein expression levels, codon usage, and amino acid substitution rates. Correlations between substitution rate, codon bias, and gene expression level that have long been known for yeast are apparent at the level of the chaperone-defined modules. This indicates that correlated expression, conservation, and codon bias levels for yeast genes are attributable to previously unrecognized effects of protein folding. Proteome-wide categories of chaperone–substrate specificity uncover novel hubs of functional constraint in protein evolution that are conserved across 20 fungal genomes.
Affiliation(s)
- David Bogumil
- Institute of Molecular Evolution, Heinrich-Heine University Düsseldorf, Germany
|
68
|
Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A 2012; 109:E831-40. [PMID: 22416125 DOI: 10.1073/pnas.1117408109] [Citation(s) in RCA: 129] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
The tempo and mode of protein evolution have been central questions in biology. Genomic data have shown a strong influence of the expression level of a protein on its rate of sequence evolution (E-R anticorrelation), which is currently explained by the protein misfolding avoidance hypothesis. Here, we show that this hypothesis does not fully explain the E-R anticorrelation, especially for protein surface residues. We propose that natural selection against protein-protein misinteraction, which wastes functional molecules and is potentially toxic, constrains the evolution of surface residues. Because highly expressed proteins are under stronger pressures to avoid misinteraction, surface residues are expected to show an E-R anticorrelation. Our molecular-level evolutionary simulation and yeast genomic analysis confirm multiple predictions of the hypothesis. These findings show a pluralistic origin of the E-R anticorrelation and reveal the role of protein misinteraction, an inherent property of complex cellular systems, in constraining protein evolution.
|
69
|
Managadze D, Rogozin IB, Chernikova D, Shabalina SA, Koonin EV. Negative correlation between expression level and evolutionary rate of long intergenic noncoding RNAs. Genome Biol Evol 2011; 3:1390-404. [PMID: 22071789 PMCID: PMC3242500 DOI: 10.1093/gbe/evr116] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNA, they can be compared with much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing), however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
Affiliation(s)
- David Managadze
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
70
|
Testing hypotheses on the rate of molecular evolution in relation to gene expression using microRNAs. Proc Natl Acad Sci U S A 2011; 108:15942-7. [PMID: 21911382 DOI: 10.1073/pnas.1110098108] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
There exists an inverse relationship between the rate of molecular evolution and the level of gene expression. Among the many explanations, the "toxic-error" hypothesis is a most general one, which posits that processing errors may often be toxic to the cells. However, toxic errors that constrain the evolution of highly expressed genes are often difficult to measure. In this study, we test the toxic-error hypothesis by using microRNA (miRNA) genes because their processing errors can be directly measured by deep sequencing. A miRNA gene consists of a small mature product (≈22 nt long) and a "backbone." Our analysis shows that (i) like the mature miRNA, the backbone is highly conserved; (ii) the rate of sequence evolution in the backbone is negatively correlated with expression; and (iii) although conserved between distantly related species, the error rate in miRNA processing is also negatively correlated with the expression level. The observations suggest that, as a miRNA gene becomes more highly (or more ubiquitously) expressed, its sequence evolves toward a structure that minimizes processing errors.
|
71
|
Determinants of translation efficiency and accuracy. Mol Syst Biol 2011; 7:481. [PMID: 21487400 PMCID: PMC3101949 DOI: 10.1038/msb.2011.14] [Citation(s) in RCA: 325] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2010] [Accepted: 02/15/2011] [Indexed: 12/17/2022] Open
Abstract
A given protein sequence can be encoded by an astronomical number of alternative nucleotide sequences. Recent research has revealed that this flexibility provides evolution with multiple ways to tune the efficiency and fidelity of protein translation and folding. Proper functioning of biological cells requires that the process of protein expression be carried out with high efficiency and fidelity. Given an amino-acid sequence of a protein, multiple degrees of freedom still remain that may allow evolution to tune efficiency and fidelity for each gene under various conditions and cell types. Particularly, the redundancy of the genetic code allows the choice between alternative codons for the same amino acid, which, although 'synonymous,' may exert dramatic effects on the process of translation. Here we review modern developments in genomics and systems biology that have revolutionized our understanding of the multiple means by which translation is regulated. We suggest new means to model the process of translation in a richer framework that will incorporate information about gene sequences, the tRNA pool of the organism and the thermodynamic stability of the mRNA transcripts. A practical demonstration of a better understanding of the process would be a more accurate prediction of the proteome, given the transcriptome at a diversity of biological conditions.
|
72
|
Slow protein evolutionary rates are dictated by surface-core association. Proc Natl Acad Sci U S A 2011; 108:11151-6. [PMID: 21690394 DOI: 10.1073/pnas.1015994108] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Why do certain proteins evolve much slower than others? We compared not only rates per protein, but also rates per position within individual proteins. For ∼90% of proteins, the distribution of positional rates exhibits three peaks: a peak of slow evolving residues, with average log₂[normalized rate], log₂μ, of ca. -2, corresponding primarily to core residues; a peak of fast evolving residues (log₂μ ∼ 0.5) largely corresponding to surface residues; and a very fast peak (log₂μ ∼ 2) associated with disordered segments. However, a unique fraction of proteins that evolve very slowly exhibit not only a negligible fast peak, but also a peak with a log₂μ ∼ -4, rather than the standard core peak of -2. Thus, a "freeze" of a protein's surface seems to stop core evolution as well. We also observed a much higher fraction of substitutions in potentially interacting residues than expected by chance, including substitutions in pairs of contacting surface-core residues. Overall, the data suggest that accumulation of surface substitutions enables the acceptance of substitutions in core positions. The underlying reason for slow evolution might therefore be a highly constrained surface due to protein-protein interactions or the need to prevent misfolding or aggregation. If the surface is inaccessible to substitutions, so becomes the core, thus resulting in very slow overall rates.
|
73
|
Abstract
Despite our extensive knowledge about the rate of protein sequence evolution for thousands of genes in hundreds of species, the corresponding rate of protein function evolution is virtually unknown, especially at the genomic scale. This lack of knowledge is primarily because of the huge diversity in protein function and the consequent difficulty in gauging and comparing rates of protein function evolution. Nevertheless, most proteins function through interacting with other proteins, and protein-protein interaction (PPI) can be tested by standard assays. Thus, the rate of protein function evolution may be measured by the rate of PPI evolution. Here, we experimentally examine 87 potential interactions between Kluyveromyces waltii proteins, whose one-to-one orthologs in the related budding yeast Saccharomyces cerevisiae have been reported to interact. Combining our results with available data from other eukaryotes, we estimate that the evolutionary rate of protein interaction is (2.6 ± 1.6) × 10⁻¹⁰ per PPI per year, which is three orders of magnitude lower than the rate of protein sequence evolution measured by the number of amino acid substitutions per protein per year. The extremely slow evolution of protein molecular function may account for the remarkable conservation of life at molecular and cellular levels and allow for studying the mechanistic basis of human disease in much simpler organisms.
|
74
|
Castillo V, Graña-Montes R, Sabate R, Ventura S. Prediction of the aggregation propensity of proteins from the primary sequence: Aggregation properties of proteomes. Biotechnol J 2011; 6:674-85. [DOI: 10.1002/biot.201000331] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2011] [Revised: 02/23/2011] [Accepted: 03/03/2011] [Indexed: 12/14/2022]
|
75
|
Impact of gene expression noise on organismal fitness and the efficacy of natural selection. Proc Natl Acad Sci U S A 2011; 108:E67-76. [PMID: 21464323 DOI: 10.1073/pnas.1100059108] [Citation(s) in RCA: 160] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Gene expression noise is a universal phenomenon across all life forms. Although beneficial under certain circumstances, expression noise is generally thought to be deleterious. However, neither the magnitude of the deleterious effect nor the primary mechanism of this effect is known. Here, we model the impact of expression noise on the fitness of unicellular organisms by considering the influence of suboptimal expressions of enzymes on the rate of biomass production and the energetic cost associated with imprecise amounts of protein synthesis. Our theoretical modeling and empirical analysis of yeast data show four findings. (i) Expression noise reduces the mean fitness of a cell by at least 25%, and this reduction cannot be substantially alleviated by gene overexpression. (ii) Higher sensitivity of fitness to the expression fluctuations of essential genes than nonessential genes creates stronger selection against noise in essential genes, resulting in a decrease in their noise. (iii) Reduction of expression noise by genome doubling offers a substantial fitness advantage to diploids over haploids, even in the absence of sex. (iv) Expression noise generates fitness variation among isogenic cells, which lowers the efficacy of natural selection similar to the effect of population shrinkage. Thus, expression noise renders organisms both less adapted and less adaptable. Because expression noise is only one of many manifestations of the stochasticity in cellular molecular processes, our results suggest a much more fundamental role of molecular stochasticity in evolution than is currently appreciated.
|
76
|
Chen SCC, Chuang TJ, Li WH. The relationships among microRNA regulation, intrinsically disordered regions, and other indicators of protein evolutionary rate. Mol Biol Evol 2011; 28:2513-20. [PMID: 21398349 DOI: 10.1093/molbev/msr068] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
Many indicators of protein evolutionary rate have been proposed, but some of them are interrelated. The purpose of this study is to disentangle their correlations. We assess the strength of each indicator by controlling for the other indicators under study. We find that the number of microRNA (miRNA) types that regulate a gene is the strongest rate indicator (a negative correlation), followed by disorder content (the percentage of disordered regions in a protein, a positive correlation); the strength of disorder content as a rate indicator is substantially increased after controlling for the number of miRNA types. By dividing proteins into lowly and highly intrinsically disordered proteins (L-IDPs and H-IDPs), we find that proteins interacting with more H-IDPs tend to evolve more slowly, which largely explains the previous observation of a negative correlation between the number of protein-protein interactions and evolutionary rate. Moreover, all of the indicators examined here, except for the number of miRNA types, have different strengths in L-IDPs and in H-IDPs. Finally, the number of phosphorylation sites is weakly correlated with the number of miRNA types, and its strength as a rate indicator is substantially reduced when other indicators are considered. Our study reveals the relative strength of each rate indicator and increases our understanding of protein evolution.
Affiliation(s)
- Sean Chun-Chang Chen
- Institute of BioMedical Informatics, National Yang-Ming University, Taipei, Taiwan
|