1
|
Shovlin CL, Aldred MA. When "loss-of-function" means proteostasis burden: Thinking again about coding DNA variants. Am J Hum Genet 2025; 112:3-10. [PMID: 39753117 PMCID: PMC11739917 DOI: 10.1016/j.ajhg.2024.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 12/02/2024] [Accepted: 12/03/2024] [Indexed: 01/20/2025] [Open Access]
Abstract
Each human genome has approximately 5 million DNA variants. Even for complete loss-of-function variants causing inherited, monogenic diseases, current understanding based on gene-specific molecular function does not adequately predict variability observed between people with identical mutations or fluctuating disease trajectories. We present a parallel paradigm for loss-of-function variants based on broader consequences to the cell when aberrant polypeptide chains of amino acids are translated from mutant RNA to generate mutated proteins. Missense variants that modify primary amino acid sequence, and nonsense/frameshift variants that generate premature termination codons (PTCs), are placed in context alongside emergent themes of chaperone binding, protein quality control capacity, and cellular adaptation to stress. Relatively stable proteostasis burdens are contrasted with rapid changes after induction of gene expression, or stress responses that suppress nonsense mediated decay (NMD) leading to higher PTC transcript levels where mutant proteins can augment cellular stress. For known disease-causal mutations, an adjunctive variant categorization system enhances clinical predictive power and precision therapeutic opportunities. Additionally, with typically more than 100 nonsense and frameshift variants, and ∼10,000 missense variants per human DNA, the paradigm focuses attention on all protein-coding DNA variants, and their potential contributions to multimorbid states beyond classically designated inherited diseases. Experimental testing in clinically relevant systems is encouraged to augment current atlases of protein expression at single-cell resolution, and high-throughput experimental data and deep-learning models that predict which amino acid substitutions generate enhanced degradative burdens. 
Incorporating additional dimensions such as pan-proteome competition for chaperones, and age-related loss of proteostasis capacity, should further accelerate health impacts.
Affiliation(s)
- Claire L Shovlin
  - National Heart and Lung Institute, Imperial College London, London, UK
- Micheala A Aldred
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
|
2
|
Tchebotarev L, Herzel L. Secret code: Encoding promoters by synonymous codons. Proc Natl Acad Sci U S A 2024; 121:e2416360121. [PMID: 39312671 PMCID: PMC11459169 DOI: 10.1073/pnas.2416360121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024] [Open Access]
Affiliation(s)
- Lieve Tchebotarev
  - Department of Biology, Chemistry and Pharmacy, Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin 14195, Germany
- Lydia Herzel
  - Department of Biology, Chemistry and Pharmacy, Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin 14195, Germany
|
3
|
Vasco A, Taylor RJ, Méndez Y, Bernardes GJL. On-Demand Thio-Succinimide Hydrolysis for the Assembly of Stable Protein-Protein Conjugates. J Am Chem Soc 2024; 146:20709-20719. [PMID: 39012647 PMCID: PMC11295205 DOI: 10.1021/jacs.4c03721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 06/19/2024] [Accepted: 06/21/2024] [Indexed: 07/17/2024]
Abstract
Chemical post-translational protein-protein conjugation is an important technique with growing applications in biotechnology and pharmaceutical research. Maleimides represent one of the most widely employed bioconjugation reagents. However, challenges associated with the instability of first- and second-generation maleimide technologies are yet to be fully addressed. We report the development of a novel class of maleimide reagents that can undergo on-demand ring-opening hydrolysis of the resulting thio-succinimide. This strategy enables rapid post-translational assembly of protein-protein conjugates. Thio-succinimide hydrolysis, triggered upon application of chemical, photochemical, or enzymatic stimuli, allowed homobifunctional bis-maleimide reagents to be applied in the production of stable protein-protein conjugates, with complete temporal control. Bivalent and bispecific protein-protein dimers constructed from small binders targeting antigens of oncological importance, PD-L1 and HER2, were generated with high purity, stability, and improved functionality compared to monomeric building blocks. The modularity of the approach was demonstrated through elaboration of the linker moiety through a bioorthogonal propargyl handle to produce protein-protein-fluorophore conjugates. Furthermore, extending the functionality of the homobifunctional reagents by temporarily masking reactive thiols included in the linker allowed the assembly of higher order trimeric and tetrameric single-domain antibody conjugates. The potential for the approach to be extended to proteins of greater biochemical complexity was demonstrated in the production of immunoglobulin single-domain antibody conjugates. On-demand control of thio-succinimide hydrolysis combined with the facile assembly of chemically defined homo- and heterodimers constitutes an important expansion of the chemical methods available for generating stable protein-protein conjugates.
Affiliation(s)
- Yanira Méndez
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW Cambridge, U.K.
|
4
|
Salman A, Biziaev N, Shuvalova E, Alkalaeva E. mRNA context and translation factors determine decoding in alternative nuclear genetic codes. Bioessays 2024; 46:e2400058. [PMID: 38724251 DOI: 10.1002/bies.202400058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 04/19/2024] [Accepted: 04/23/2024] [Indexed: 06/27/2024]
Abstract
The genetic code is a set of instructions that determine how the information in our genetic material is translated into amino acids. In general, it is universal for all organisms, from viruses and bacteria to humans. However, in the last few decades, exceptions to this rule have been identified both in pro- and eukaryotes. In this review, we discuss the 16 described alternative eukaryotic nuclear genetic codes and observe theories of their appearance in evolution. We consider possible molecular mechanisms that allow codon reassignment. Most reassignments in nuclear genetic codes are observed for stop codons. Moreover, in several organisms, stop codons can simultaneously encode amino acids and serve as termination signals. In this case, the meaning of the codon is determined by the additional factors besides the triplets. A comprehensive review of various non-standard coding events in the nuclear genomes provides a new insight into the translation mechanism in eukaryotes.
Affiliation(s)
- Ali Salman
  - Engelhardt Institute of Molecular Biology, the Russian Academy of Sciences, Moscow, Russia
- Nikita Biziaev
  - Engelhardt Institute of Molecular Biology, the Russian Academy of Sciences, Moscow, Russia
- Ekaterina Shuvalova
  - Engelhardt Institute of Molecular Biology, the Russian Academy of Sciences, Moscow, Russia
- Elena Alkalaeva
  - Engelhardt Institute of Molecular Biology, the Russian Academy of Sciences, Moscow, Russia
|
5
|
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
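To make the idea of alternative reading frames concrete, the following is a minimal sketch (not taken from the cited paper) that translates one DNA string in all six frames using the standard genetic code. The function names and the toy sequence in the usage note are illustrative assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# T, C, A, G at each position and the third position varying fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate consecutive triplets; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq):
    """Translate the three frames of each strand ('*' marks a stop codon)."""
    rc = reverse_complement(seq)
    frames = {}
    for shift in range(3):
        frames[f"+{shift + 1}"] = translate(seq[shift:])
        frames[f"-{shift + 1}"] = translate(rc[shift:])
    return frames
```

For the hypothetical sequence `ATGGCCAAATAA`, frame `+1` reads `MAK*`, while the shifted frame `+2` reads `WPN` and the opposite-strand frame `-1` reads `LFGH`: the same nucleotides yield unrelated peptides.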
|
6
|
Cuevas-Zuviría B, Adam ZR, Goldman AD, Kaçar B. Informatic Capabilities of Translation and Its Implications for the Origins of Life. J Mol Evol 2023; 91:567-569. [PMID: 37526692 DOI: 10.1007/s00239-023-10125-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Accepted: 06/22/2023] [Indexed: 08/02/2023]
Abstract
The ability to encode and convert heritable information into molecular function is a defining feature of life as we know it. The conversion of information into molecular function is performed by the translation process, in which triplets of nucleotides in a nucleic acid polymer (mRNA) encode specific amino acids in a protein polymer that folds into a three-dimensional structure. The folded protein then performs one or more molecular activities, often as one part of a complex and coordinated physiological network. Prebiotic systems, lacking the ability to explicitly translate information between genotype and phenotype, would have depended upon either chemosynthetic pathways to generate their components (constraining their complexity and evolvability) or on the ambivalence of RNA as both carrier of information and of catalytic functions, a possibility which is still supported by a very limited set of catalytic RNAs. Thus, the emergence of translation during early evolutionary history may have allowed life to unmoor from the setting of its origin. The origin of translation machinery also represents an entirely novel and distinct threshold of behavior for which there is no abiotic counterpart: it could be the only known example of computing that emerged naturally at the chemical level. Here we describe translation machinery's decoding system as the basis of cellular translation's information-processing capabilities, and the four operation types that find parallels in computer systems engineering that this biological machinery exhibits.
Affiliation(s)
- Bruno Cuevas-Zuviría
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Madrid, Spain
- Zachary R Adam
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Geosciences, University of Wisconsin-Madison, Madison, WI, USA
- Betül Kaçar
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
|
7
|
Omachi Y, Saito N, Furusawa C. Rare-event sampling analysis uncovers the fitness landscape of the genetic code. PLoS Comput Biol 2023; 19:e1011034. [PMID: 37068098 PMCID: PMC10138212 DOI: 10.1371/journal.pcbi.1011034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Revised: 04/27/2023] [Accepted: 03/16/2023] [Indexed: 04/18/2023] [Open Access]
Abstract
The genetic code refers to a rule that maps 64 codons to 20 amino acids. Nearly all organisms, with few exceptions, share the same genetic code, the standard genetic code (SGC). While it remains unclear why this universal code has arisen and been maintained during evolution, it may have been preserved under selection pressure. Theoretical studies comparing the SGC and numerically created hypothetical random genetic codes have suggested that the SGC has been subject to strong selection pressure for being robust against translation errors. However, these prior studies have searched for random genetic codes in only a small subspace of the possible code space due to limitations in computation time. Thus, how the genetic code has evolved, and the characteristics of the genetic code fitness landscape, remain unclear. By applying multicanonical Monte Carlo, an efficient rare-event sampling method, we efficiently sampled random codes from a much broader random ensemble of genetic codes than in previous studies, estimating that only one out of every 10^20 random codes is more robust than the SGC. This estimate is significantly smaller than the previous estimate, one in a million. We also characterized the fitness landscape of the genetic code that has four major fitness peaks, one of which includes the SGC. Furthermore, genetic algorithm analysis revealed that evolution under such a multi-peaked fitness landscape could be strongly biased toward a narrow peak, in an evolutionary path-dependent manner.
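The comparison between the SGC and random codes can be sketched with naive sampling, as below. This is an illustration under stated assumptions, not the paper's method: it uses plain random shuffles rather than multicanonical Monte Carlo, it restricts random codes to permutations of amino acids among the SGC's synonymous blocks, and it scores robustness with Kyte-Doolittle hydropathy as a stand-in property. All function names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumption: Kyte-Doolittle hydropathy used as the amino acid property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mismatch_cost(code):
    """Mean squared property change over single-base changes between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                count += 1
    return total / count


def shuffled_code(rng):
    """Random code: permute amino acids among the SGC's codon blocks; stops stay fixed."""
    aas = sorted(set(AMINO) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    mapping = dict(zip(aas, perm))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in SGC.items()}


def fraction_better(n_samples=1000, seed=0):
    """Fraction of sampled random codes with a lower (better) cost than the SGC."""
    rng = random.Random(seed)
    sgc_cost = mismatch_cost(SGC)
    better = sum(mismatch_cost(shuffled_code(rng)) < sgc_cost for _ in range(n_samples))
    return better / n_samples
```

Direct sampling of this kind cannot resolve frequencies anywhere near one in 10^20, since that would require on the order of 10^20 draws; this is the limitation that motivates a rare-event sampling method.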
Affiliation(s)
- Yuji Omachi
  - Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
- Nen Saito
  - Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima City, Hiroshima, Japan
  - Exploratory Research Center on Life and Living Systems, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
  - Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
- Chikara Furusawa
  - Graduate School of Sciences, The University of Tokyo, Hongo, Tokyo, Japan
  - Universal Biology Institute, The University of Tokyo, Hongo, Tokyo, Japan
  - Center for Biosystems Dynamics Research, RIKEN, Suita, Osaka, Japan
|
8
|
Romero Romero ML, Landerer C, Poehls J, Toth‐Petroczy A. Phenotypic mutations contribute to protein diversity and shape protein evolution. Protein Sci 2022; 31:e4397. [PMID: 36040266 PMCID: PMC9375231 DOI: 10.1002/pro.4397] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/14/2022] [Revised: 06/14/2022] [Accepted: 07/04/2022] [Indexed: 11/16/2022]
Abstract
Errors in DNA replication generate genetic mutations, while errors in transcription and translation lead to phenotypic mutations. Phenotypic mutations are orders of magnitude more frequent than genetic ones, yet they are less understood. Here, we review the types of phenotypic mutations, their quantifications, and their role in protein evolution and disease. The diversity generated by phenotypic mutation can facilitate adaptive evolution. Indeed, phenotypic mutations, such as ribosomal frameshift and stop codon readthrough, sometimes serve to regulate protein expression and function. Phenotypic mutations have often been linked to fitness decrease and diseases. Thus, understanding the protein heterogeneity and phenotypic diversity caused by phenotypic mutations will advance our understanding of protein evolution and have implications on human health and diseases.
Affiliation(s)
- Maria Luisa Romero Romero
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Cedric Landerer
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Jonas Poehls
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
- Agnes Toth‐Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology Dresden, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, Dresden, Germany
|
9
|
Taylor RJ, Geeson MB, Journeaux T, Bernardes GJL. Chemical and Enzymatic Methods for Post-Translational Protein-Protein Conjugation. J Am Chem Soc 2022; 144:14404-14419. [PMID: 35912579 PMCID: PMC9389620 DOI: 10.1021/jacs.2c00129] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 01/05/2022] [Indexed: 11/28/2022]
Abstract
Fusion proteins play an essential role in the biosciences but suffer from several key limitations, including the requirement for N-to-C terminal ligation, incompatibility of constituent domains, incorrect folding, and loss of biological activity. This perspective focuses on chemical and enzymatic approaches for the post-translational generation of well-defined protein-protein conjugates, which overcome some of the limitations faced by traditional fusion techniques. Methods discussed range from chemical modification of nucleophilic canonical amino acid residues to incorporation of unnatural amino acid residues and a range of enzymatic methods, including sortase-mediated ligation. Through summarizing the progress in this rapidly growing field, the key successes and challenges associated with using chemical and enzymatic approaches are highlighted and areas requiring further development are discussed.
Affiliation(s)
- Ross J. Taylor
  - Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, U.K.
- Michael B. Geeson
  - Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, U.K.
- Toby Journeaux
  - Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, U.K.
- Gonçalo J. L. Bernardes
  - Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, U.K.
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028, Lisboa, Portugal
|
10
|
Wang X, Dong Q, Chen G, Zhang J, Liu Y, Cai Y. Frameshift and wild-type proteins are often highly similar because the genetic code and genomes were optimized for frameshift tolerance. BMC Genomics 2022; 23:416. [PMID: 35655139 PMCID: PMC9164415 DOI: 10.1186/s12864-022-08435-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/01/2021] [Accepted: 03/02/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
Frameshift mutations have been considered of significant importance for the molecular evolution of proteins and their coding genes, while frameshift protein sequences encoded in the alternative reading frames of coding genes have been considered to be meaningless. However, functional frameshifts have been found widely existing. It was puzzling how a frameshift protein kept its structure and functionality while substantial changes occurred in its primary amino-acid sequence. This study shows that the similarities among frameshifts and wild types are higher than random similarities and are determined at different levels. Frameshift substitutions are more conservative than random substitutions in the standard genetic code (SGC). The frameshift substitutions score of SGC ranks in the top 2.0-3.5% of alternative genetic codes, showing that SGC is nearly optimal for frameshift tolerance. In many genes and certain genomes, frameshift-resistant codons and codon pairs appear more frequently than expected, suggesting that frameshift tolerance is achieved through not only the optimality of the genetic code but, more importantly, the further optimization of a specific gene or genome through the usages of codons/codon pairs, which sheds light on the role of frameshift mutations in molecular and genomic evolution.
Affiliation(s)
- Xiaolong Wang
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Quanjiang Dong
  - Qingdao Municipal Hospital, Qingdao, Shandong, 266003, P. R. China
- Gang Chen
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Jianye Zhang
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Yongqiang Liu
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
- Yujia Cai
  - Department of Biotechnology, College of Marine Life Sciences, Ocean University of China, No. 5 Yushan Road, Shandong, Qingdao, 266003, P. R. China
|
11
|
Li DJ. Distributional features of triplet codons in genomes underlie the diversification of life. Biosystems 2022; 217:104681. [DOI: 10.1016/j.biosystems.2022.104681] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/21/2021] [Revised: 04/04/2022] [Accepted: 04/07/2022] [Indexed: 11/02/2022]
|
12
|
Kebabci N, Timucin AC, Timucin E. Toward Compilation of Balanced Protein Stability Data Sets: Flattening the ΔΔG Curve through Systematic Enrichment. J Chem Inf Model 2022; 62:1345-1355. [PMID: 35201762 DOI: 10.1021/acs.jcim.2c00054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Often studies analyzing stability data sets and/or predictors ignore neutral mutations and use a binary classification scheme labeling only destabilizing and stabilizing mutations. Recognizing that highly concentrated neutral mutations interfere with data set quality, we have explored three protein stability data sets: S2648, PON-tstab, and the symmetric Ssym that differ in size and quality. A characteristic leptokurtic shape in the ΔΔG distributions of all three data sets including the curated and symmetric ones was reported due to concentrated neutral mutations. To further investigate the impact of neutral mutations on ΔΔG predictions, we have comprehensively assessed the performance of 11 predictors on the PON-tstab data set. Correlation and error analyses showed that all of the predictors performed the best on the neutral mutations, while their performance became gradually worse as the ΔΔG of the mutations departed further from the neutral zone regardless of the direction, implying a bias toward dense mutations. To this end, after unraveling the role of concentrated neutral mutations in biases of stability data sets, we described a systematic enrichment approach to balance the ΔΔG distributions. Before enrichment, mutations were clustered based on their biochemical and/or structural features, and then three mutations were selected from every 2 kcal/mol of each cluster. Upon implementation of this approach by distinct clustering schemes, we generated five subsets varying in size and ΔΔG distributions. All subsets showed improved ΔΔG and frequency distributions. We ultimately reported that the errors toward enriched subsets were higher than those toward the parent data sets, confirming the enrichment of difficult-to-predict mutations in the subsets. In summary, we elaborated the prediction bias toward a concentrated neutral zone and also implemented a rational strategy to tackle this and other forms of biases. 
Ultimately, this study, by equipping us with an extended view of the shortcomings of stability data sets, is a step toward the development of an unbiased predictor.
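The enrichment step described in the abstract (cluster the mutations, then take three from every 2 kcal/mol window of each cluster) can be sketched as follows. This is a minimal illustration assuming mutations arrive as `(cluster_label, ddg)` pairs with cluster labels already assigned; the random choice within a bin, the bin boundaries, and the function name `enrich` are assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def enrich(mutations, per_bin=3, bin_width=2.0, seed=0):
    """Select up to `per_bin` mutations from each ΔΔG bin of each cluster.

    mutations: iterable of (cluster_label, ddg) tuples, ddg in kcal/mol.
    Bins are half-open intervals [k * bin_width, (k + 1) * bin_width).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for cluster, ddg in mutations:
        bins[(cluster, math.floor(ddg / bin_width))].append((cluster, ddg))

    selected = []
    for key in sorted(bins):
        members = bins[key]
        # Sparse bins contribute everything they have; dense bins are capped.
        selected.extend(rng.sample(members, min(per_bin, len(members))))
    return selected
```

Capping each bin limits how much the densely populated near-neutral region can dominate the subset, which is the intended flattening of the ΔΔG distribution.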
Affiliation(s)
- Narod Kebabci
  - Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Istanbul 34752, Turkey
- Ahmet Can Timucin
  - Department of Molecular Biology and Genetics, Faculty of Arts and Sciences, Acibadem University, Istanbul 34752, Turkey
- Emel Timucin
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem University, Istanbul 34752, Turkey
|
13
|
Transcription factor specificity limits the number of DNA-binding motifs. PLoS One 2022; 17:e0263307. [PMID: 35089985 PMCID: PMC8797260 DOI: 10.1371/journal.pone.0263307] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/10/2021] [Accepted: 01/15/2022] [Indexed: 11/19/2022] [Open Access]
Abstract
We study the limits imposed by transcription factor specificity on the maximum number of binding motifs that can coexist in a gene regulatory network, using the SwissRegulon Fantom5 collection of 684 human transcription factor binding sites as a model. We describe transcription factor specificity using regular expressions and find that most human transcription factor binding site motifs are separated in sequence space by one to three motif-discriminating positions. We apply theorems based on the pigeonhole principle to calculate the maximum number of transcription factors that can coexist given this degree of specificity, which is in the order of ten thousand and would fully utilize the space of DNA subsequences. Taking into account an expanded DNA alphabet with modified bases can further raise this limit by several orders of magnitude, at a lower level of sequence space usage. Our results may guide the design of transcription factors at both the molecular and system scale.
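As a rough illustration of describing binding specificity with regular expressions, the sketch below represents a motif as a list of allowed bases per position. The two motifs are hypothetical, and treating a "motif-discriminating position" as one where the allowed base sets of two motifs are disjoint is an assumption made here for illustration; the paper's formal definition may differ.

```python
import re
from math import prod

# Hypothetical motifs: each element lists the bases tolerated at that position.
MOTIF_A = ["T", "G", "A", "CG", "T", "C", "A"]
MOTIF_B = ["T", "G", "A", "AT", "T", "C", "A"]


def motif_to_regex(motif):
    """Render a motif as a regular expression, e.g. TGA[CG]TCA."""
    return "".join(s if len(s) == 1 else f"[{s}]" for s in motif)


def discriminating_positions(a, b):
    """Count positions where the two motifs share no allowed base."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return sum(1 for x, y in zip(a, b) if set(x).isdisjoint(y))


def kmers_matched(motif):
    """Number of distinct DNA words the motif accepts."""
    return prod(len(s) for s in motif)
```

Here `motif_to_regex(MOTIF_A)` gives `TGA[CG]TCA`, which accepts `TGACTCA` but not `TGATTCA`, and the two motifs are separated by a single discriminating position, so no sequence can satisfy both.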
|
14
|
Ferrer-I-Cancho R, Gómez-Rodríguez C, Esteban JL, Alemany-Puig L. Optimality of syntactic dependency distances. Phys Rev E 2022; 105:014308. [PMID: 35193296 DOI: 10.1103/physreve.105.014308] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/25/2020] [Accepted: 11/10/2021] [Indexed: 06/14/2023]
Abstract
It is often stated that human languages, as other biological systems, are shaped by cost-cutting pressures but, to what extent? Attempts to quantify the degree of optimality of languages by means of an optimality score have been scarce and focused mostly on English. Here we recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network where the vertices are words, arcs indicate syntactic dependencies, and the space is defined by the linear order of the words in the sentence. We introduce a score to quantify the cognitive pressure to reduce the distance between linked words in a sentence. The analysis of sentences from 93 languages representing 19 linguistic families reveals that half of languages are optimized to a 70% or more. The score indicates that distances are not significantly reduced in a few languages and confirms two theoretical predictions: that longer sentences are more optimized and that distances are more likely to be longer than expected by chance in short sentences. We present a hierarchical ranking of languages by their degree of optimization. The score has implications for various fields of language research (dependency linguistics, typology, historical linguistics, clinical linguistics, and cognitive science). Finally, the principles behind the design of the score have implications for network science.
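A small worked sketch of the quantities involved: the sum of dependency distances D of a sentence, its expectation under a random word order, and the minimum over all orders. The normalization used below, (D_random - D) / (D_random - D_min), is assumed here as one natural way to build such a score and may not match the paper's definition in every detail; the brute-force minimum is only practical for very short sentences, and all names are illustrative.

```python
from itertools import permutations


def edges_from_heads(heads):
    """heads[i] is the 1-based position of the head of word i + 1; 0 marks the root."""
    return [(i, h - 1) for i, h in enumerate(heads) if h != 0]


def sum_of_distances(edges, position=None):
    """Sum of |pos(u) - pos(v)| over edges; defaults to the sentence's own order."""
    if position is None:
        return sum(abs(u - v) for u, v in edges)
    return sum(abs(position[u] - position[v]) for u, v in edges)


def random_baseline(n):
    """Expected sum for a tree of n words in a uniformly random order: (n^2 - 1) / 3."""
    return (n * n - 1) / 3


def brute_force_minimum(edges, n):
    """Minimum sum over all n! orders (feasible only for small n)."""
    return min(sum_of_distances(edges, perm) for perm in permutations(range(n)))


def optimality_score(heads):
    """Assumed score: 1 at the minimum, 0 at the random expectation."""
    n = len(heads)
    if n < 3:
        raise ValueError("score is undefined for fewer than 3 words")
    edges = edges_from_heads(heads)
    d = sum_of_distances(edges)
    d_rla = random_baseline(n)
    d_min = brute_force_minimum(edges, n)
    return (d_rla - d) / (d_rla - d_min)
```

For the toy sentence "She loves green apples" with heads `[2, 0, 4, 2]`, the distances are 1, 1 and 2, so D = 4; the random baseline is 5 and the minimum is 3, giving a score of 0.5.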
Affiliation(s)
- Ramon Ferrer-I-Cancho
  - Complexity and Quantitative Linguistics Lab, LARCA Research Group, Departament de Ciències de la Computació, Universitat Politècnica de Catalunya, Campus Nord, Edifici Omega, Jordi Girona Salgado 1-3 08034 Barcelona, Catalonia, Spain
- Carlos Gómez-Rodríguez
  - Universidade da Coruña, CITIC, FASTPARSE Lab, LyS Research Group, Departamento de Ciencias de la Computación y Tecnologías de la Información, Facultade de Informática, Elviña, 15071, A Coruña, Spain
- Juan Luis Esteban
  - Departament de Ciències de la Computació, Universitat Politècnica de Catalunya (UPC), Campus Nord, Edifici Omega, Jordi Girona Salgado 1-3 08034 Barcelona, Catalonia, Spain
- Lluís Alemany-Puig
  - Complexity and Quantitative Linguistics Lab, LARCA Research Group, Departament de Ciències de la Computació, Universitat Politècnica de Catalunya, Campus Nord, Edifici Omega, Jordi Girona Salgado 1-3 08034 Barcelona, Catalonia, Spain
|
15
|
Abstract
The standard genetic code (SGC) has been extensively analyzed for the biological ramifications of its nonrandom structure. For instance, mismatch errors due to point mutation or mistranslation have an overall smaller effect on the amino acid polar requirement under the SGC than under random genetic codes (RGCs). A similar observation was recently made for frameshift errors, prompting the assertion that the SGC has been shaped by natural selection for frameshift-robustness, that is, conservation of certain amino acid properties upon a frameshift mutation or translational frameshift. However, frameshift-robustness confers no benefit because frameshifts usually create premature stop codons that cause nonsense-mediated mRNA decay or production of nonfunctional truncated proteins. We here propose that the frameshift-robustness of the SGC is a byproduct of its mismatch-robustness. Of 564 amino acid properties considered, the SGC exhibits mismatch-robustness in 93-133 properties and frameshift-robustness in 55 properties, and the latter is largely a subset of the former. For each of the 564 real and 564 randomly constructed fake properties of amino acids, there is a positive correlation between mismatch-robustness and frameshift-robustness across one million RGCs; this correlation arises because most amino acid changes resulting from a frameshift are also achievable by a mismatch error. Importantly, the SGC does not show significantly higher frameshift-robustness in any of the 55 properties than RGCs of comparable mismatch-robustness. These findings support that the frameshift-robustness of the SGC need not originate through direct selection and can instead be a side effect of its mismatch-robustness.
Affiliation(s)
- Haiqing Xu
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
|
16
|
Callens M, Pradier L, Finnegan M, Rose C, Bedhomme S. Read between the lines: Diversity of non-translational selection pressures on local codon usage. Genome Biol Evol 2021; 13:6263832. [PMID: 33944930 PMCID: PMC8410138 DOI: 10.1093/gbe/evab097] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 04/28/2021] [Indexed: 12/14/2022] [Open Access]
Abstract
Protein coding genes can contain specific motifs within their nucleotide sequence that function as a signal for various biological pathways. The presence of such sequence motifs within a gene can have beneficial or detrimental effects on the phenotype and fitness of an organism, and this can lead to the enrichment or avoidance of this sequence motif. The degeneracy of the genetic code allows for the existence of alternative synonymous sequences that exclude or include these motifs, while keeping the encoded amino acid sequence intact. This implies that locally, there can be a selective pressure for preferentially using a codon over its synonymous alternative in order to avoid or enrich a specific sequence motif. This selective pressure could, in addition to mutation, drift and selection for translation efficiency and accuracy, contribute to shaping the codon usage bias. In this review, we discuss patterns of avoidance of (or enrichment for) the various biological signals contained in specific nucleotide sequence motifs: transcription and translation initiation and termination signals, mRNA maturation signals, and antiviral immune system targets. Experimental data on the phenotypic or fitness effects of synonymous mutations in these sequence motifs confirm that they can be targets of local selection pressures on codon usage. We also formulate the hypothesis that transposable elements could have a similar impact on codon usage through their preferred integration sequences. Overall, selection on codon usage appears to be a combination of a global selection pressure imposed by the translation machinery, and a patchwork of local selection pressures related to biological signals contained in specific sequence motifs.
Affiliation(s)
- Martijn Callens
- Centre d'Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, Ecole Pratique des Hautes Etudes, Institut de Recherche pour le Développement, 34000 Montpellier, France
| | - Léa Pradier
- Centre d'Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, Ecole Pratique des Hautes Etudes, Institut de Recherche pour le Développement, 34000 Montpellier, France
| | - Michael Finnegan
- Centre d'Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, Ecole Pratique des Hautes Etudes, Institut de Recherche pour le Développement, 34000 Montpellier, France
| | - Caroline Rose
- Centre d'Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, Ecole Pratique des Hautes Etudes, Institut de Recherche pour le Développement, 34000 Montpellier, France
| | - Stéphanie Bedhomme
- Centre d'Ecologie Fonctionnelle et Evolutive, CNRS, Université de Montpellier, Université Paul Valéry Montpellier 3, Ecole Pratique des Hautes Etudes, Institut de Recherche pour le Développement, 34000 Montpellier, France
| |
|
17
|
Giannerini S, Gonzalez DL, Goracci G, Danielli A. A role for circular code properties in translation. Sci Rep 2021; 11:9218. [PMID: 33911089 PMCID: PMC8080828 DOI: 10.1038/s41598-021-87534-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Accepted: 03/23/2021] [Indexed: 11/19/2022] Open
Abstract
Circular codes represent a form of coding allowing detection/correction of frame-shift errors. Building on recent theoretical advances on circular codes, we provide evidence that protein coding sequences exhibit in-frame circular code marks, that are absent in introns and are intimately linked to the keto-amino transformation of codon bases. These properties strongly correlate with translation speed, codon influence and protein synthesis levels. Strikingly, circular code marks are absent at the beginning of coding sequences, but stably occur 40 codons after the initiator codon, hinting at the translation elongation process. Finally, we use the lens of circular codes to show that codon influence on translation correlates with the strong-weak dichotomy of the first two bases of the codon. The results can lead to defining new universal tools for sequence indicators and sequence optimization for bioinformatics and biotechnological applications, and can shed light on the molecular mechanisms behind the decoding process.
Affiliation(s)
- Simone Giannerini
- Department of Statistical Sciences, University of Bologna, Bologna, 40126, Italy.
| | - Diego Luis Gonzalez
- Department of Statistical Sciences, University of Bologna, Bologna, 40126, Italy; Institute for Microelectronics and Microsystems - Bologna Unit, CNR, Bologna, 40129, Italy
| | - Greta Goracci
- Department of Statistical Sciences, University of Bologna, Bologna, 40126, Italy
| | - Alberto Danielli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
| |
|
18
|
Groenevelt JM, Corey DJ, Fehl C. Chemical Synthesis and Biological Applications of O-GlcNAcylated Peptides and Proteins. Chembiochem 2021; 22:1854-1870. [PMID: 33450137 DOI: 10.1002/cbic.202000843] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/14/2020] [Revised: 01/15/2021] [Indexed: 12/25/2022]
Abstract
All human cells use O-GlcNAc protein modifications (O-linked N-acetylglucosamine) to rapidly adapt to changing nutrient and stress conditions through signaling, epigenetic, and proteostasis mechanisms. A key challenge for biologists in defining precise roles for specific O-GlcNAc sites is synthetic access to homogenous isoforms of O-GlcNAc proteins, a result of the non-genetically templated, transient, and heterogeneous nature of O-GlcNAc modifications. Toward a solution, this review details the state of the art of two strategies for O-GlcNAc protein modification: advances in "bottom-up" O-GlcNAc peptide synthesis and direct "top-down" installation of O-GlcNAc on full proteins. We also describe key applications of synthetic O-GlcNAc peptide and protein tools as therapeutics, biophysical structure-function studies, biomarkers, and as disease mechanistic probes to advance translational O-GlcNAc biology.
Affiliation(s)
- Jessica M Groenevelt
- Department of Chemistry, Wayne State University, 5101 Cass Avenue, Detroit, MI, 48202, USA
| | - Daniel J Corey
- Department of Chemistry, Wayne State University, 5101 Cass Avenue, Detroit, MI, 48202, USA
| | - Charlie Fehl
- Department of Chemistry, Wayne State University, 5101 Cass Avenue, Detroit, MI, 48202, USA
| |
|
19
|
Zapelloni F, Jurado-Rivera JA, Jaume D, Juan C, Pons J. Comparative Mitogenomics in Hyalella (Amphipoda: Crustacea). Genes (Basel) 2021; 12:genes12020292. [PMID: 33669879 PMCID: PMC7923271 DOI: 10.3390/genes12020292] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/30/2021] [Revised: 02/16/2021] [Accepted: 02/17/2021] [Indexed: 02/02/2023] Open
Abstract
We present the sequencing and comparative analysis of 17 mitochondrial genomes of Nearctic and Neotropical amphipods of the genus Hyalella, most from the Andean Altiplano. The mitogenomes obtained comprised the usual 37 gene-set of the metazoan mitochondrial genome showing a gene rearrangement (a reverse transposition and a reversal) between the North and South American Hyalella mitogenomes. Hyalella mitochondrial genomes show the typical AT-richness and strong nucleotide bias among codon sites and strands of pancrustaceans. Protein-coding sequences are biased towards AT-rich codons, with a preference for leucine and serine amino acids. Numerous base changes (539) were found in tRNA stems, with 103 classified as fully compensatory, 253 hemi-compensatory and the remaining base mismatches and indels. Most compensatory Watson–Crick switches were AU -> GC linked in the same haplotype, whereas most hemi-compensatory changes resulted in wobble GU and a few AC pairs. These results suggest a pairing fitness increase in tRNAs after crossing low fitness valleys. Branch-site level models detected positive selection for several amino acid positions in up to eight mitochondrial genes, with atp6 and nad5 as the genes displaying more sites under selection.
Affiliation(s)
- Francesco Zapelloni
- Department of Biology, University of the Balearic Islands, Ctra. Valldemossa km 7,5, 07122 Palma, Spain; (F.Z.); (J.A.J.-R.); (C.J.)
| | - José A. Jurado-Rivera
- Department of Biology, University of the Balearic Islands, Ctra. Valldemossa km 7,5, 07122 Palma, Spain; (F.Z.); (J.A.J.-R.); (C.J.)
| | - Damià Jaume
- IMEDEA (CSIC-UIB), Mediterranean Institute for Advanced Studies, C/Miquel Marquès 21, 07190 Esporles, Spain;
| | - Carlos Juan
- Department of Biology, University of the Balearic Islands, Ctra. Valldemossa km 7,5, 07122 Palma, Spain; (F.Z.); (J.A.J.-R.); (C.J.)
- IMEDEA (CSIC-UIB), Mediterranean Institute for Advanced Studies, C/Miquel Marquès 21, 07190 Esporles, Spain;
| | - Joan Pons
- IMEDEA (CSIC-UIB), Mediterranean Institute for Advanced Studies, C/Miquel Marquès 21, 07190 Esporles, Spain;
- Correspondence: ; Tel.: +34-971-173-332
| |
|
20
|
Demongeot J, Moreira A, Seligmann H. Negative CG dinucleotide bias: An explanation based on feedback loops between Arginine codon assignments and theoretical minimal RNA rings. Bioessays 2020; 43:e2000071. [PMID: 33319381 DOI: 10.1002/bies.202000071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/02/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 01/05/2023]
Abstract
Theoretical minimal RNA rings are candidate primordial genes evolved for non-redundant coding of the genetic code's 22 coding signals (one codon per biogenic amino acid, a start and a stop codon) over the shortest possible length: 29520 22-nucleotide-long RNA rings solve this min-max constraint. Numerous RNA ring properties are reminiscent of natural genes. Here we present analyses showing that all RNA rings lack dinucleotide CG (a mutable, chemically unstable dinucleotide coding for Arginine), bearing a resemblance to known CG-depleted genomes. CG in "incomplete" RNA rings (not coding for all coding signals, with only 3-12 nucleotides) gradually decreases towards CG absence in complete, 22-nucleotide-long RNA rings. Presumably, feedback loops during RNA ring growth during evolution (when amino acid assignment fixed the genetic code) assigned Arg to codons lacking CG (AGR) to avoid CG. Hence, as a chemical property of base pairs, CG mutability restructured the genetic code, thereby establishing itself as genetically encoded biological information.
Affiliation(s)
- Jacques Demongeot
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France
| | - Andrés Moreira
- Departamento de Informática, Universidad Técnica Federico Santa María, Santiago, Chile
| | - Hervé Seligmann
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, France; The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel; Institute of Microstructure Technology, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany
| |
|
21
|
Humphrey S, Kerr A, Rattray M, Dive C, Miller CJ. A model of k-mer surprisal to quantify local sequence information content surrounding splice regions. PeerJ 2020; 8:e10063. [PMID: 33194378 PMCID: PMC7648452 DOI: 10.7717/peerj.10063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/27/2020] [Accepted: 09/08/2020] [Indexed: 12/22/2022] Open
Abstract
Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. As the surprisal of a sub-sequence is inversely proportional to its occurrence within the genome, the optimal size of the sub-sequences was selected for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information encoding features such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino-acid sequence. 
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
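As an illustration of the k-mer surprisal idea described in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes the standard Shannon definition, surprisal = -log2(p), where p is the relative frequency of a k-mer among all overlapping k-mers of a reference sequence. The function names `kmer_surprisal` and `surprisal_profile` are hypothetical.

```python
import math
from collections import Counter


def kmer_surprisal(sequence: str, k: int) -> dict[str, float]:
    """Map each k-mer observed in `sequence` to its surprisal in bits.

    Assumption: p(k-mer) is estimated as its count divided by the total
    number of overlapping k-mers in the same sequence.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return {kmer: -math.log2(n / total) for kmer, n in counts.items()}


def surprisal_profile(sequence: str, k: int, table: dict[str, float]) -> list[float]:
    """Per-position surprisal along `sequence`, looked up in `table`.

    Raises KeyError for a k-mer that is absent from `table`.
    """
    return [table[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    reference = "AAAC"  # toy sequence; a real analysis would use a genome
    table = kmer_surprisal(reference, k=2)
    print(surprisal_profile(reference, 2, table))
```

Rare k-mers receive high surprisal and common ones low surprisal, which is why no multiple sequence alignment is needed to score a region.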
Affiliation(s)
- Sam Humphrey
- CRUK Manchester Institute Cancer Biomarker Centre, The University of Manchester, Manchester, United Kingdom
- CRUK Manchester Institute, CRUK Lung Cancer Centre of Excellence, Manchester, United Kingdom
| | - Alastair Kerr
- CRUK Manchester Institute Cancer Biomarker Centre, The University of Manchester, Manchester, United Kingdom
- CRUK Manchester Institute, CRUK Lung Cancer Centre of Excellence, Manchester, United Kingdom
| | - Magnus Rattray
- Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom
| | - Caroline Dive
- CRUK Manchester Institute Cancer Biomarker Centre, The University of Manchester, Manchester, United Kingdom
- CRUK Manchester Institute, CRUK Lung Cancer Centre of Excellence, Manchester, United Kingdom
| | - Crispin J. Miller
- Computational Biology Group, CRUK Beatson Institute, Glasgow, United Kingdom
- Institute of Cancer Sciences, University of Glasgow, Glasgow, United Kingdom
| |
|
22
|
Dila G, Michel CJ, Thompson JD. Optimality of circular codes versus the genetic code after frameshift errors. Biosystems 2020; 195:104134. [DOI: 10.1016/j.biosystems.2020.104134] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2020] [Revised: 03/23/2020] [Accepted: 03/25/2020] [Indexed: 12/24/2022]
|
23
|
A search for the physical basis of the genetic code. Biosystems 2020; 195:104148. [DOI: 10.1016/j.biosystems.2020.104148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/28/2020] [Revised: 04/09/2020] [Accepted: 04/09/2020] [Indexed: 01/01/2023]
|
24
|
Demongeot J, Seligmann H. Why Is AUG the Start Codon?: Theoretical Minimal RNA Rings: Maximizing Coded Information Biases 1st Codon for the Universal Initiation Codon AUG. Bioessays 2020; 42:e1900201. [PMID: 32227358 DOI: 10.1002/bies.201900201] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/26/2019] [Revised: 02/09/2020] [Indexed: 01/04/2023]
Abstract
The rational design of theoretical minimal RNA rings predetermines AUG as the universal start codon. This design maximizes coded amino acid diversity over minimal sequence length, defining in silico theoretical minimal RNA rings, candidate ancestral genes. RNA rings code for 21 amino acids and a stop codon after three consecutive translation rounds, and form a degradation-delaying stem-loop hairpin. Twenty-five RNA rings match these constraints; ten start with the universal initiation codon AUG. No first codon bias exists among the remaining RNA rings. RNA ring design predetermines AUG as initiation codon. This is the only explanation yet for AUG as start codon. RNA ring design determines additional RNA ring gene- and tRNA-like properties described previously, because it presumably mimics constraints on life's primordial RNAs.
Affiliation(s)
- Jacques Demongeot
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, F-38700, France
| | - Hervé Seligmann
- Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecom4Health, Faculty of Medicine, Université Grenoble Alpes, La Tronche, F-38700, France; The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, 91404, Israel
| |
|
25
|
Abstract
Frameshifts in protein coding sequences are widely perceived as resulting in either nonfunctional or even deleterious protein products. Indeed, frameshifts typically lead to markedly altered protein sequences and premature stop codons. By analyzing complete proteomes from all three domains of life, we demonstrate that, in contrast, several key physicochemical properties of protein sequences exhibit significant robustness against +1 and -1 frameshifts. In particular, we show that hydrophobicity profiles of many protein sequences remain largely invariant upon frameshifting. For example, over 2,900 human proteins exhibit a Pearson's correlation coefficient R between the hydrophobicity profiles of the original and the +1-frameshifted variants greater than 0.7, despite an average sequence identity between the two of only 6.5% in this group. We observe a similar effect for protein sequence profiles of affinity for certain nucleobases as well as protein sequence profiles of intrinsic disorder. Finally, analysis of significance and optimality demonstrates that frameshift stability is embedded in the structure of the universal genetic code and may have contributed to shaping it. Our results suggest that frameshifting may be a powerful evolutionary mechanism for creating new proteins with vastly different sequences, yet similar physicochemical properties to the proteins from which they originate.
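The comparison of hydrophobicity profiles before and after a frameshift can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: it uses the standard genetic code, the Kyte-Doolittle hydropathy scale (the abstract does not name a scale), reads the shifted frame from nucleotide offset 1, pairs residues by position, and drops any pair that contains a stop codon. All function names are hypothetical.

```python
from itertools import product
from math import sqrt

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1 layout).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(seq: str, offset: int = 0) -> str:
    """Translate `seq` codon by codon starting at `offset`; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient; inputs must not be constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def frameshift_hydrophobicity_r(cds: str) -> float:
    """Correlate per-residue hydropathy of frame 0 with the frame at offset 1."""
    native, shifted = translate(cds, 0), translate(cds, 1)
    pairs = [(KD[a], KD[b]) for a, b in zip(native, shifted) if "*" not in (a, b)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


if __name__ == "__main__":
    print(frameshift_hydrophobicity_r("ATGGCCATTGTAAAGGAC"))
```

A published analysis would typically smooth the profiles over a sliding window and read through or truncate at stops according to an explicit rule; both choices are left out here to keep the sketch short.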
|
26
|
Demongeot J, Seligmann H. Deamination gradients within codons after 1<->2 position swap predict amino acid hydrophobicity and parallel β-sheet conformational preference. Biosystems 2020; 191-192:104116. [PMID: 32081715 DOI: 10.1016/j.biosystems.2020.104116] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/16/2019] [Revised: 12/04/2019] [Accepted: 02/10/2020] [Indexed: 12/16/2022]
Abstract
Deaminations C->T and A->G are frequent mutations producing nucleotide content gradients across genomes proportional to single-strandedness during replication/transcription. Hence, within single codons, deamination risks increase from first to third codon positions, while second codon positions are functionally most crucial. Here genetic codes are analyzed assuming that after anticodons protected codons from deaminations, first and second codon positions swapped (N2N1N3->N1N2N3), with lowest deamination risks for N2 in presumed primitive N2N1N3 codons. N2N1N3, not standard N1N2N3, codon structure minimizes deaminations inversely proportionally to cognate amino acid hydrophobicity and parallel β-sheet conformational preference. For N1N2N3, deamination minimization increases with genetic code integration order of cognate amino acids: during the presumed N2N1N3->N1N2N3 codon structure transition, protein synthesis combined direct codon-amino acid interactions for late amino acids and tRNA-based translation for early amino acids. Hence N2N1N3 codons would correspond to tRNA-free translation by spontaneous codon-amino acid affinities, and tRNA-mediated translation presumably caused N2N1N3->N1N2N3 swaps. Results show that rational, not arbitrary rules link codon and amino acid structures. Some analyses detect mitochondrial RNAs and peptides in public data corresponding to systematic position swaps, suggesting occasional swapping polymerase activity.
Affiliation(s)
- Jacques Demongeot
- Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical, F-38700, La Tronche, France.
| | - Hervé Seligmann
- Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical, F-38700, La Tronche, France; The National Natural History Collections, The Hebrew University of Jerusalem, 91404, Jerusalem, Israel.
| |
|
27
|
Pentamers with Non-redundant Frames: Bias for Natural Circular Code Codons. J Mol Evol 2020; 88:194-201. [DOI: 10.1007/s00239-019-09925-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/09/2019] [Accepted: 12/17/2019] [Indexed: 02/06/2023]
|
28
|
Wichmann S, Ardern Z. Optimality in the standard genetic code is robust with respect to comparison code sets. Biosystems 2019; 185:104023. [DOI: 10.1016/j.biosystems.2019.104023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/15/2019] [Revised: 08/22/2019] [Accepted: 08/24/2019] [Indexed: 01/22/2023]
|
29
|
Barbieri M. Evolution of the genetic code: The ambiguity-reduction theory. Biosystems 2019; 185:104024. [DOI: 10.1016/j.biosystems.2019.104024] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/06/2019] [Revised: 08/26/2019] [Accepted: 08/26/2019] [Indexed: 10/26/2022]
|
30
|
Demongeot J, Seligmann H. Theoretical minimal RNA rings designed according to coding constraints mimic deamination gradients. THE SCIENCE OF NATURE - NATURWISSENSCHAFTEN 2019; 106:44. [DOI: 10.1007/s00114-019-1638-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/15/2018] [Revised: 06/18/2019] [Accepted: 06/19/2019] [Indexed: 11/27/2022]
|
31
|
A general model on the origin of biological codes. Biosystems 2019; 181:11-19. [DOI: 10.1016/j.biosystems.2019.04.010] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 03/22/2019] [Revised: 04/16/2019] [Accepted: 04/16/2019] [Indexed: 01/09/2023]
|
32
|
Optimization of the standard genetic code in terms of two mutation types: Point mutations and frameshifts. Biosystems 2019; 181:44-50. [DOI: 10.1016/j.biosystems.2019.04.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/18/2019] [Accepted: 04/27/2019] [Indexed: 02/08/2023]
|
33
|
Demongeot J, Seligmann H. Spontaneous evolution of circular codes in theoretical minimal RNA rings. Gene 2019; 705:95-102. [DOI: 10.1016/j.gene.2019.03.069] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 11/13/2018] [Revised: 03/08/2019] [Accepted: 03/29/2019] [Indexed: 02/06/2023]
|
34
|
Seligmann H. Localized Context-Dependent Effects of the "Ambush" Hypothesis: More Off-Frame Stop Codons Downstream of Shifty Codons. DNA Cell Biol 2019; 38:786-795. [PMID: 31157984 DOI: 10.1089/dna.2019.4725] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open
Abstract
The ambush hypothesis speculates that off-frame stop codons increase translational efficiency after ribosomal frameshifts by stopping frameshifted translation early. Several lines of evidence fit this hypothesis: (1) synonymous codon usages increase with their potential contribution to off-frame stops; (2) the genetic code assigns frequent amino acids to codon families contributing to off-frame stops; (3) positive biases for off-frame stops (AT rich) occur despite adverse nucleotide (GC) biases; and (4) mitochondrial off-frame stop codon densities increase with ribosomal structural instability, a potential proxy of frameshift frequencies. In this study, analyses of vertebrate mitogenes and tRNA synthetase genes from all superkingdoms and viruses test a new prediction of the ambush hypothesis: sequences immediately downstream of frameshift-inducing homopolymer codons (AAA, CCC, GGG, and TTT) are off-frame stop rich. Codons immediately downstream of homopolymer codons form more off-frame stops than average; these biases are stronger than for corresponding upstream distances and for any other group of synonymous codons. Sequences downstream of that high-density region are off-frame stop depleted. This decrease suggests that off-frame stops, combined with suppressor tRNAs, regulate translation of overlapping coding sequences. Results show the predictive power of the ambush hypothesis, from macroevolutionary (genetic code structure) to detailed gene sequence anatomy.
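The quantity at the centre of the ambush hypothesis, the number of stop codons hidden in the shifted reading frames of a coding sequence, can be counted with a few lines of code. The sketch below is illustrative only: it assumes the three standard stop codons (mitochondrial codes differ) and a DNA-alphabet coding sequence whose frame 0 begins at index 0. The function name `off_frame_stops` is hypothetical.

```python
# Standard nuclear stop codons; vertebrate mitochondrial codes use a different set.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def off_frame_stops(cds: str, offset: int) -> int:
    """Count stop codons read in the frame that starts at nucleotide `offset`.

    `offset` is 1 or 2 for the two off-frames relative to the coding frame (0).
    """
    return sum(
        1
        for i in range(offset, len(cds) - 2, 3)
        if cds[i:i + 3] in STOP_CODONS
    )


if __name__ == "__main__":
    cds = "ATGATAGCTTGA"  # frame 0 reads ATG ATA GCT TGA
    print(off_frame_stops(cds, 1), off_frame_stops(cds, 2))
```

Testing the hypothesis then requires comparing such counts against a null model, for example sequences with shuffled synonymous codons, which this sketch does not attempt.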
Affiliation(s)
- Hervé Seligmann
- The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
| |
|
35
|
Fontrodona N, Aubé F, Claude JB, Polvèche H, Lemaire S, Tranchevent LC, Modolo L, Mortreux F, Bourgeois CF, Auboeuf D. Interplay between coding and exonic splicing regulatory sequences. Genome Res 2019; 29:711-722. [PMID: 30962178 PMCID: PMC6499313 DOI: 10.1101/gr.241315.118] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/02/2018] [Accepted: 03/28/2019] [Indexed: 01/24/2023]
Abstract
The inclusion of exons during the splicing process depends on the binding of splicing factors to short low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons that are coregulated by any given splicing factor share a similar nucleotide composition bias and preferentially code for amino acids with similar physicochemical properties because of the nonrandomness of the genetic code. Indeed, amino acids sharing similar physicochemical properties correspond to codons that have the same nucleotide composition bias. In particular, we uncover that the TRA2A and TRA2B splicing factors that bind to adenine-rich motifs promote the inclusion of adenine-rich exons coding preferentially for hydrophilic amino acids that correspond to adenine-rich codons. SRSF2 that binds guanine/cytosine-rich motifs promotes the inclusion of GC-rich exons coding preferentially for small amino acids, whereas SRSF3 that binds cytosine-rich motifs promotes the inclusion of exons coding preferentially for uncharged amino acids, like serine and threonine that can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, the regulation of an exon by a splicing factor that relies on the affinity of this factor for specific nucleotide(s) is tightly interconnected with the exon-encoded physicochemical properties. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.
Affiliation(s)
- Nicolas Fontrodona
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Fabien Aubé
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Jean-Baptiste Claude
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Hélène Polvèche
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Sébastien Lemaire
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Léon-Charles Tranchevent
- Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
| | - Laurent Modolo
- LBMC Biocomputing Center, CNRS UMR 5239, INSERM U1210, F-69007, Lyon, France
| | - Franck Mortreux
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Cyril F Bourgeois
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Didier Auboeuf
- Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| |
|
36
|
Demongeot J, Seligmann H. More Pieces of Ancient than Recent Theoretical Minimal Proto-tRNA-Like RNA Rings in Genes Coding for tRNA Synthetases. J Mol Evol 2019; 87:152-174. [DOI: 10.1007/s00239-019-09892-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/19/2018] [Accepted: 03/22/2019] [Indexed: 12/19/2022]
|
37
|
Boivin V, Faucher-Giguère L, Scott M, Abou-Elela S. The cellular landscape of mid-size noncoding RNA. WILEY INTERDISCIPLINARY REVIEWS-RNA 2019; 10:e1530. [PMID: 30843375 PMCID: PMC6619189 DOI: 10.1002/wrna.1530] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 12/05/2018] [Revised: 02/08/2019] [Accepted: 02/09/2019] [Indexed: 01/06/2023]
Abstract
Noncoding RNA plays an important role in all aspects of the cellular life cycle, from the very basic process of protein synthesis to specialized roles in cell development and differentiation. However, many noncoding RNAs remain uncharacterized and the function of most of them remains unknown. Mid-size noncoding RNAs (mncRNAs), which range in length from 50 to 400 nucleotides, have diverse regulatory functions but share many fundamental characteristics. Most mncRNAs are produced from independent promoters although others are produced from the introns of other genes. Many are found in multiple copies in genomes. mncRNAs are highly structured and carry many posttranscriptional modifications. Both of these facets dictate their RNA-binding protein partners and ultimately their function. mncRNAs have already been implicated in translation, catalysis, as guides for RNA modification, as spliceosome components and regulatory RNA. However, recent studies are adding new mncRNA functions including regulation of gene expression and alternative splicing. In this review, we describe the different classes, characteristics and emerging functions of mncRNAs and their relative expression patterns. Finally, we provide a portrait of the challenges facing their detection and annotation in databases. This article is categorized under: Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs; RNA Structure and Dynamics > RNA Structure, Dynamics, and Chemistry; RNA Structure and Dynamics > Influence of RNA Structure in Biological Systems; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Vincent Boivin
  - Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Laurence Faucher-Giguère
  - Department of Microbiology and Infectious Disease, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Michelle Scott
  - Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Sherif Abou-Elela
  - Department of Microbiology and Infectious Disease, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
|
38
|
Abrahams L, Hurst LD. Refining the Ambush Hypothesis: Evidence That GC- and AT-Rich Bacteria Employ Different Frameshift Defence Strategies. Genome Biol Evol 2018; 10:1153-1173. [PMID: 29617761 PMCID: PMC5909447 DOI: 10.1093/gbe/evy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 03/30/2018] [Indexed: 12/13/2022] Open
Abstract
Stop codons are frequently selected for beyond their regular termination function for error control. The “ambush hypothesis” proposes out-of-frame stop codons (OSCs) terminating frameshifted translations are selected for. Although early indirect evidence was partially supportive, recent evidence suggests OSC frequencies are not exceptional when considering underlying nucleotide content. However, prior null tests fail to control amino acid/codon usages or possible local mutational biases. We therefore return to the issue using bacterial genomes, considering several tests defining and testing against a null. We employ simulation approaches preserving amino acid order but shuffling synonymous codons or preserving codons while shuffling amino acid order. Additionally, we compare codon usage in amino acid pairs, where one codon can but the next, otherwise identical codon, cannot encode an OSC. OSC frequencies exceed expectations typically in AT-rich genomes, the +1 frame and for TGA/TAA but not TAG. With this complex evidence, simply rejecting or accepting the ambush hypothesis is not warranted. We propose a refined post hoc model, whereby AT-rich genomes have more accidental frameshifts, handled by RF2–RF3 complexes (associated with TGA/TAA) and are mostly +1 (or −2) slips. Supporting this, excesses positively correlate with in silico predicted frameshift probabilities. Thus, we propose a more viable framework, whereby genomes broadly adopt one of the two strategies to combat frameshifts: preventing frameshifting (GC-rich) or permitting frameshifts but minimizing impacts when most are caught early (AT-rich). Our refined framework holds promise yet some features, such as the bias of out-of-frame sense codons, remain unexplained.
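As a rough illustration of the quantities this abstract discusses, the following Python sketch counts out-of-frame stop codons (OSCs) in the +1 and +2 frames of a coding sequence and draws one null sample by shuffling synonymous codons while preserving amino acid order. The function names, the toy sequence, and the use of the standard codon table are assumptions made for illustration; this is not the authors' pipeline.

```python
import random
from collections import defaultdict

STOPS = {"TAA", "TAG", "TGA"}

# Standard genetic code with codons enumerated in TCAG order
# (an assumption for this sketch; real analyses should use the table
# appropriate to each genome).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a sequence into codons, ignoring any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(cds):
    """Translate an in-frame CDS using CODON_TABLE ('*' marks a stop)."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def count_oscs(cds):
    """Count stop codons in the +1 and +2 frames relative to the CDS frame."""
    return {
        shift: sum(codon in STOPS for codon in split_codons(cds[shift:]))
        for shift in (1, 2)
    }


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein sequence and the overall codon usage are both preserved;
    only the placement of synonymous codons changes.
    """
    codons = split_codons(cds)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


toy_cds = "ATGATAAGCTGA"  # hypothetical toy sequence: ATG ATA AGC TGA
observed = count_oscs(toy_cds)
null_cds = shuffle_synonymous(toy_cds, random.Random(0))
null_counts = count_oscs(null_cds)
```

Repeating the shuffle many times and comparing `observed` against the distribution of `null_counts` gives a simple version of a test against a null that controls for amino acid order.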
Affiliation(s)
- Liam Abrahams
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
- Laurence D Hurst
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
|
39
|
Wan J, Gao X, Mao Y, Zhang X, Qian SB. A Coding Sequence-Embedded Principle Governs Translational Reading Frame Fidelity. Research (Wash D C) 2018; 2018:7089174. [PMID: 31549036 PMCID: PMC6750092 DOI: 10.1155/2018/7089174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/28/2018] [Accepted: 08/28/2018] [Indexed: 06/10/2023]
Abstract
Upon initiation at a start codon, the ribosome must maintain the correct reading frame for hundreds of codons in order to produce functional proteins. While some sequence elements are able to trigger programmed ribosomal frameshifting (PRF), very little is known about how the ribosome normally prevents spontaneous frameshift errors that can have dire consequences if uncorrected. Using high resolution ribosome profiling data sets, we discovered that the translating ribosome uses the 3' end of 18S rRNA to scan the AUG-like codons after the decoding process. The postdecoding mRNA:rRNA interaction not only contributes to predominant translational pausing, but also provides a retrospective mechanism to safeguard the ribosome in the correct reading frame. Partially eliminating the AUG-like "sticky" codons in the reporter message leads to increased +1 frameshift errors. Remarkably, mutating the highly conserved CAU triplet of 18S rRNA globally changes the codon "stickiness". Further supporting the role of "sticky" sequences in reading frame maintenance, the codon composition of open reading frames is highly optimized across eukaryotic genomes. These results suggest an important layer of information embedded within the protein-coding sequences that instructs the ribosome to ensure reading frame fidelity during translation.
Affiliation(s)
- Ji Wan
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
- Xiangwei Gao
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
- Yuanhui Mao
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
- Xingqian Zhang
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
- Shu-Bing Qian
  - Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
  - Graduate Programs in Genetics Genomics and Development, Biochemistry Molecular and Cellular Biology, Cornell University, Ithaca, NY 14853, USA
|
40
|
Kasman A. The Duplexing of the Genetic Code and Sequence-Dependent DNA Geometry. Bull Math Biol 2018; 80:2734-2760. [PMID: 30097915 DOI: 10.1007/s11538-018-0486-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2017] [Accepted: 08/03/2018] [Indexed: 11/30/2022]
Abstract
It is well known that sequences of bases in DNA are translated into sequences of amino acids in cells via the genetic code. More recently, it has been discovered that the sequence of DNA bases also influences the geometry and deformability of the DNA. These two correspondences represent a naturally arising example of duplexed codes, providing two different ways of interpreting the same DNA sequence. This paper will set up the notation and basic results necessary to mathematically investigate the relationship between these two natural DNA codes. It then undertakes two very different such investigations: one graphical approach based only on expected values and another analytic approach incorporating the deformability of the DNA molecule and approximating the mutual information of the two codes. Special emphasis is paid to whether there is evidence that pressure to maximize the duplexing efficiency influenced the evolution of the genetic code. Disappointingly, the results fail to support the hypothesis that the genetic code was influenced in this way. In fact, applying both methods to samples of realistic alternative genetic codes shows that the duplexing of the genetic code found in nature is just slightly less efficient than average. The implications of this negative result are considered in the final section of the paper.
|
41
|
Abrahams L, Hurst LD. Adenine Enrichment at the Fourth CDS Residue in Bacterial Genes Is Consistent with Error Proofing for +1 Frameshifts. Mol Biol Evol 2018; 34:3064-3080. [PMID: 28961919 PMCID: PMC5850271 DOI: 10.1093/molbev/msx223] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/21/2022] Open
Abstract
Beyond selection for optimal protein functioning, coding sequences (CDSs) are under selection at the RNA and DNA levels. Here, we identify a possible signature of “dual-coding,” namely extensive adenine (A) enrichment at bacterial CDS fourth sites. In 99.07% of studied bacterial genomes, fourth site A use is greater than expected given genomic A-starting codon use. Arguing for nucleotide level selection, A-starting serine and arginine second codons are heavily utilized when compared with their non-A starting synonyms. Several models have the ability to explain some of this trend. In part, A-enrichment likely reduces 5′ mRNA stability, promoting translation initiation. However T/U, which may also reduce stability, is avoided. Further, +1 frameshifts on the initiating ATG encode a stop codon (TGA) provided A is the fourth residue, acting either as a frameshift “catch and destroy” or a frameshift stop and adjust mechanism and hence implicated in translation initiation. Consistent with both, genomes lacking TGA stop codons exhibit weaker fourth site A-enrichment. Sequences lacking a Shine–Dalgarno sequence and those without upstream leader genes, that may be more error prone during initiation, have greater utilization of A, again suggesting a role in initiation. The frameshift correction model is consistent with the notion that many genomic features are error-mitigation factors and provides the first evidence for site-specific out of frame stop codon selection. We conjecture that the NTG universal start codon may have evolved as a consequence of TGA being a stop codon and the ability of NTGA to rapidly terminate or adjust a ribosome.
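The +1 frameshift argument in this abstract can be checked mechanically: when the fourth CDS residue is A, slipping one base forward from an NTG start codon reads TGA. The sketch below is a minimal Python illustration with hypothetical function names and toy sequences; it is not the authors' analysis code.

```python
def fourth_site_is_a(cds):
    """True if the fourth CDS residue (first base of the second codon) is A."""
    return len(cds) >= 4 and cds[3] == "A"


def plus_one_codon_at_start(cds):
    """Return the triplet read after a +1 slip on the start codon (bases 2-4)."""
    return cds[1:4]


def start_slip_hits_tga(cds):
    """True if a +1 frameshift on the start codon immediately reads TGA."""
    return plus_one_codon_at_start(cds) == "TGA"


def fourth_site_a_fraction(cds_list):
    """Fraction of sequences in cds_list with A at the fourth site."""
    return sum(fourth_site_is_a(cds) for cds in cds_list) / len(cds_list)


# Hypothetical toy genes with NTG start codons (ATG, GTG, TTG).
toy_genes = ["ATGAAACGT", "GTGAGCTAA", "ATGGCTTAA", "TTGACCTGA"]
a_fraction = fourth_site_a_fraction(toy_genes)
tga_hits = [start_slip_hits_tga(cds) for cds in toy_genes]
```

For any NTG start codon the two checks agree, which is the point of the abstract's NTGA conjecture: the fourth-site A is exactly what turns the slipped reading into a stop.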
Affiliation(s)
- Liam Abrahams
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
|
42
|
Dadová J, Galan SR, Davis BG. Synthesis of modified proteins via functionalization of dehydroalanine. Curr Opin Chem Biol 2018; 46:71-81. [PMID: 29913421 DOI: 10.1016/j.cbpa.2018.05.022] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 02/28/2018] [Revised: 05/02/2018] [Accepted: 05/29/2018] [Indexed: 12/17/2022]
Abstract
Dehydroalanine has emerged in recent years as a non-proteinogenic residue with strong chemical utility in proteins for the study of biology. In this review we cover the several methods now available for its flexible and site-selective incorporation via a variety of complementary chemical and biological techniques and examine its reactivity, allowing both creation of modified protein side-chains through a variety of bond-forming methods (C-S, C-N, C-Se, C-C) and as an activity-based probe in its own right. We illustrate its utility with selected examples of biological and technological discovery and application.
Affiliation(s)
- Jitka Dadová
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, United Kingdom
- Sébastien Rg Galan
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, United Kingdom
- Benjamin G Davis
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, United Kingdom
|
43
|
Alignment-based and alignment-free methods converge with experimental data on amino acids coded by stop codons at split between nuclear and mitochondrial genetic codes. Biosystems 2018; 167:33-46. [DOI: 10.1016/j.biosystems.2018.03.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 01/15/2018] [Revised: 03/18/2018] [Accepted: 03/19/2018] [Indexed: 12/11/2022]
|
44
|
Yona AH, Alm EJ, Gore J. Random sequences rapidly evolve into de novo promoters. Nat Commun 2018; 9:1530. [PMID: 29670097 PMCID: PMC5906472 DOI: 10.1038/s41467-018-04026-w] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 11/28/2017] [Accepted: 03/28/2018] [Indexed: 11/09/2022] Open
Abstract
How new functions arise de novo is a fundamental question in evolution. We studied de novo evolution of promoters in Escherichia coli by replacing the lac promoter with various random sequences of the same size (~100 bp) and evolving the cells in the presence of lactose. We find that ~60% of random sequences can evolve expression comparable to the wild-type with only one mutation, and that ~10% of random sequences can serve as active promoters even without evolution. Such a short mutational distance between random sequences and active promoters may improve the evolvability, yet may also lead to accidental promoters inside genes that interfere with normal expression. Indeed, our bioinformatic analyses indicate that E. coli was under selection to reduce accidental promoters inside genes by avoiding promoter-like sequences. We suggest that a low threshold for functionality balanced by selection against undesired targets can increase the evolvability by making new beneficial features more accessible. Bacterial promoters initiate gene transcription and have distinct sequence features. Here, the authors show that random sequences that contain no information are just on the verge of functioning as promoters in Escherichia coli.
Affiliation(s)
- Avihu H Yona
  - Physics of Living Systems, Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Eric J Alm
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Jeff Gore
  - Physics of Living Systems, Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
|
45
|
DNA partitions into triplets under tension in the presence of organic cations, with sequence evolutionary age predicting the stability of the triplet phase. Q Rev Biophys 2018; 50:e15. [PMID: 29233227 DOI: 10.1017/s0033583517000130] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
Using atomistic simulations, we show the formation of stable triplet structure when particular GC-rich DNA duplexes are extended in solution over a timescale of hundreds of nanoseconds, in the presence of organic salt. We present planar-stacked triplet disproportionated DNA (Σ DNA) as a possible solution phase of the double helix under tension, subject to sequence and the presence of stabilising co-factors. Considering the partitioning of the duplexes into triplets of base pairs as the first step of operation of recombinase enzymes like RecA, we emphasise the structure-function relationship in Σ DNA. We supplement atomistic calculations with thermodynamic arguments to show that codons for 'phase 1' amino acids (those appearing early in evolution) are more likely than a lower entropy GC-rich sequence to form triplets under tension. We further observe that the four amino acids supposed (in the 'GADV world' hypothesis) to constitute the minimal set to produce functional globular proteins have the strongest triplet-forming propensity within the phase 1 set, showing a series of decreasing triplet propensity with evolutionary newness. The weak form of our observation provides a physical mechanism to minimise read frame and recombination alignment errors in the early evolution of the genetic code.
|
46
|
Borello U, Berarducci B, Delahaye E, Price DJ, Dehay C. SP8 Transcriptional Regulation of Cyclin D1 During Mouse Early Corticogenesis. Front Neurosci 2018; 12:119. [PMID: 29599703 PMCID: PMC5863514 DOI: 10.3389/fnins.2018.00119] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/03/2017] [Accepted: 02/14/2018] [Indexed: 11/13/2022] Open
Abstract
Multiple signals control the balance between proliferation and differentiation of neural progenitor cells during corticogenesis. A key point of this regulation is the control of G1 phase length, which is regulated by the Cyclin/Cdks complexes. Using genome-wide chromatin immunoprecipitation assay and mouse genetics, we have explored the transcriptional regulation of Cyclin D1 (Ccnd1) during the early developmental stages of the mouse cerebral cortex. We found evidence that SP8 binds to the Ccnd1 locus on exon regions. In vitro experiments show SP8 binding activity on Ccnd1 gene 3'-end, and point to a putative role for SP8 in modulating PAX6-mediated repression of Ccnd1 along the dorso-ventral axis of the developing pallium, creating a medial-low/lateral-high gradient of neuronal differentiation. Activation of Ccnd1 through the promoter/5'-end of the gene does not depend on SP8, but on β-catenin (CTNNB1). Importantly, alteration of the Sp8 level of expression in vivo affects Ccnd1 expression during early corticogenesis. Our results indicate that Ccnd1 regulation is the result of multiple signals and that SP8 is a player in this regulation, revealing an unexpected and potentially novel mechanism of transcriptional activation.
Affiliation(s)
- Ugo Borello
  - Université de Lyon, Université Claude Bernard Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
  - Inovarion, Paris, France
- Barbara Berarducci
  - Université de Lyon, Université Claude Bernard Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
- Edwige Delahaye
  - Université de Lyon, Université Claude Bernard Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
- David J. Price
  - Centre for Integrative Physiology, University of Edinburgh, Edinburgh, United Kingdom
- Colette Dehay
  - Université de Lyon, Université Claude Bernard Lyon 1, Inserm, Stem Cell and Brain Research Institute U1208, Bron, France
|
47
|
Triplet-Based Codon Organization Optimizes the Impact of Synonymous Mutation on Nucleic Acid Molecular Dynamics. J Mol Evol 2018; 86:91-102. [PMID: 29344693 PMCID: PMC5846835 DOI: 10.1007/s00239-018-9828-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/26/2017] [Accepted: 01/06/2018] [Indexed: 11/22/2022]
Abstract
Since the elucidation of the genetic code almost 50 years ago, many nonrandom aspects of its codon organization remain only partly resolved. Here, we investigate the recent hypothesis of ‘dual-use’ codons which proposes that in addition to allowing adjustment of codon optimization to tRNA abundance, the degeneracy in the triplet-based genetic code also multiplexes information regarding DNA’s helical shape and protein-binding dynamics while avoiding interference with other protein-level characteristics determined by amino acid properties. How such structural optimization of the code within eukaryotic chromatin could have arisen from an RNA world is a mystery, but would imply some preadaptation in an RNA context. We analyzed synonymous (protein-silent) and nonsynonymous (protein-altering) mutational impacts on molecular dynamics in 13823 identically degenerate alternative codon reorganizations, defined by codon transitions in 7680 GPU-accelerated molecular dynamic simulations of implicitly and explicitly solvated double-stranded aRNA and bDNA structures. When compared to all possible alternative codon assignments, the standard genetic code minimized the impact of synonymous mutations on the random atomic fluctuations and correlations of carbon backbone vector trajectories while facilitating the specific movements that contribute to DNA polymer flexibility. This trend was notably stronger in the context of RNA supporting the idea that dual-use codon optimization and informational multiplexing in DNA resulted from the preadaptation of the RNA duplex to resist changes to thermostability. The nonrandom and divergent molecular dynamics of synonymous mutations also imply that the triplet-based code may have resulted from adaptive functional expansion enabling a primordial doublet code to multiplex gene regulatory information via the shape and charge of the minor groove.
|
48
|
de Oliveira LL, Freitas AA, Tinós R. Multi-objective genetic algorithms in the study of the genetic code’s adaptability. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.10.022] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023]
|
49
|
Bijective codon transformations show genetic code symmetries centered on cytosine's coding properties. Theory Biosci 2017; 137:17-31. [PMID: 29147851 DOI: 10.1007/s12064-017-0258-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/06/2017] [Accepted: 11/13/2017] [Indexed: 12/11/2022]
Abstract
Homology of some RNAs with template DNA requires systematic exchanges between nucleotides. Such exchanges produce 'swinger' RNA along 23 bijective transformations (nine symmetric, X ↔ Y; and 14 asymmetric, X → Y → Z → X, for example A ↔ C and A → C → G → A, respectively). Here, analyses compare amino acids coded by swinger-transformed codons to those coded by untransformed codons, defining coding invariance after transformations. Swinger transformations cluster according to coding invariance in four groups characterized by transformations into cytosine (C = C, T → C, A → C, and G → C). C's central mutational coding role shows that swinger transformations constrained genetic code genesis. Coding invariance post-transformations correlate positively/negatively with mitochondrial swinger transcription/lepidosaurian body temperature. Presumably, low/high temperatures stabilize/revert rare swinger polymerization modes, producing long swinger sequences/point mutations, respectively. Coding invariance after swinger transformations might compensate effects of swinger polymerizations in species with low body temperatures. Hypothetically, swinger transcription increased coding potential of RNA self-replicating protolife systems under heating/cooling cycles.
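The count of 23 transformations follows from the 24 permutations of the four nucleotides minus the identity. The short Python sketch below reproduces the 9/14 split, under the assumption that "symmetric" means the permutation is its own inverse. The `coding_invariance` score (fraction of codons whose amino acid is unchanged after transformation) is a simplified stand-in defined here for illustration, with the standard genetic code assumed; the abstract does not give the paper's exact measure.

```python
from itertools import permutations

NUCS = "ACGT"

# Standard genetic code with codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nontrivial_transformations():
    """All bijections of {A, C, G, T} onto itself except the identity."""
    maps = []
    for image in permutations(NUCS):
        mapping = dict(zip(NUCS, image))
        if any(src != dst for src, dst in mapping.items()):
            maps.append(mapping)
    return maps


def is_symmetric(mapping):
    """True if applying the mapping twice restores every nucleotide."""
    return all(mapping[mapping[n]] == n for n in NUCS)


def coding_invariance(mapping):
    """Fraction of the 64 codons encoding the same amino acid after transformation."""
    same = 0
    for codon, amino_acid in CODON_TABLE.items():
        transformed = "".join(mapping[n] for n in codon)
        same += CODON_TABLE[transformed] == amino_acid
    return same / len(CODON_TABLE)


transformations = nontrivial_transformations()
symmetric = [m for m in transformations if is_symmetric(m)]
asymmetric = [m for m in transformations if not is_symmetric(m)]
print(len(symmetric), len(asymmetric))  # 9 14
```

The nine symmetric maps are the six single exchanges plus the three double exchanges; the fourteen asymmetric maps are the eight 3-cycles plus the six 4-cycles.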
|
50
|
Seligmann H, Warthi G. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes. Comput Struct Biotechnol J 2017; 15:412-424. [PMID: 28924459 PMCID: PMC5591391 DOI: 10.1016/j.csbj.2017.08.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 05/25/2017] [Revised: 07/20/2017] [Accepted: 08/05/2017] [Indexed: 12/14/2022] Open
Abstract
A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = - 1/1; CDA = - 0.5/0.5 if Z and X are both purines or both pyrimidines, assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').
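The CDA rule stated in this abstract can be written as a small function. The Python sketch below covers only the codon structures the abstract defines (XZX, ZXX, XXZ) and returns `None` for codons with three different nucleotides, which the abstract does not specify; the function names and that fallback are assumptions made for illustration.

```python
PURINES = {"A", "G"}  # C, T and U are treated as pyrimidines


def same_chemical_class(x, z):
    """True if both nucleotides are purines or both are pyrimidines."""
    return (x in PURINES) == (z in PURINES)


def codon_directional_asymmetry(codon):
    """Return the CDA of a codon following the abstract's sign convention.

    XZX (first and last identical, including XXX): 0
    ZXX (5' asymmetric): -1, or -0.5 if Z and X share a chemical class
    XXZ (3' asymmetric): +1, or +0.5 if Z and X share a chemical class
    Codons with three distinct nucleotides: None (not defined in the abstract)
    """
    first, middle, last = codon
    if first == last:
        return 0.0
    if middle == last:
        return -0.5 if same_chemical_class(first, middle) else -1.0
    if first == middle:
        return 0.5 if same_chemical_class(first, last) else 1.0
    return None


examples = {c: codon_directional_asymmetry(c) for c in ("ACA", "GCC", "TCC", "CCG", "AAG")}
```

Here GCC and TCC are both ZXX codons, but only TCC pairs two pyrimidines, so the two receive different magnitudes under the rule as stated.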
Affiliation(s)
- Hervé Seligmann
  - Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France
  - Dept. Ecol Evol Behav, Alexander Silberman Inst Life Sci, The Hebrew University of Jerusalem, IL-91904 Jerusalem, Israel
- Ganesh Warthi
  - Aix-Marseille Univ, Unité de Recherche sur les Maladies Infectieuses et Tropicales Emergentes, UM 63, CNRS UMR7278, IRD 198, INSERM U1095, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, Postal code 13385, France
|