1. Husev M, Rovenchak A. On the Verge of Life: Distribution of Nucleotide Sequences in Viral RNAs. Biosemiotics 2021; 14:253-269. [PMID: 33613787 PMCID: PMC7887720 DOI: 10.1007/s12304-021-09403-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 01/25/2021] [Indexed: 06/12/2023]
Abstract
The aim of the study is to analyze viruses using parameters obtained from distributions of nucleotide sequences in the viral RNA. To keep the input data homogeneous, we analyze single-stranded RNA viruses only. Two approaches are used to obtain the nucleotide sequences. In the first, chunks of equal length (four nucleotides) are considered. In the second, the whole RNA genome is divided into parts using adenine, or the most frequent nucleotide, as a "space". Rank-frequency distributions are studied in both cases. As the nature of their rank-frequency distributions shows, the defined nucleotide sequences are signs comparable, to a certain extent, to syllables or words. Within the first approach, the Pólya and the negative hypergeometric distributions yield the best fit. For the distributions obtained within the second approach, we have calculated a set of parameters, including entropy, mean sequence length, and its dispersion. The calculated parameters became the basis for the classification of viruses. We observed that the proximity of viruses on planes spanned by various pairs of parameters corresponds to related species. In certain cases, such proximity is observed for unrelated species as well, thus calling for an expansion of the set of parameters used in the classification. We also observed that the fifth most frequent nucleotide sequences obtained within the second approach are of a different nature in the case of human coronaviruses (different nucleotides for MERS, SARS-CoV, and SARS-CoV-2 versus identical nucleotides for the four other coronaviruses). We expect that our findings will be useful as a supplementary tool in the classification of diseases caused by RNA viruses with respect to severity and contagiousness.
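The two segmentation procedures described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (chunk_words, space_split_words, most_frequent_nucleotide, rank_frequency, sequence_stats) are hypothetical, empty segments between adjacent "space" nucleotides are simply dropped, entropy is taken in bits (log base 2), and "dispersion" is read as the population variance of segment lengths. The toy sequence is invented and is not viral data.

```python
from __future__ import annotations

import math
from collections import Counter


def chunk_words(seq: str, k: int = 4) -> list[str]:
    """First approach: split into non-overlapping chunks of length k.
    A trailing remainder shorter than k is discarded (an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def most_frequent_nucleotide(seq: str) -> str:
    """Return the most common nucleotide, a candidate 'space' symbol."""
    return Counter(seq.upper()).most_common(1)[0][0]


def space_split_words(seq: str, space: str = "A") -> list[str]:
    """Second approach: treat one nucleotide as a 'space' and split on it.
    Empty segments from adjacent spaces are dropped (an assumption)."""
    return [w for w in seq.upper().split(space.upper()) if w]


def rank_frequency(words: list[str]) -> list[tuple[str, int]]:
    """Signs sorted by descending frequency; list index + 1 is the rank."""
    return Counter(words).most_common()


def sequence_stats(words: list[str]) -> tuple[float, float, float]:
    """Entropy (bits), mean segment length, population variance of length."""
    if not words:
        raise ValueError("need at least one segment")
    n = len(words)
    counts = Counter(words)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    lengths = [len(w) for w in words]
    mean_len = sum(lengths) / n
    var_len = sum((x - mean_len) ** 2 for x in lengths) / n
    return entropy, mean_len, var_len


if __name__ == "__main__":
    rna = "UUAGCCAGAUUUAG"  # toy sequence, not real viral data
    print(chunk_words(rna))       # ['UUAG', 'CCAG', 'AUUU']
    words = space_split_words(rna)
    print(words)                  # ['UU', 'GCC', 'G', 'UUU', 'G']
    print(rank_frequency(words))  # [('G', 2), ('UU', 1), ('GCC', 1), ('UUU', 1)]
    print(sequence_stats(words))
```

For the toy sequence, most_frequent_nucleotide returns 'U', so passing its result as the space argument gives the variant that splits on the most frequent nucleotide instead of adenine.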
Affiliation(s)
- Mykola Husev
- Department for Theoretical Physics, Ivan Franko National University of Lviv, 12 Drahomanov St, UA-79005 Lviv, Ukraine
- Andrij Rovenchak
- Department for Theoretical Physics, Ivan Franko National University of Lviv, 12 Drahomanov St, UA-79005 Lviv, Ukraine

2. Auboeuf D. Physicochemical Foundations of Life that Direct Evolution: Chance and Natural Selection are not Evolutionary Driving Forces. Life (Basel) 2020; 10:life10020007. [PMID: 31973071 PMCID: PMC7175370 DOI: 10.3390/life10020007] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/22/2019] [Revised: 01/15/2020] [Accepted: 01/16/2020] [Indexed: 12/11/2022]
Abstract
The current framework of evolutionary theory postulates that evolution relies on random mutations generating a diversity of phenotypes on which natural selection acts. This framework was established using a top-down approach: it originated from Darwinism, which is based on observations of complex multicellular organisms, and was then modified to fit a DNA-centric view. In this article, it is argued that, based on a bottom-up approach starting from the physicochemical properties of nucleic and amino acid polymers, we should reject the assumptions that (i) natural selection plays a dominant role in evolution and (ii) the probability of mutations is independent of the generated phenotype. It is shown that the adaptation of a phenotype to an environment does not correspond to organism fitness, but rather to maintaining genome stability and integrity. In a stable environment, the phenotype maintains the stability of its originating genome, and both (genome and phenotype) are reproduced identically. In an unstable environment (i.e., one with variations in physicochemical parameters beyond a physiological range), the phenotype no longer maintains the stability of its originating genome but instead influences its variations. Indeed, environment- and cell-dependent physicochemical parameters define the probability of mutations in terms of frequency, nature, and location in a genome. Evolution is non-deterministic because it relies on probabilistic physicochemical rules, and it is driven by a bidirectional interplay between genome and phenotype, in which the phenotype ensures the stability of its originating genome in a manner that depends on cellular and environmental physicochemical parameters.
Affiliation(s)
- Didier Auboeuf
- Laboratory of Biology and Modelling of the Cell, Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, 46 Allée d'Italie, Site Jacques Monod, F-69007, Lyon, France

3. Preiner M, Xavier JC, Vieira ADN, Kleinermanns K, Allen JF, Martin WF. Catalysts, autocatalysis and the origin of metabolism. Interface Focus 2019; 9:20190072. [PMID: 31641438 PMCID: PMC6802133 DOI: 10.1098/rsfs.2019.0072] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 07/02/2019] [Accepted: 08/30/2019] [Indexed: 12/24/2022]
Abstract
If life on Earth started out in geochemical environments like hydrothermal vents, then it started out from gases like CO2, N2 and H2. Anaerobic autotrophs still live from these gases today, and they still inhabit the Earth's crust. In the search for connections between abiotic processes in ancient geological systems and biotic processes in biological systems, it becomes evident that chemical activation (catalysis) of these gases and a constant source of energy are key. The H2–CO2 redox reaction provides a constant source of energy and anabolic inputs, because the equilibrium lies on the side of reduced carbon compounds. Identifying geochemical catalysts that activate these gases en route to nitrogenous organic compounds and small autocatalytic networks will be an important step towards understanding prebiotic chemistry that operates only on the basis of chemical energy, without input from solar radiation. Thus, if life arose in the dark depths of hydrothermal vents, then understanding the reactions and catalysts that operate under such conditions is crucial for understanding its origins.
Affiliation(s)
- Martina Preiner
- Institute for Molecular Evolution, Heinrich-Heine-University, 40225 Düsseldorf, Germany
- Joana C Xavier
- Institute for Molecular Evolution, Heinrich-Heine-University, 40225 Düsseldorf, Germany
- Karl Kleinermanns
- Institute for Physical Chemistry, Heinrich-Heine-University, 40225 Düsseldorf, Germany
- John F Allen
- Research Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
- William F Martin
- Institute for Molecular Evolution, Heinrich-Heine-University, 40225 Düsseldorf, Germany

4. Mezquita-Pla J. Gordon H. Dixon's trace in my personal career and the quantic jump experienced in regulatory information. Syst Biol Reprod Med 2018; 64:448-468. [PMID: 30136864 DOI: 10.1080/19396368.2018.1503752] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
Even before Rosalind Franklin had discovered the DNA double helix in her impressive X-ray diffraction pattern, Erwin Schrödinger described, in his excellent book What Is Life?, how the finding of aperiodic crystals in biological systems surprised him (an aperiodic crystal which, in my opinion, is the material carrier of life). In the 21st century, and still far from being able to define life, we are witnessing a rapid acceleration of knowledge on regulatory information. With the discovery of new codes and punctuation marks, we will greatly increase our understanding of the impressive avalanche of genomic sequences. Trifonov et al. defined a genetic code as a widespread DNA sequence pattern that carries a message with an impact on biology. These patterns are largely captured in transcribed messages that give meaning and identity to particular cells. In this review, I will go through my personal career during and after my years of work in the laboratory of Gordon H. Dixon, extending toward the impressive acquisition of new knowledge on regulatory information and genetic codes provided by remarkable scientists in the field.
Abbreviations: CA II: carbonic anhydrase II (chicken); Car2: carbonic anhydrase 2 (mouse); CpG islands: short (>0.5 kb) stretches of DNA with a G+C content ≥55%; DNMT1: DNA methyltransferase 1; DNMT3b: DNA methyltransferase 3B; DSB: double-strand DNA breaks; ERT: endogenous retrotransposon; ERV: endogenous retroviruses; ES cells: embryonic stem cells; GAPDH: glyceraldehyde-3-phosphate dehydrogenase; H1: histone H1; HATs: histone acetyltransferases; HDACs: histone deacetylases; H3K4me3: histone 3 trimethylated at lys 4; H3K79me2: histone 3 dimethylated at lys 79; HMG: high mobility group proteins; HMT: histone methyltransferase; HP1: heterochromatin protein 1; HR: homologous recombination; HSE: heat-shock element; ICRs: imprinted control regions; IRF: interferon regulatory factor; LDH-A/-B: lactate dehydrogenase A/B; LTR: long terminal repeats; MeCP2: methyl CpG binding protein 2; OCT4: octamer-binding transcription factor 4; PAF1: RNA polymerase II associated factor 1; piRNA: PIWI-interacting RNA; poly(A) tails: poly-adenine tails; PRC2: polycomb repressive complex 2; PTMs: post-translational modifications; SIRT 1: sirtuin 1, silent information regulator; STAT3: signal transducer and activator of transcription 3; tRNAs: transfer RNAs; tRFs: tRNA-derived fragments; TSS: transcription start site; TE: transposable elements; UB I: polyubiquitin I; UB II: polyubiquitin II; UBE 2N: ubiquitin conjugating enzyme E2N; 5'-UTR: 5'-untranslated region; 3'-UTR: 3'-untranslated region.
Affiliation(s)
- Jovita Mezquita-Pla
- Molecular Genetics and Control of Pluripotency Laboratory, Department of Biomedicine, IDIBAPS, Faculty of Medicine, University of Barcelona, Catalonia, Spain

5. Faure G, Ogurtsov AY, Shabalina SA, Koonin EV. Adaptation of mRNA structure to control protein folding. RNA Biol 2017; 14:1649-1654. [PMID: 28722509 DOI: 10.1080/15476286.2017.1349047] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 10/19/2022]
Abstract
Comparison of mRNA and protein structures shows that highly structured mRNAs typically encode compact protein domains, suggesting that mRNA structure controls protein folding. This function is apparently performed by distinct structural elements in the mRNA, which implies 'fine tuning' of mRNA structure under selection for optimal protein folding. We find that, during evolution, changes in the mRNA folding energy follow amino acid replacements, reinforcing the notion of an intimate connection between the structures of an mRNA and the protein it encodes, and of the double encoding of protein sequence and folding in the mRNA.
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

6. Raabe CA, Brosius J. Does every transcript originate from a gene? Ann N Y Acad Sci 2015; 1341:136-48. [PMID: 25847549 DOI: 10.1111/nyas.12741] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/15/2014] [Revised: 02/05/2015] [Accepted: 02/11/2015] [Indexed: 12/20/2022]
Abstract
Outdated gene definitions favored regions corresponding to mature messenger RNAs, in particular, the open reading frame. In eukaryotes, the intergenic space was widely regarded as nonfunctional and devoid of RNA transcription. Original concepts were based on the assumption that RNA expression was restricted to known protein-coding genes and a few so-called structural RNA genes, such as ribosomal RNAs or transfer RNAs. With the discovery of introns and, more recently, of sensitive techniques for monitoring genome-wide transcription, this view had to be substantially modified. Tiling microarrays and RNA deep sequencing revealed myriads of transcripts that cover almost entire genomes. The tremendous complexity of non-protein-coding RNA transcription has to be integrated into novel gene definitions. Despite an ever-growing list of functional RNAs, questions concerning the bulk of identified transcripts remain under dispute. Here, we examined genome-wide transcription from various angles, including evolutionary considerations, and suggest, in analogy to novel alternative splice variants that do not persist, that the vast majority of transcripts represent raw material for potential, albeit rare, exaptation events.
Affiliation(s)
- Carsten A Raabe
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany

7. Frenkel ZM, Barzily Z, Volkovich Z, Trifonov EN. Hidden ancient repeats in DNA: Mapping and quantification. Gene 2013; 528:282-7. [DOI: 10.1016/j.gene.2013.06.059] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/17/2013] [Accepted: 06/21/2013] [Indexed: 01/27/2023]

8. Shabalina SA, Spiridonov NA, Kashina A. Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res 2013; 41:2073-94. [PMID: 23293005 PMCID: PMC3575835 DOI: 10.1093/nar/gks1205] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Indexed: 01/10/2023]
Abstract
Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein-coding sequences and are responsible for multiple levels of regulation and the generation of biological complexity. The wealth of structural and regulatory information that mRNA carries in addition to the encoded amino acid sequence raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein-coding regions, especially in eukaryotes. Silent or synonymous codon positions, which do not determine the amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. Selection at the RNA level acts on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on coding gene regions follows a three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of the coding regions have a higher hybridization potential than non-synonymous positions, and are multifunctional in their regulatory and structural roles. Recent experimental evidence and analysis of mRNA structure and interspecies conservation suggest that there is an evolutionary tradeoff between selective pressure acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing, which exert downstream effects on proteins and their functions.
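The three-nucleotide periodicity mentioned in this abstract can be made concrete by tallying composition separately at codon positions 1, 2 and 3 of an in-frame coding sequence; the third position is where most synonymous changes fall, although not every third-position change is synonymous. The Python sketch below is only an illustration and not a method taken from the review: the function names (codon_position_composition, gc_by_position) are hypothetical, the input is assumed to be in frame from its first base, DNA input is converted to RNA (T to U), and an incomplete trailing codon is ignored.

```python
from collections import Counter


def codon_position_composition(cds: str) -> list:
    """Fraction of A, C, G, U at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    composition = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        counts = Counter(bases)
        composition.append({b: counts[b] / len(bases) for b in "ACGU"})
    return composition


def gc_by_position(cds: str) -> list:
    """G+C fraction at each codon position (GC1, GC2, GC3)."""
    return [f["G"] + f["C"] for f in codon_position_composition(cds)]


if __name__ == "__main__":
    toy_cds = "AUGGCCAAGUGA"  # toy ORF: AUG GCC AAG UGA (not from the review)
    print(gc_by_position(toy_cds))  # [0.25, 0.5, 0.75]
```

On real transcripts, a systematic difference between GC3 and GC1/GC2 is one simple signature of position-specific pressure; the toy input here is far too short to say anything biological.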
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20984, USA.

9. Caporale LH. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution. Ann N Y Acad Sci 2012; 1267:1-10. [PMID: 22954209 DOI: 10.1111/j.1749-6632.2012.06749.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
Abstract
This overview of a special issue of Annals of the New York Academy of Sciences discusses the uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed are the value of non-text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics.