1
Vakirlis N, Kupczok A. Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins. Genome Res 2024; 34:888-903. [PMID: 38977308 DOI: 10.1101/gr.278977.124] [Received: 01/11/2024] [Accepted: 06/12/2024]
Abstract
Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, which argues against an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.
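The GC-content comparison in this abstract amounts to computing, per gene, the fraction of G and C among A/C/G/T bases and comparing group averages. The sketch below illustrates only that step, under that assumption; the function names and toy sequences are hypothetical and are not the authors' pipeline or data.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Return the fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def mean_gc(seqs: list[str]) -> float:
    """Average per-gene GC content over a group of coding sequences."""
    return mean(gc_content(s) for s in seqs)


# Hypothetical toy groups, for illustration only (not data from the study).
orphan_like = ["ATGGCGCGTTAA", "ATGCCCGGATAG"]
conserved_like = ["ATGGCCGCTTGA", "ATGAAAGCGTAA"]

difference = mean_gc(orphan_like) - mean_gc(conserved_like)
print(f"mean GC difference: {difference:+.3f}")
```

Ambiguous bases such as N are ignored rather than counted as AT, which keeps low-quality sequence from deflating the GC estimate.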
Affiliation(s)
- Nikolaos Vakirlis
  - Institute for Fundamental Biomedical Research, B.S.R.C. "Alexander Fleming," Vari 166 72, Greece
  - Institute for General Microbiology, Kiel University, 24118 Kiel, Germany
- Anne Kupczok
  - Bioinformatics Group, Wageningen University, 6700 PB Wageningen, The Netherlands
2
Iyengar BR, Grandchamp A, Bornberg-Bauer E. How antisense transcripts can evolve to encode novel proteins. Nat Commun 2024; 15:6187. [PMID: 39043684 PMCID: PMC11266595 DOI: 10.1038/s41467-024-50550-3] [Received: 08/30/2023] [Accepted: 07/12/2024]
Abstract
Protein-coding features can emerge de novo in non-coding transcripts, resulting in the emergence of new protein-coding genes. Studies across many species show that a large fraction of evolutionarily novel non-coding RNAs have an antisense overlap with protein-coding genes. The open reading frames (ORFs) in these antisense RNAs could also overlap with existing ORFs. In this study, we investigate how the evolution of an ORF could be constrained by its overlap with an existing ORF in three different reading frames. Using a combination of mathematical modeling and genome/transcriptome data analysis in two different model organisms, we show that antisense overlap can increase the likelihood of ORF emergence and reduce the likelihood of ORF loss, especially in one of the three reading frames. In addition to rationalising the repeatedly reported prevalence of de novo emerged genes in antisense transcripts, our work also provides a generic modeling and analytical framework that can be used to understand the evolution of antisense genes.
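To make "ORFs in antisense RNAs" concrete, the sketch below scans the three reading frames of the reverse complement of a sequence for ATG-to-stop ORFs and maps their coordinates back onto the sense strand. It is a hypothetical helper written for illustration, assuming the standard stop codons and ATG-only starts; it is not the mathematical model used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased first; non-ACGT characters kept)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """ATG-to-stop ORFs in the three forward frames of seq.

    Returns (frame, start, end) tuples; end is exclusive and includes the stop codon.
    Only the first ATG after a stop opens an ORF, so nested ATGs are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def antisense_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """ORFs on the opposite strand as (frame, start, end) in sense-strand coordinates.

    `frame` is counted from the 5' end of the reverse complement.
    """
    n = len(seq)
    return [(frame, n - end, n - start)
            for frame, start, end in find_orfs(reverse_complement(seq), min_codons)]


# The reverse complement of "TTATTTCAT" is "ATGAAATAA" (ATG AAA TAA).
print(antisense_orfs("TTATTTCAT"))  # [(0, 0, 9)]
```

Comparing the returned sense-strand intervals with an annotated sense ORF is then a matter of interval overlap.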
Affiliation(s)
- Bharat Ravi Iyengar
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
- Anna Grandchamp
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
  - Aix-Marseille Université, INSERM, TAGC, Marseille, France
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, Münster, Germany
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, Tübingen, Germany
3
Legarda EG, Elena SF, Mushegian AR. Emergence of two distinct spatial folds in a pair of plant virus proteins encoded by nested genes. J Biol Chem 2024; 300:107218. [PMID: 38522515 PMCID: PMC11044054 DOI: 10.1016/j.jbc.2024.107218] [Received: 11/02/2023] [Revised: 03/15/2024] [Accepted: 03/19/2024]
Abstract
Virus genomes may encode overlapping or nested open reading frames that increase their coding capacity. It is not known whether the constraints on spatial structures of the two encoded proteins limit the evolvability of nested genes. We examine the evolution of a pair of proteins, p22 and p19, encoded by nested genes in plant viruses from the genus Tombusvirus. The known structure of p19, a suppressor of RNA silencing, belongs to the RAGNYA fold from the alpha+beta class. The structure of p22, the cell-to-cell movement protein from the 30K family widespread in plant viruses, is predicted with the AlphaFold approach, suggesting a single jelly-roll fold core from the all-beta class, structurally similar to capsid proteins from plant and animal viruses. The nucleotide and codon preferences impose modest constraints on the types of secondary structures encoded in the alternative reading frames, nonetheless allowing for compact, well-ordered folds from different structural classes in two similarly-sized nested proteins. Tombusvirus p22 emerged through radiation of the widespread 30K family, which evolved by duplication of a virus capsid protein early in the evolution of plant viruses, whereas lineage-specific p19 may have emerged by a stepwise increase in the length of the overprinted gene and incremental acquisition of functionally active secondary structure elements by the protein product. This evolution of p19 toward the RAGNYA fold represents one of the first documented examples of protein structure convergence in naturally occurring proteins.
Affiliation(s)
- Esmeralda G Legarda
  - Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, Paterna, València, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, Paterna, València, Spain
  - The Santa Fe Institute, Santa Fe, New Mexico, USA
- Arcady R Mushegian
  - Division of Molecular and Cellular Biosciences, National Science Foundation, Arlington, Virginia, USA
4
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024]
Abstract
The phenomenon of de novo gene birth (the emergence of genes from non-genic sequences) has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
5
Takeuchi N, Fullmer MS, Maddock DJ, Poole AM. The Constructive Black Queen hypothesis: new functions can evolve under conditions favouring gene loss. ISME J 2024; 18:wrae011. [PMID: 38366199 PMCID: PMC10942775 DOI: 10.1093/ismejo/wrae011] [Received: 01/05/2024] [Revised: 01/17/2024] [Accepted: 01/19/2024]
Abstract
Duplication is a major route for the emergence of new gene functions. However, the emergence of new gene functions via this route may be reduced in prokaryotes, as redundant genes are often rapidly purged. In lineages with compact, streamlined genomes, it thus appears challenging for novel function to emerge via duplication and divergence. A further pressure contributing to gene loss occurs under Black Queen dynamics, as cheaters that lose the capacity to produce a public good can instead acquire it from neighbouring producers. We propose that Black Queen dynamics can favour the emergence of new function because, under an emerging Black Queen dynamic, there is high gene redundancy spread across a community of interacting cells. Using computational modelling, we demonstrate that new gene functions can emerge under Black Queen dynamics. This result holds even if there is deletion bias due to low duplication rates and selection against redundant gene copies resulting from the high cost associated with carrying a locus. However, when the public good production costs are high, Black Queen dynamics impede the fixation of new functions. Our results expand the mechanisms by which new gene functions can emerge in prokaryotic systems.
Affiliation(s)
- Nobuto Takeuchi
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
  - Universal Biology Institute, University of Tokyo, Tokyo 113-0033, Japan
  - Department of Biology, Faculty of Sciences, Kyushu University, Fukuoka 819-0395, Japan
- Matthew S Fullmer
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Danielle J Maddock
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Anthony M Poole
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
6
Bukhnikashvili L. Overlaps Between CDS Regions of Protein-Coding Genes in the Human Genome: A Case Study on the NR1D1-THRA Gene Pair. J Mol Evol 2023; 91:963-975. [PMID: 38006429 DOI: 10.1007/s00239-023-10147-8] [Received: 05/28/2023] [Accepted: 11/12/2023]
Abstract
For several decades, it has been known that a substantial number of genes within human DNA exhibit overlap; however, the biological and evolutionary significance of these overlaps remains poorly understood. This study focused on investigating specific instances of overlap where the overlapping DNA region encompasses the coding DNA sequences (CDSs) of protein-coding genes. The results revealed that proteins encoded by overlapping CDSs exhibit greater disorder than those from nonoverlapping CDSs. Additionally, these DNA regions were identified as GC-rich. This could be partially attributed to the absence of stop codons from two distinct reading frames rather than one. Furthermore, these regions were found to harbour fewer single-nucleotide polymorphism (SNP) sites, possibly due to constraints arising from the overlapping state, where mutations could affect two genes simultaneously. While elucidating these properties, the NR1D1-THRA gene pair emerged as an exceptional case with highly structured proteins and a distinctly conserved sequence across eutherian mammals. Both NR1D1 and THRA are nuclear receptors lacking a ligand-binding domain at their C-terminus, which is the region where these genes overlap. The NR1D1 gene is involved in the regulation of circadian rhythm, while the THRA gene encodes a thyroid hormone receptor, and both play crucial roles in various physiological processes. This study suggests that, in addition to their well-established functions, the specifically overlapping CDS regions of these genes may encode protein segments with additional, yet undiscovered, biological roles.
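Why avoiding stop codons in two frames favours GC-rich sequence can be made explicit with a back-of-the-envelope model. Assuming bases are drawn independently with P(G) = P(C) = g/2 and P(A) = P(T) = (1 - g)/2, the chance that a random codon is TAA, TAG or TGA is ((1 - g)/2)^3 + 2((1 - g)/2)^2(g/2), which falls as the GC fraction g rises. The sketch below assumes this independent-base model and, as a further rough simplification, treats the two overlapping frames as independent, which they are not because they share nucleotides. It is illustrative only and is not the analysis used in the study.

```python
def stop_codon_probability(gc: float) -> float:
    """P(a random codon is TAA, TAG or TGA) under independent bases with GC fraction gc."""
    at = (1 - gc) / 2   # P(A) = P(T)
    g = gc / 2          # P(G) = P(C)
    return at ** 3 + 2 * at * at * g   # TAA + TAG + TGA


def stop_free_probability(gc: float, n_codons: int, n_frames: int = 1) -> float:
    """Rough chance that n_codons contain no stop codon in each of n_frames frames.

    Treats frames as independent, which is only an approximation for overlapping frames.
    """
    return (1 - stop_codon_probability(gc)) ** (n_codons * n_frames)


for gc in (0.4, 0.5, 0.6):
    print(f"GC={gc:.1f}  P(stop)={stop_codon_probability(gc):.4f}  "
          f"P(100 codons open in 2 frames)={stop_free_probability(gc, 100, 2):.2e}")
```

At g = 0.5 the formula reduces to 3/64 (about 0.047), the familiar three stop codons out of 64.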
7
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300]
Abstract
The phenomenon of de novo gene birth (the emergence of genes from non-genic sequences) has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time-span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
8
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Received: 12/19/2022] [Accepted: 05/31/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
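The idea of alternative reading frames can be illustrated with a six-frame translation: the same DNA read from offsets 0, 1 and 2 on each strand. The sketch below assumes the standard genetic code (NCBI translation table 1), builds the codon table from the conventional TCAG ordering, and marks incomplete or ambiguous codons as X. The function names and the frame labels (offsets from the 5' end of each strand) are illustrative choices, not a convention taken from the article.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons starting at position 0; '*' = stop, 'X' = unknown codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translations of the forward (+0, +1, +2) and reverse-complement (-0, -1, -2) frames."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset}"] = translate(seq[offset:])
        frames[f"-{offset}"] = translate(rc[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Even this nine-nucleotide example yields unrelated peptides in different frames (for instance MA* in +0 and LGH in -0), which is the raw material the abstract refers to.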
9
Muñoz-Baena L, Wade KE, Poon AFY. HexSE: Simulating evolution in overlapping reading frames. Virus Evol 2023; 9:vead009. [PMID: 36846827 PMCID: PMC9949996 DOI: 10.1093/ve/vead009] [Received: 10/03/2022] [Revised: 01/11/2023] [Accepted: 01/27/2023]
Abstract
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another and vice versa. To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with any distribution of open reading frames in linear or circular genomes. We use a custom data structure to track the substitution rates at every nucleotide site, which are determined by the stationary nucleotide frequencies, transition bias and the distribution of selection biases (dN/dS) in the respective reading frames. Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License version 3 and is available at https://github.com/PoonLab/HexSE.
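The point that one substitution can be synonymous in one reading frame and non-synonymous in an overlapping one can be checked directly. The sketch below assumes two overlapping frames on the same strand, uppercase A/C/G/T input and the standard genetic code, and classifies a single-nucleotide substitution frame by frame. The helper names are hypothetical; this does not reproduce HexSE's data structures or API.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_covering(seq: str, pos: int, frame: int):
    """Return (codon_start, codon) for the complete codon in `frame` containing `pos`, else None."""
    if pos < frame:
        return None
    start = frame + ((pos - frame) // 3) * 3
    codon = seq[start:start + 3]
    return (start, codon) if len(codon) == 3 else None


def is_synonymous(seq: str, pos: int, new_base: str, frame: int):
    """True/False if the substitution is synonymous/non-synonymous in `frame`; None if uncoded there."""
    hit = codon_covering(seq, pos, frame)
    if hit is None:
        return None
    start, codon = hit
    offset = pos - start
    mutant = codon[:offset] + new_base + codon[offset + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutant]


seq = "ACTGAA"  # frame 0: ACT GAA (Thr, Glu); frame 1: CTG (Leu)
for frame in (0, 1):
    # G->A at position 3: GAA->AAA (Glu->Lys) in frame 0, CTG->CTA (Leu->Leu) in frame 1
    print(frame, is_synonymous(seq, 3, "A", frame))
```

Tallying such per-frame outcomes over all possible substitutions at a site is one way to see why dN/dS estimates become skewed in overlapping regions.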
Affiliation(s)
- Kaitlyn E Wade
  - Department of Pathology and Laboratory Medicine, Western University, Dental Sciences Building 4044, London N6A 5C1, Canada
10
Graf F, Zehentner B, Fellner L, Scherer S, Neuhaus K. Three Novel Antisense Overlapping Genes in E. coli O157:H7 EDL933. Microbiol Spectr 2023; 11:e0235122. [PMID: 36533921 PMCID: PMC9927249 DOI: 10.1128/spectrum.02351-22] [Received: 06/22/2022] [Accepted: 12/03/2022]
Abstract
The abundance of long overlapping genes in prokaryotic genomes is likely to be significantly underestimated. To date, only a few examples of such genes are fully established. Using RNA sequencing and ribosome profiling, we found expression of novel overlapping open reading frames in Escherichia coli O157:H7 EDL933 (EHEC). Indeed, the overlapping candidate genes are equipped with typical structural elements required for transcription and translation, i.e., promoters, transcription start sites, as well as terminators, all of which were experimentally verified. Translationally arrested mutants, unable to produce the overlapping encoded protein, were found to have a growth disadvantage when grown competitively against the wild type. Thus, the phenotypes found imply biological functionality of the genes at the level of proteins produced. The addition of 3 more examples of prokaryotic overlapping genes to the currently limited, yet constantly growing pool of such genes emphasizes the underestimated coding capacity of bacterial genomes.
IMPORTANCE The abundance of long overlapping genes in prokaryotic genomes is likely to be significantly underestimated, since such genes are not allowed in genome annotations. However, ribosome profiling catches mRNA in the moment of being template for protein production. Using this technique and subsequent experiments, we verified 3 novel overlapping genes encoded in antisense of known genes. This adds more examples of prokaryotic overlapping genes to the currently limited, yet constantly growing pool of such genes.
Affiliation(s)
- Franziska Graf
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Barbara Zehentner
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Lea Fellner
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Siegfried Scherer
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
11
Miller WB Jr, Reber AS, Marshall P, Baluška F. Cellular and Natural Viral Engineering in Cognition-Based Evolution. Commun Integr Biol 2023; 16:2196145. [PMID: 37153718 PMCID: PMC10155641 DOI: 10.1080/19420889.2023.2196145]
Abstract
Neo-Darwinism conceptualizes evolution as the continuous succession of predominately random genetic variations disciplined by natural selection. In that frame, the primary interaction between cells and the virome is relegated to host-parasite dynamics governed by selective influences. Cognition-Based Evolution regards biological and evolutionary development as a reciprocating cognition-based informational interactome for the protection of self-referential cells. To sustain cellular homeorhesis, cognitive cells collaborate to assess the validity of ambiguous biological information. That collective interaction involves coordinate measurement, communication, and active deployment of resources as Natural Cellular Engineering. These coordinated activities drive multicellularity, biological development, and evolutionary change. The virome participates as the vital intercessory among the cellular domains to ensure their shared permanent perpetuation. The interactions between the virome and the cellular domains represent active virocellular cross-communications for the continual exchange of resources. Modular genetic transfers between viruses and cells carry bioactive potentials. Those exchanges are deployed as nonrandom flexible tools among the domains in their continuous confrontation with environmental stresses. This alternative framework fundamentally shifts our perspective on viral-cellular interactions, strengthening established principles of viral symbiogenesis. Pathogenesis can now be properly appraised as one expression of a range of outcomes between cells and viruses within a larger conceptual framework of Natural Viral Engineering as a co-engineering participant with cells. It is proposed that Natural Viral Engineering should be viewed as a co-existent facet of Natural Cellular Engineering within Cognition-Based Evolution.
Affiliation(s)
- Miller WB Jr
  - Banner Health Systems - Medicine, Paradise Valley, Arizona, USA
- Reber AS
  - Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Marshall P
  - Department of Engineering, Evolution 2.0, Oak Park, IL, USA
- Baluška F
  - Institute of Cellular and Molecular Botany, University of Bonn, Bonn, Germany
12
Jayaraman V, Toledo‐Patiño S, Noda‐García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo‐Patiño
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda‐García
  - Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
13
Muñoz-Baena L, Poon AFY. Using networks to analyze and visualize the distribution of overlapping genes in virus genomes. PLoS Pathog 2022; 18:e1010331. [PMID: 35202429 PMCID: PMC8903798 DOI: 10.1371/journal.ppat.1010331] [Received: 07/02/2021] [Revised: 03/08/2022] [Accepted: 02/02/2022]
Abstract
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
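Frame labels such as +2 (same-strand shift) or -0 (antisense, same frame) can be derived from ORF coordinates. The sketch below uses one possible convention, assumed here and not necessarily the authors': the reference ORF `a` lies on the + strand, coordinates are 0-based and half-open, a same-strand ORF `b` is labelled +k with k = (b.start - a.start) mod 3, and an opposite-strand ORF is labelled -k with k = (b.end - a.start) mod 3, so that -0 means the two codon grids coincide. For same-strand pairs the label depends on which ORF is the reference (+1 seen from one ORF is +2 seen from the other).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ORF:
    start: int   # 0-based leftmost coordinate (inclusive)
    end: int     # 0-based rightmost coordinate + 1 (exclusive)
    strand: str  # "+" or "-"


def overlap_length(a: ORF, b: ORF) -> int:
    """Number of shared nucleotides between two ORFs (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def frame_label(a: ORF, b: ORF) -> Optional[str]:
    """Label the reading frame of ORF b relative to reference ORF a (assumed on '+')."""
    if a.strand != "+":
        raise ValueError("this sketch assumes the reference ORF is on the + strand")
    if overlap_length(a, b) == 0:
        return None
    if b.strand == "+":
        return f"+{(b.start - a.start) % 3}"
    # Antisense: b's codons are read leftwards from b.end, so the codon grids
    # coincide when (b.end - a.start) is a multiple of 3.
    return f"-{(b.end - a.start) % 3}"


ref = ORF(0, 30, "+")
print(frame_label(ref, ORF(10, 40, "+")))  # +1
print(frame_label(ref, ORF(12, 42, "-")))  # -0
```

Counting these labels and overlap lengths over all annotated ORF pairs in a genome would give the kind of per-genome summary the abstract describes, under whatever convention is actually adopted.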
Affiliation(s)
- Laura Muñoz-Baena
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
- Art F. Y. Poon
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
  - Department of Pathology and Laboratory Medicine, Western University, London, ON, Canada
14
Gene Overlapping as a Modulator of Begomovirus Evolution. Microorganisms 2022; 10:microorganisms10020366. [PMID: 35208820 PMCID: PMC8875319 DOI: 10.3390/microorganisms10020366] [Received: 12/17/2021] [Revised: 02/01/2022] [Accepted: 02/01/2022]
Abstract
In RNA viruses, which have high mutation rates and fast evolutionary rates, gene overlapping (i.e., genomic regions that encode more than one protein) is a major factor controlling mutational load and therefore virus evolvability. Although DNA viruses use high-fidelity host polymerases for their replication, and therefore should have lower mutation rates, it has been shown that some of them have evolutionary rates comparable to those of RNA viruses. Notably, in these viruses a large proportion of genes overlap with at least one other gene. Hence, gene overlapping could be a modulator of virus evolution beyond the RNA world. To test this hypothesis, we use the genus Begomovirus of plant viruses as a model. Through comparative genomic approaches, we show that internal gene overlapping decreases the rate of virus evolution, which is associated with a lower frequency of both synonymous and nonsynonymous mutations. In contrast, terminal overlapping has little effect on the pace of virus evolution. Overall, our analyses support a role for gene overlapping in the evolution of begomoviruses and provide novel information on the factors that shape their genetic diversity.
15
New Genomic Signals Underlying the Emergence of Human Proto-Genes. Genes (Basel) 2022; 13:genes13020284. [PMID: 35205330 PMCID: PMC8871994 DOI: 10.3390/genes13020284] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 12/04/2022]
Abstract
De novo genes are novel genes that emerge from non-coding DNA. Little is known so far about the properties of de novo genes in relation to their age and mechanisms of emergence. In this study, we investigate four related properties (introns, upstream regulatory motifs, 5′ untranslated regions (UTRs), and protein domains) in 23,135 human proto-genes. We found that proto-genes contain introns whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated: our results suggest that 41% of proto-genes might have captured existing introns, and that 13.7% of them do not splice the ORF. We show that proto-genes that emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic proto-genes are more enriched in enhancers, even though the TATA motif is most commonly found upstream of these genes. The 5′ UTRs of intergenic and intronic proto-genes have a lower potential to stabilise mRNA structures than those of exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5′ UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic position of de novo genes strongly impacts these properties.
16
Pavesi A, Romerio F. Extending the Coding Potential of Viral Genomes with Overlapping Antisense ORFs: A Case for the De Novo Creation of the Gene Encoding the Antisense Protein ASP of HIV-1. Viruses 2022; 14:v14010146. [PMID: 35062351 PMCID: PMC8781085 DOI: 10.3390/v14010146] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 02/04/2023]
Abstract
Gene overprinting occurs when point mutations within a genomic region with an existing coding sequence create a new one in another reading frame. This process is quite frequent in viral genomes, either to maximize the amount of information that they encode or in response to strong selective pressure. The most frequent scenario involves two different reading frames on the same DNA strand (sense overlap). Much less frequent are cases of overlapping genes encoded on opposite DNA strands (antisense overlap). One such example is the antisense ORF asp, located in the minus strand of the HIV-1 genome and overlapping the env gene. The asp gene is highly conserved in pandemic HIV-1 strains of group M, and it is absent in non-pandemic HIV-1 groups, HIV-2, and lentiviruses infecting non-human primates, suggesting that the ~190-amino acid protein expressed from this gene (ASP) may play a role in virus spread. While the function of ASP in the virus life cycle remains to be elucidated, mounting evidence from several research groups indicates that ASP is expressed in vivo. Two alternative hypotheses could explain the origin of the asp ORF. On the one hand, asp may have been present in the ancestor of contemporary lentiviruses and subsequently lost in all descendants except most HIV-1 strains of group M, where it was retained because of a selective advantage. Alternatively, the asp ORF may have originated very recently, with the emergence of group M HIV-1 strains from SIVcpz. Here, we used a combination of computational and statistical approaches to study the env genomic region in primate lentiviruses to shed light on the origin, structure, and sequence evolution of the asp ORF.
The results emerging from our studies support the hypothesis of a recent de novo addition of the antisense ORF to the HIV-1 genome, through a process that entailed the progressive removal of existing internal stop codons from SIV strains to HIV-1 strains of group M, and fine-tuning of the codon sequence in env that reduced the chances of new stop codons occurring in asp. Altogether, the study supports the notion that the HIV-1 asp gene encodes an accessory protein that provides a selective advantage to the virus.
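An antisense ORF can stay open only if the opposite strand lacks in-frame stop codons, which is the constraint described above. A minimal sketch of that check, assuming an A/C/G/T string and the standard-code stop codons TAA, TAG and TGA: it reverse-complements a sense sequence and lists stop-codon positions in each antisense frame. The function names and the toy sequence are hypothetical and are not taken from env or asp.

```python
# Standard genetic code stop codons (assumption: no alternative genetic code).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stop_positions(seq: str, frame: int) -> list[int]:
    """0-based start positions of in-frame stop codons for frame 0, 1 or 2."""
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]


def antisense_stops(sense: str) -> dict[int, list[int]]:
    """Stop codons in each of the three reading frames of the antisense strand."""
    antisense = reverse_complement(sense)
    return {frame: stop_positions(antisense, frame) for frame in range(3)}


print(antisense_stops("TTATTACTA"))  # {0: [0, 3, 6], 1: [], 2: []}
```

A frame with an empty list is open across the whole window; removing the stops in one antisense frame, codon by codon, is the kind of progressive change the abstract describes.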
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
- Fabio Romerio
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205-2196, USA
17
Wichmann S, Scherer S, Ardern Z. Biological factors in the synthetic construction of overlapping genes. BMC Genomics 2021; 22:888. [PMID: 34895142 PMCID: PMC8665328 DOI: 10.1186/s12864-021-08181-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Accepted: 11/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and mammals. The biological factors underlying life's ability to create overlapping genes require more study and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from cellular organisms; in this study, we assessed this claim in order to discover what might underlie taxonomic differences in the creation of gene overlaps. RESULTS After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models, we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to the previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids. CONCLUSIONS Synthetic overlapping genes that closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences.
Future work, however, will need to assess important factors not considered here, such as intragenic interactions that affect protein folding. While the analysis here is not sufficient to guarantee functional, folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and open up exciting possibilities for synthetic biology.
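In an overlap, the same nucleotides are read in two frames, so a substitution that is synonymous in one frame may be nonsynonymous in the other. The sketch below makes that concrete. It assumes the standard genetic code (NCBI translation table 1) and uses made-up toy sequences; it translates each sequence in frames 0 and +1 before and after a single substitution.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of seq starting at offset frame; '*' marks a stop."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


original = "ATGGCAGCAT"  # hypothetical toy sequence
mutant = "ATGGCCGCAT"    # GCA -> GCC: synonymous (Ala) in frame 0

for seq in (original, mutant):
    print(translate(seq, 0), translate(seq, 1))
# MAA WQH
# MAA WPH
```

Here the frame-0 change is silent while the +1 frame changes Q to P; constructing an overlap amounts to finding substitutions that are acceptable in both frames at once.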
Affiliation(s)
- Stefan Wichmann
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Siegfried Scherer
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Zachary Ardern
- Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
18
Watson AK, Lopez P, Bapteste E. Hundreds of out-of-frame remodelled gene families in the E. coli pangenome. Mol Biol Evol 2021; 39:6430988. [PMID: 34792602 PMCID: PMC8788219 DOI: 10.1093/molbev/msab329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions about the origins of such genes. Some of these genes are hypothesized to have formed de novo from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that, in addition, out-of-frame gene fission/fusion events involving alternative reading frames of ORFs, as well as out-of-frame lateral gene transfers, could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern search in sequence similarity networks, extending the use of these graphs, which are commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
Affiliation(s)
- Andrew K Watson
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
- Philippe Lopez
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
- Eric Bapteste
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, 7, quai Saint Bernard, Paris, 75005, France
19
Pavesi A. Prediction of two novel overlapping ORFs in the genome of SARS-CoV-2. Virology 2021; 562:149-157. [PMID: 34339929 PMCID: PMC8317007 DOI: 10.1016/j.virol.2021.07.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 07/21/2021] [Accepted: 07/21/2021] [Indexed: 10/25/2022]
Abstract
Six candidate overlapping genes have been detected in SARS-CoV-2, yet current methods struggle to detect overlapping genes that originated recently. However, such genes might encode proteins beneficial to the virus and provide a model system for understanding gene birth. To complement existing detection methods, I first demonstrated that selection pressure to avoid stop codons in alternative reading frames is a driving force in the origin and retention of overlapping genes. I then built a detection method, CodScr, based on this selection pressure. Finally, I combined CodScr with methods that detect other properties of overlapping genes, such as biased nucleotide and amino acid composition. I detected two novel ORFs (ORF-Sh and ORF-Mh), overlapping the spike and membrane genes, respectively, which are under selection pressure and may be beneficial to SARS-CoV-2. ORF-Sh and ORF-Mh are present as ORFs uninterrupted by stop codons in 100% and 95% of SARS-CoV-2 genomes, respectively.
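Selection against stop codons in an alternative frame suggests a simple permutation test. The sketch below is a generic version under stated assumptions, not the CodScr implementation: it counts stop codons in one alternative frame of a reference coding sequence and compares that count with sequences in which the reference codons have been shuffled, which preserves codon usage and amino acid composition but destroys codon order. The function names, default frame, and number of permutations are illustrative choices.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def stops_in_frame(seq: str, frame: int) -> int:
    """Number of stop codons read at the given frame offset (0, 1 or 2)."""
    return sum(seq[i:i + 3] in STOPS for i in range(frame, len(seq) - 2, 3))


def stop_depletion_test(seq: str, frame: int = 1, n_perm: int = 1000,
                        seed: int = 0) -> tuple[int, float]:
    """Return the observed stop count in an alternative frame and a one-sided
    empirical p-value: the fraction of codon-shuffled sequences with as few
    or fewer stops in that frame (with the usual +1 correction)."""
    seq = seq.upper()[: len(seq) // 3 * 3]  # keep complete codons only
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    observed = stops_in_frame(seq, frame)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(codons)  # in place; each call yields a new random order
        if stops_in_frame("".join(codons), frame) <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)
```

A small p-value means the chosen frame carries fewer stops than codon shuffling typically produces, which is consistent with, but not proof of, selection to keep that frame open. The reference sequence should exclude its own terminal stop codon, since shuffling would otherwise move it.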
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy
20
Positive selection and intrinsic disorder are associated with multifunctional C4(AC4) proteins and geminivirus diversification. Sci Rep 2021; 11:11150. [PMID: 34045539 PMCID: PMC8160170 DOI: 10.1038/s41598-021-90557-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/11/2020] [Accepted: 05/13/2021] [Indexed: 02/06/2023]
Abstract
Viruses within the family Geminiviridae cause extensive agricultural losses. Members of four geminivirus genera contain a C4 gene (AC4 in geminiviruses with bipartite genomes). C4(AC4) genes are entirely overprinted on the C1(AC1) genes, which encode the replication-associated proteins. The C4(AC4) proteins exhibit diverse functions that may be important for geminivirus diversification. In this study, the influence of natural selection on the evolutionary diversity of 211 C4(AC4) genes, relative to the C1(AC1) sequences they overlap, was determined for isolates of the genera Begomovirus and Curtovirus. The ratio of nonsynonymous (dN) to synonymous (dS) nucleotide substitutions indicated that C4(AC4) genes are under positive selection, while the overlapped C1(AC1) sequences are under purifying selection. Ninety-one of 200 begomovirus C4(AC4) genes encode elongated proteins whose extended regions are under neutral selection. C4(AC4) genes from begomoviruses isolated from tomato in native versus exotic regions were under similar levels of positive selection. Analysis of protein structure suggests that C4(AC4) proteins are entirely intrinsically disordered. Our data suggest that nonsynonymous mutations, and mutations that increase the length of C4(AC4), drive the diversity of these intrinsically disordered proteins, which could explain C4(AC4) functional variation and contribute to both geminivirus diversification and host jumping.
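The dN/dS ratio (ω) compares nonsynonymous substitutions per nonsynonymous site with synonymous substitutions per synonymous site; ω > 1 is read as positive selection and ω < 1 as purifying selection. A minimal sketch of the final step of a counting method such as Nei-Gojobori follows: given counts of differences and sites (whose estimation is the hard part and is not shown), it applies the Jukes-Cantor correction and takes the ratio. The function names and counts are hypothetical. Standard site counting also ignores the constraint imposed by a second reading frame, which can bias ω in overlapping regions.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """omega = dN / dS from nonsynonymous and synonymous differences and sites."""
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical counts for one pair of aligned sequences.
print(dn_ds(n_diffs=30, n_sites=600, s_diffs=10, s_sites=200))  # 1.0 (pN == pS)
print(dn_ds(n_diffs=60, n_sites=600, s_diffs=10, s_sites=200))  # ~2.07, above 1
```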
21
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022]
Abstract
During their long evolutionary history, viruses have generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelty, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with the detection of overlapping genes in small eukaryotic viruses (genome length below 30 kb) and the recognition of their critical role in the evolution of pathogenicity. The origin of overlapping genes, the factors that favor their birth and retention, and the ways they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
22
Zarai Y, Zafrir Z, Siridechadilok B, Suphatrakul A, Roopin M, Julander J, Tuller T. Evolutionary selection against short nucleotide sequences in viruses and their related hosts. DNA Res 2021; 27:5825729. [PMID: 32339222 PMCID: PMC7320823 DOI: 10.1093/dnares/dsaa008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/02/2020] [Accepted: 04/20/2020] [Indexed: 11/13/2022]
Abstract
Viruses are under constant evolutionary pressure to interact effectively with host intracellular factors while evading the host immune system. Understanding how viruses co-evolve with their hosts is a fundamental topic in molecular evolution and may also aid in developing novel virus-based applications such as vaccines, oncologic therapies, and antibacterial treatments. Here, based on a novel statistical framework and a large-scale genomic analysis of 2,625 viruses from all classes infecting 439 host organisms from all kingdoms of life, we identify short nucleotide sequences that are under-represented in the coding regions of viruses and their hosts. This under-representation cannot be explained by the amino acid content, codon frequencies, or dinucleotide frequencies of the coding regions. We specifically show that short homooligonucleotide and palindromic sequences tend to be under-represented in many viruses, probably due to their effect on gene expression regulation and on interaction with the host immune system. In addition, we show that more sequences tend to be under-represented in dsDNA viruses than in other viral groups. Finally, we demonstrate, based on in vitro and in vivo experiments, how under-represented sequences can be used to attenuate Zika virus strains.
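A classic, much simpler composition-controlled statistic for under-representation is the dinucleotide odds ratio ρ(XY) = f(XY) / (f(X) f(Y)), where values well below 1 indicate that XY occurs less often than its mononucleotide composition predicts. The sketch below computes it, together with a check for reverse-complement palindromes such as GAATTC. It illustrates the general idea only; it is not the statistical framework used in the study, and the function names and toy sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dinucleotide_odds_ratio(seq: str, dinuc: str) -> float:
    """rho(XY) = f(XY) / (f(X) * f(Y)); values well below 1 suggest under-representation."""
    seq, dinuc = seq.upper(), dinuc.upper()
    mono = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_x = mono[dinuc[0]] / len(seq)
    f_y = mono[dinuc[1]] / len(seq)
    if f_x == 0 or f_y == 0:
        raise ValueError("a constituent nucleotide is absent; rho is undefined")
    f_xy = pairs[dinuc] / (len(seq) - 1)
    return f_xy / (f_x * f_y)


def is_rc_palindrome(kmer: str) -> bool:
    """True if the k-mer equals its own reverse complement (e.g. GAATTC)."""
    kmer = kmer.upper()
    return kmer.translate(COMPLEMENT)[::-1] == kmer


print(dinucleotide_odds_ratio("ACGTACGT", "CG"))  # 4.571... (CG over-represented here)
print(is_rc_palindrome("GAATTC"), is_rc_palindrome("GAAAAC"))  # True False
```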
Affiliation(s)
- Yoram Zarai
- Biomedical Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel
- Zohar Zafrir
- Biomedical Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel
- SynVaccine Ltd., Ramat Hachayal, Tel Aviv, Israel
- Amporn Suphatrakul
- National Center for Genetic Engineering and Biotechnology, Pathumthani 12120, Thailand
- Modi Roopin
- Biomedical Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel
- SynVaccine Ltd., Ramat Hachayal, Tel Aviv, Israel
- Justin Julander
- Institute for Antiviral Research, Utah State University, Logan, UT, USA
- Tamir Tuller
- Biomedical Engineering Department, Tel Aviv University, Tel Aviv 69978, Israel
- SynVaccine Ltd., Ramat Hachayal, Tel Aviv, Israel
23
Gholizadeh Z, Iqbal MS, Li R, Romerio F. The HIV-1 Antisense Gene ASP: The New Kid on the Block. Vaccines (Basel) 2021; 9:vaccines9050513. [PMID: 34067514 PMCID: PMC8156140 DOI: 10.3390/vaccines9050513] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Revised: 05/04/2021] [Accepted: 05/13/2021] [Indexed: 01/14/2023]
Abstract
Viruses have developed incredibly creative ways of making a virtue out of necessity, including taking full advantage of their small genomes. Indeed, viruses often encode multiple proteins within the same genomic region by using two or more reading frames in both orientations, through a process called overprinting. Complex retroviruses provide compelling examples of this. The human immunodeficiency virus type 1 (HIV-1) genome expresses sixteen proteins from nine genes that are encoded in the three positive-sense reading frames. In addition, the genome of some HIV-1 strains contains a tenth gene in one of the negative-sense reading frames. The so-called Antisense Protein (ASP) gene overlaps the HIV-1 Rev Response Element (RRE) and the envelope glycoprotein gene, and encodes a highly hydrophobic protein of ~190 amino acids. Although ASP was identified over thirty years ago, relatively few studies have investigated the role it may play in the virus life cycle, and its expression in vivo is still questioned. Here we review the current knowledge about ASP and discuss some of the many unanswered questions.
24
Li R, Sklutuis R, Groebner JL, Romerio F. HIV-1 Natural Antisense Transcription and Its Role in Viral Persistence. Viruses 2021; 13:v13050795. [PMID: 33946840 PMCID: PMC8145503 DOI: 10.3390/v13050795] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/16/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/11/2022]
Abstract
Natural antisense transcripts (NATs) are a class of RNA molecules that are transcribed from the opposite strand of a protein-coding gene and that can regulate the expression of their cognate protein-coding gene via multiple mechanisms. NATs have been described in many prokaryotic and eukaryotic systems, as well as in the viruses that infect them. Human immunodeficiency virus type 1 (HIV-1) is no exception and produces one or more NATs from a promoter within the 3′ long terminal repeat. HIV-1 antisense transcripts have been the focus of several studies spanning over 30 years. However, a complete appreciation of the role that these transcripts play in the virus life cycle is still lacking. In this review, we cover the current knowledge about HIV-1 NATs, discuss some of the questions that are still open, and identify possible areas of future research.
Affiliation(s)
- Rui Li
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Rachel Sklutuis
- HIV Dynamics and Replication Program, Host-Virus Interaction Branch, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Jennifer L. Groebner
- HIV Dynamics and Replication Program, Host-Virus Interaction Branch, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Fabio Romerio
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
25
Moelling K, Broecker F. Viroids and the Origin of Life. Int J Mol Sci 2021; 22:ijms22073476. [PMID: 33800543 PMCID: PMC8036462 DOI: 10.3390/ijms22073476] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/07/2021] [Revised: 03/24/2021] [Accepted: 03/24/2021] [Indexed: 11/16/2022]
Abstract
Viroids are non-coding circular RNA molecules with rod-like or branched structures. They are often ribozymes, characterized by catalytic RNA. They can perform many basic functions of life and may have played a role in evolution since the beginning of life on Earth. They can cleave, join, replicate, and undergo Darwinian evolution. Furthermore, ribozymes are the essential elements for protein synthesis of cellular organisms as parts of ribosomes. Thus, they must have preceded DNA and proteins during evolution. Here, we discuss the current evidence for viroids or viroid-like RNAs as a likely origin of life on Earth. As such, they may also be considered as models for life on other planets or moons in the solar system as well as on exoplanets.
Affiliation(s)
- Karin Moelling
- Institute of Medical Microbiology, University of Zurich, Gloriastr 30, 8006 Zurich, Switzerland
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany
- Correspondence: Tel.: +49-(172)-3274306
- Felix Broecker
- Vaxxilon Deutschland GmbH, Magnusstr. 11, 12489 Berlin, Germany
26
Carter CW. Simultaneous codon usage, the origin of the proteome, and the emergence of de-novo proteins. Curr Opin Struct Biol 2021; 68:142-148. [PMID: 33529785 DOI: 10.1016/j.sbi.2021.01.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/04/2020] [Accepted: 01/05/2021] [Indexed: 12/21/2022]
Abstract
Genetic coding generally uses only one of a gene's two strands, with its complement serving as the template for replication. Aminoacyl-tRNA synthetases (aaRS) apparently first emerged as pairs on bidirectional genes, in which anticodons in the template strand served as codons for an entirely different protein. Interpreting both strands in frame constrained such genes so severely that this coding mode was rapidly superseded, leaving only traces in the elevated pairing between codon middle bases in antiparallel alignments. Codon assignments actually promote the use of information from both strands in multiple reading frames. Related phenomena, known as overprinting, are widely associated with viruses. In-frame bidirectional coding and overprinting nevertheless imply different structural and functional relationships, and different roles in generating folded proteins throughout the evolution of the proteome.
Affiliation(s)
- Charles W Carter
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, United States
27
Douglas J, Drummond AJ, Kingston RL. Evolutionary history of cotranscriptional editing in the paramyxoviral phosphoprotein gene. Virus Evol 2021; 7:veab028. [PMID: 34141448 PMCID: PMC8204654 DOI: 10.1093/ve/veab028] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]
Abstract
The phosphoprotein gene of the paramyxoviruses encodes multiple protein products. The P, V, and W proteins are generated by transcriptional slippage, a process that inserts non-templated guanosine nucleotides into the mRNA at a conserved edit site. The P protein is an essential component of the viral RNA polymerase and is encoded by a faithful copy of the gene in the majority of paramyxoviruses. However, in some cases, the non-essential V protein is encoded by default, and guanosines must be inserted into the mRNA in order to encode P. The number of guanosines inserted into the P gene can be described by a probability distribution, which varies between viruses. In this article, we review the nature of these distributions, which can be inferred from mRNA sequencing data, and reconstruct the evolutionary history of cotranscriptional editing in the paramyxovirus family. Our model suggests that, throughout the known history of the family, the system has switched from a P-default to a V-default mode four times; complete loss of the editing system has occurred twice; the canonical zinc finger domain of the V protein has been deleted or heavily mutated a further two times; and the W protein has independently evolved a novel function three times. Finally, we review the physical mechanisms of cotranscriptional editing via slippage of the viral RNA polymerase.
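Because each inserted guanosine shifts the downstream reading frame by one nucleotide, a distribution over the number of insertions can be collapsed into the expected share of transcripts encoding each product by taking the insertion count modulo 3. A minimal sketch follows, assuming a P-default virus in which 0, +1 and +2 insertions give P, V and W respectively; the mapping differs in V-default viruses, so it is passed in as a parameter. The function name and probabilities are hypothetical, not estimates from sequencing data.

```python
from collections import defaultdict


def product_fractions(insertion_probs: dict[int, float],
                      frame_to_product: dict[int, str]) -> dict[str, float]:
    """Collapse P(n inserted G's) into the expected fraction of mRNAs per product.

    Only n % 3 matters for the downstream frame; e.g. three insertions restore
    the unedited frame, lengthening the protein by one residue.
    """
    totals: dict[str, float] = defaultdict(float)
    for n_inserted, prob in insertion_probs.items():
        totals[frame_to_product[n_inserted % 3]] += prob
    return dict(totals)


# Hypothetical distribution and an assumed P-default mapping.
insertion_probs = {0: 0.60, 1: 0.30, 2: 0.08, 3: 0.02}
p_default = {0: "P", 1: "V", 2: "W"}
print(product_fractions(insertion_probs, p_default))
```

Under these made-up numbers, about 62% of transcripts encode P, 30% V and 8% W.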
Affiliation(s)
- Jordan Douglas
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand
- School of Computer Science, University of Auckland, Auckland 1010, New Zealand
- Alexei J Drummond
- Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand
- School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Richard L Kingston
- School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
28
Puntambekar S, Newhouse R, San-Miguel J, Chauhan R, Vernaz G, Willis T, Wayland MT, Umrania Y, Miska EA, Prabakaran S. Evolutionary divergence of novel open reading frames in cichlids speciation. Sci Rep 2020; 10:21570. [PMID: 33299045 PMCID: PMC7726158 DOI: 10.1038/s41598-020-78555-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/02/2020] [Accepted: 11/26/2020] [Indexed: 01/02/2023]
Abstract
Novel open reading frames (nORFs) with coding potential may arise from noncoding DNA. Not much is known about their emergence, functional role, fixation in a population, or contribution to adaptive radiation. Cichlid fishes exhibit extensive phenotypic diversification and speciation. Encounters with new environments alone are not sufficient to explain the striking diversity of the cichlid radiation, because other taxa coexisting with the Cichlidae show lower species richness. Wagner et al. analyzed cichlid diversification in 46 African lakes and reported that both extrinsic environmental factors and intrinsic lineage-specific traits related to sexual selection have strongly influenced the cichlid radiation, which indicates the existence of unknown molecular mechanisms responsible for rapid phenotypic diversification, such as the emergence of novel open reading frames (nORFs). In this study, we integrated transcriptomic and proteomic signatures from two tissues of two cichlid species, identified nORFs, and performed evolutionary analysis on these nORF regions. Our results suggest that the time scale of speciation of the two species and the evolutionary divergence of these nORF genomic regions are similar, and indicate a potential role for these nORFs in the speciation of cichlid fishes.
Affiliation(s)
- Shraddha Puntambekar
  - Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
- Rachel Newhouse
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Jaime San-Miguel
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Ruchi Chauhan
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Grégoire Vernaz
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - The Wellcome Trust/CRUK Gurdon Institute, University of Cambridge, Cambridge, CB2 1QN, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Thomas Willis
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Matthew T Wayland
  - Department of Zoology, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
- Yagnesh Umrania
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Eric A Miska
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
  - Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK
- Sudhakaran Prabakaran
  - Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - St. Edmund's College, University of Cambridge, Cambridge, CB3 0BN, UK
29
Abstract
The genomes of bacteria contain fewer genes and substantially less noncoding DNA than those of eukaryotes, and as a result, they have much less raw material to invent new traits. Yet, bacteria are vastly more taxonomically diverse, numerically abundant, and globally successful in colonizing new habitats compared to eukaryotes. Although bacterial genomes are generally considered to be optimized for efficient growth and rapid adaptation, nonadaptive processes have played a major role in shaping the size, contents, and compact organization of bacterial genomes and have allowed the establishment of deleterious traits that serve as the raw materials for genetic innovation.
Affiliation(s)
- Paul C Kirchberger
  - Department of Integrative Biology, University of Texas at Austin, Texas 78712, USA
- Marian L Schmidt
  - Department of Integrative Biology, University of Texas at Austin, Texas 78712, USA
- Howard Ochman
  - Department of Integrative Biology, University of Texas at Austin, Texas 78712, USA
30
Seitz S, Habjanič J, Schütz AK, Bartenschlager R. The Hepatitis B Virus Envelope Proteins: Molecular Gymnastics Throughout the Viral Life Cycle. Annu Rev Virol 2020; 7:263-288. [PMID: 32600157 DOI: 10.1146/annurev-virology-092818-015508]
Abstract
New hepatitis B virions released from infected hepatocytes are the result of an intricate maturation process that starts with the formation of the nucleocapsid providing a confined space where the viral DNA genome is synthesized via reverse transcription. Virion assembly is finalized by the enclosure of the icosahedral nucleocapsid within a heterogeneous envelope. The latter contains integral membrane proteins of three sizes, collectively known as hepatitis B surface antigen, and adopts multiple conformations in the course of the viral life cycle. The nucleocapsid conformation depends on the reverse transcription status of the genome, which in turn controls nucleocapsid interaction with the envelope proteins for virus exit. In addition, after secretion the virions undergo a distinct maturation step during which a topological switch of the large envelope protein confers infectivity. Here we review molecular determinants for envelopment and models that postulate molecular signals encoded in the capsid scaffold conducive or adverse to the recruitment of envelope proteins.
Affiliation(s)
- Stefan Seitz
  - Department of Infectious Diseases, University of Heidelberg, 69120 Heidelberg, Germany
- Jelena Habjanič
  - Bavarian NMR Center, Department of Chemistry, Technical University of Munich, 85748 Garching, Germany
  - Institute of Structural Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Anne K Schütz
  - Bavarian NMR Center, Department of Chemistry, Technical University of Munich, 85748 Garching, Germany
  - Institute of Structural Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Ralf Bartenschlager
  - Department of Infectious Diseases, University of Heidelberg, 69120 Heidelberg, Germany
  - Division of Virus-Associated Carcinogenesis, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
31
Ho JSY, Angel M, Ma Y, Sloan E, Wang G, Martinez-Romero C, Alenquer M, Roudko V, Chung L, Zheng S, Chang M, Fstkchyan Y, Clohisey S, Dinan AM, Gibbs J, Gifford R, Shen R, Gu Q, Irigoyen N, Campisi L, Huang C, Zhao N, Jones JD, van Knippenberg I, Zhu Z, Moshkina N, Meyer L, Noel J, Peralta Z, Rezelj V, Kaake R, Rosenberg B, Wang B, Wei J, Paessler S, Wise HM, Johnson J, Vannini A, Amorim MJ, Baillie JK, Miraldi ER, Benner C, Brierley I, Digard P, Łuksza M, Firth AE, Krogan N, Greenbaum BD, MacLeod MK, van Bakel H, García-Sastre A, Yewdell JW, Hutchinson E, Marazzi I. Hybrid Gene Origination Creates Human-Virus Chimeric Proteins during Infection. Cell 2020; 181:1502-1517.e23. [PMID: 32559462 PMCID: PMC7323901 DOI: 10.1016/j.cell.2020.05.035] [Received: 03/29/2019] [Revised: 02/26/2020] [Accepted: 05/18/2020]
Abstract
RNA viruses are a major human health threat. The life cycles of many highly pathogenic RNA viruses, such as influenza A virus (IAV) and Lassa virus, depend on host mRNA, because viral polymerases cleave 5'-m7G-capped host transcripts to prime viral mRNA synthesis ("cap-snatching"). We hypothesized that start codons within cap-snatched host transcripts could generate chimeric human-viral mRNAs with coding potential. We report the existence of this mechanism of gene origination, which we named "start-snatching." Depending on the reading frame, start-snatching allows the translation of host and viral "untranslated regions" (UTRs) to create N-terminally extended viral proteins or entirely novel polypeptides by genetic overprinting. We show that both types of chimeric proteins are made in IAV-infected cells, generate T cell responses, and contribute to virulence. Our results indicate that during infection with IAV, and likely a multitude of other human, animal and plant viruses, a host-dependent mechanism allows the genesis of hybrid genes.
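The frame dependence of start-snatching can be illustrated with a toy sketch. It assumes DNA-alphabet sequences (T instead of U), the standard stop codons, and a single known viral start codon at index `cds_start`. The function names and example sequences are hypothetical, and this is not the authors' analysis pipeline. An upstream ATG that is in frame with the viral start and has no intervening stop gives an N-terminally extended viral protein. An out-of-frame ATG whose ORF runs into the viral coding region gives a polypeptide read in an alternative frame (overprinting).

```python
from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons, DNA alphabet


def first_stop(seq: str, start: int) -> Optional[int]:
    """Index of the first stop codon in frame with `start`, or None if none is found."""
    for j in range(start, len(seq) - 2, 3):
        if seq[j:j + 3] in STOPS:
            return j
    return None


def classify_upstream_starts(mrna: str, cds_start: int) -> List[Tuple[int, str]]:
    """Classify every ATG upstream of the viral start codon (hypothetical helper).

    `mrna` is the chimeric transcript: snatched host leader + viral UTR + viral CDS.
    """
    results = []
    for i in range(cds_start):
        if mrna[i:i + 3] != "ATG":
            continue
        stop = first_stop(mrna, i)
        # The upstream ORF reaches the viral CDS if its stop codon ends after cds_start.
        reaches_cds = stop is None or stop + 3 > cds_start
        if not reaches_cds:
            kind = "upstream ORF"
        elif (cds_start - i) % 3 == 0:
            kind = "N-terminal extension"   # same frame as the viral protein
        else:
            kind = "alternative-frame ORF"  # overprinted novel polypeptide
        results.append((i, kind))
    return results


# Toy transcripts; the viral CDS is ATG GCT ... TAA in each case.
print(classify_upstream_starts("ATGCCCAAAATGGCTTAA", 9))
print(classify_upstream_starts("GGATGCCAAAATGGCTGCTTAA", 10))
```

The first call reports an in-frame extension at index 0. The second reports an out-of-frame start at index 2 that reads through the viral CDS in a different frame.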
Affiliation(s)
- Jessica Sook Yuin Ho
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Matthew Angel
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Yixuan Ma
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Elizabeth Sloan
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Guojun Wang
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Carles Martinez-Romero
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Division of Infectious Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Marta Alenquer
  - Instituto Gulbenkian de Ciência, 2780-156 Oeiras, Portugal
- Vladimir Roudko
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Hematology and Medical Oncology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Liliane Chung
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Simin Zheng
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Max Chang
  - Department of Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
- Yesai Fstkchyan
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sara Clohisey
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Adam M Dinan
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- James Gibbs
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Robert Gifford
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Rong Shen
  - Division of Structural Biology, The Institute of Cancer Research, London SW7 3RP, UK
- Quan Gu
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Nerea Irigoyen
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Laura Campisi
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Cheng Huang
  - Department of Pathology, the University of Texas Medical Branch, Galveston, TX 77555, USA
- Nan Zhao
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Joshua D Jones
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Zeyu Zhu
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Natasha Moshkina
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Léa Meyer
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Justine Noel
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zuleyma Peralta
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Veronica Rezelj
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Robyn Kaake
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Brad Rosenberg
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Bo Wang
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Jiajie Wei
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Slobodan Paessler
  - Department of Pathology, the University of Texas Medical Branch, Galveston, TX 77555, USA
- Helen M Wise
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Jeffrey Johnson
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Alessandro Vannini
  - Division of Structural Biology, The Institute of Cancer Research, London SW7 3RP, UK; Fondazione Human Technopole, Structural Biology Research Centre, 20157 Milan, Italy
- J Kenneth Baillie
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Emily R Miraldi
  - Divisions of Immunobiology and Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45257, USA
- Christopher Benner
  - Department of Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92037, USA
- Ian Brierley
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Paul Digard
  - The Roslin Institute, University of Edinburgh, Edinburgh EH25 9PS, UK
- Marta Łuksza
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Andrew E Firth
  - Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 0SP, UK
- Nevan Krogan
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Benjamin D Greenbaum
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Hematology and Medical Oncology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Megan K MacLeod
  - Centre for Immunobiology, Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow G12 8QQ, UK
- Harm van Bakel
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Adolfo García-Sastre
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Division of Infectious Diseases, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jonathan W Yewdell
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20892, USA
- Edward Hutchinson
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Ivan Marazzi
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
32
Evolution of novel genes in three-spined stickleback populations. Heredity (Edinb) 2020; 125:50-59. [PMID: 32499660 PMCID: PMC7413265 DOI: 10.1038/s41437-020-0319-7] [Received: 11/24/2019] [Revised: 04/27/2020] [Accepted: 04/30/2020]
Abstract
Eukaryotic genomes frequently acquire new protein-coding genes that may significantly affect an organism’s fitness. Novel genes can be created, for example, by duplication of large genomic regions or de novo, from previously non-coding DNA. Either way, creation of a novel transcript is an essential early step during novel gene emergence. Most studies on the gain-and-loss dynamics of novel genes so far have compared genomes between species, constraining analyses to genes that have remained fixed over long time scales. However, the importance of novel genes for rapid adaptation among populations has recently been shown. Because little is known about the evolutionary dynamics of transcripts across natural populations, we study here transcriptomes from several tissues and nine geographically distinct populations of an ecological model species, the three-spined stickleback. Our findings suggest that novel genes typically start out as transcripts with low expression and high tissue specificity. Early expression regulation appears to be mediated by gene-body methylation. Although most new and narrowly expressed genes are rapidly lost, those that survive and subsequently spread through populations tend to gain broader and higher expression levels. The properties of the encoded proteins, such as disorder and aggregation propensity, hardly change. Correspondingly, young novel genes are not preferentially under positive selection, but older novel genes more often overlap with FST outlier regions. Taken together, expression of the surviving novel genes is rapidly regulated, probably via epigenetic mechanisms, while the structural properties of the encoded proteins are non-debilitating and might only change much later.
33
Pavesi A. New insights into the evolutionary features of viral overlapping genes by discriminant analysis. Virology 2020; 546:51-66. [PMID: 32452417 PMCID: PMC7157939 DOI: 10.1016/j.virol.2020.03.007] [Received: 03/20/2020] [Accepted: 03/29/2020]
Abstract
Overlapping genes originate by a mechanism of overprinting, in which nucleotide substitutions in a pre-existing frame induce the expression of a de novo protein from an alternative frame. In this study, I assembled a dataset of 319 viral overlapping genes, which included 82 overlaps whose expression is experimentally known and the respective 237 homologs. Principal component analysis revealed that overlapping genes have a common pattern of nucleotide and amino acid composition. Discriminant analysis separated overlapping from non-overlapping genes with an accuracy of 97%. When applied to overlapping genes with known genealogy, it separated ancestral from de novo frames with an accuracy close to 100%. This high discriminant power was crucial to computationally design variants of de novo viral proteins known to possess selective anticancer toxicity (apoptin) or protection against neurodegeneration (X protein), as well as to detect two new potential overlapping genes in the genome of the new coronavirus SARS-CoV-2.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy
34
Zehentner B, Ardern Z, Kreitmeier M, Scherer S, Neuhaus K. A Novel pH-Regulated, Unusual 603 bp Overlapping Protein Coding Gene pop Is Encoded Antisense to ompA in Escherichia coli O157:H7 (EHEC). Front Microbiol 2020; 11:377. [PMID: 32265854 PMCID: PMC7103648 DOI: 10.3389/fmicb.2020.00377] [Received: 05/13/2019] [Accepted: 02/20/2020]
Abstract
Antisense transcription is well known in bacteria. However, translation of antisense RNAs is typically not considered, as the implied overlapping coding at a DNA locus is assumed to be highly improbable. Therefore, such overlapping genes are systematically excluded from prokaryotic genome annotation. Here we report an exceptional 603 bp long open reading frame embedded entirely antisense to the gene of the outer membrane protein ompA. An active σ70 promoter, transcription start site (TSS), Shine-Dalgarno motif and rho-independent terminator were experimentally validated, providing evidence that this open reading frame has all the structural features of a functional gene. Furthermore, ribosome profiling revealed translation of the mRNA, the protein was detected in Western blots, and a pH-dependent phenotype conferred by the protein was shown in competitive overexpression growth experiments of a translationally arrested mutant versus wild type. We designate this novel gene pop (pH-regulated overlapping protein-coding gene), thus adding another example to the growing list of overlapping, protein-coding genes in bacteria.
Affiliation(s)
- Barbara Zehentner
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Zachary Ardern
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Michaela Kreitmeier
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
  - ZIEL – Institute for Food & Health, Technical University of Munich, Freising, Germany
- Klaus Neuhaus
  - ZIEL – Institute for Food & Health, Technical University of Munich, Freising, Germany
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technical University of Munich, Freising, Germany
35
Abstract
Viruses are ubiquitous parasites of cellular life and the most abundant biological entities on Earth. It is widely accepted that viruses are polyphyletic, but a consensus scenario for their ultimate origin is still lacking. Traditionally, three scenarios for the origin of viruses have been considered: descent from primordial, precellular genetic elements, reductive evolution from cellular ancestors and escape of genes from cellular hosts, achieving partial replicative autonomy and becoming parasitic genetic elements. These classical scenarios give different timelines for the origin(s) of viruses and do not explain the provenance of the two key functional modules that are responsible, respectively, for viral genome replication and virion morphogenesis. Here, we outline a 'chimeric' scenario under which different types of primordial, selfish replicons gave rise to viruses by recruiting host proteins for virion formation. We also propose that new groups of viruses have repeatedly emerged at all stages of the evolution of life, often through the displacement of ancestral structural and genome replication genes.
36
Abstract
Overlapping genes are commonplace in viruses and play an important role in their function and evolution. However, aside from studies on specific groups of viruses, relatively little is known about the extent and nature of gene overlap and its determinants in viruses as a whole. Here, we present an extensive characterisation of gene overlap in viruses through an analysis of reference genomes present in the NCBI virus genome database. We find that over half the instances of gene overlap are very small, covering <10 nt, and 84 per cent are <50 nt in length. Despite this, 53 per cent of all viruses still contained a gene overlap of 50 nt or larger. We also investigate several predictors of gene overlap such as genome structure (single- and double-stranded RNA and DNA), virus family, genome length, and genome segmentation. This revealed that gene overlap occurs more frequently in DNA viruses than in RNA viruses, and more frequently in single-stranded viruses than in double-stranded viruses. Genome segmentation is also associated with gene overlap, particularly in single-stranded DNA viruses. Notably, we observed a large range of overlap frequencies across families of all genome types, suggesting that it is a common evolutionary trait that provides flexible genome structures in all virus families.
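The overlap size classes above (<10 nt, <50 nt, ≥50 nt) can be computed from gene coordinates with a short sketch. It assumes 1-based inclusive (start, end) coordinates on a linear genome, ignores strand and circularity, and counts every pairwise overlap. The study's exact counting rules may differ, and the gene coordinates below are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end), 1-based and inclusive


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of nucleotides shared by two genes (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def overlap_lengths(genes: List[Interval]) -> List[int]:
    """Lengths of all pairwise gene overlaps in one genome (strand ignored)."""
    lengths = []
    for a, b in combinations(genes, 2):
        n = overlap_length(a, b)
        if n > 0:
            lengths.append(n)
    return lengths


def summarize(lengths: List[int]) -> dict:
    """Share of overlaps below 10 and 50 nt, and whether any overlap is >= 50 nt."""
    total = len(lengths)
    if total == 0:
        return {"n": 0, "frac_lt_10": 0.0, "frac_lt_50": 0.0, "has_overlap_ge_50": False}
    return {
        "n": total,
        "frac_lt_10": sum(n < 10 for n in lengths) / total,
        "frac_lt_50": sum(n < 50 for n in lengths) / total,
        "has_overlap_ge_50": any(n >= 50 for n in lengths),
    }


genes = [(1, 300), (297, 900), (880, 1500), (1400, 1800)]
lengths = overlap_lengths(genes)
print(lengths)             # [4, 21, 101]
print(summarize(lengths))
```

Aggregating `summarize` over many genomes, grouped by genome type or family, would give tallies comparable in spirit to the ones reported above.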
Affiliation(s)
- Timothy E Schlub
  - Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, NSW, 2006, Australia
- Edward C Holmes
  - School of Life and Environmental Sciences and School of Medical Sciences, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, NSW 2006, Australia
37
Gibbs AJ, Hajizadeh M, Ohshima K, Jones RA. The Potyviruses: An Evolutionary Synthesis Is Emerging. Viruses 2020; 12:E132. [PMID: 31979056 PMCID: PMC7077269 DOI: 10.3390/v12020132] [Received: 12/31/2019] [Revised: 01/16/2020] [Accepted: 01/20/2020]
Abstract
In this review, encouraged by the dictum of Theodosius Dobzhansky that "Nothing in biology makes sense except in the light of evolution", we outline the likely evolutionary pathways that have resulted in the observed similarities and differences of the extant molecules, biology, distribution, etc. of the potyvirids and, especially, their largest genus, the potyviruses. The potyvirids are a family of plant-infecting RNA-genome viruses. They had a single polyphyletic origin, and all share at least three of their genes (i.e., the helicase region of their CI protein, the RdRp region of their NIb protein and their coat protein) with other viruses which are otherwise unrelated. Potyvirids fall into 11 genera, of which the potyviruses, the largest, include more than 150 distinct viruses found worldwide. The first potyvirus probably originated 15,000-30,000 years ago, in a Eurasian grass host, by acquiring crucial changes to its coat protein and HC-Pro protein, which enabled it to be transmitted by migrating host-seeking aphids. All potyviruses are aphid-borne and, in nature, infect discrete sets of monocotyledonous or eudicotyledonous angiosperms. All potyvirus genomes are under negative selection; the HC-Pro, CP, NIa, and NIb genes are most strongly selected, and the PIPO gene least, but there are overriding virus-specific differences; for example, all turnip mosaic virus genes are more strongly conserved than those of potato virus Y. Estimates of dN/dS (ω) indicate whether potyvirus populations have been evolving as one or more subpopulations and could be used to help define species boundaries. Recombinants are common in many potyvirus populations (20%-64% in five examined), but recombination seems to be an uncommon speciation mechanism as, of 149 distinct potyviruses, only two were clear recombinants.
Human activities, especially trade and farming, have fostered and spread both potyviruses and their aphid vectors throughout the world, especially over the past five centuries. The world distribution of potyviruses, especially those found on islands, indicates that potyviruses may be more frequently or effectively transmitted by seed than experimental tests suggest. Only two meta-genomic potyviruses have been recorded from animal samples, and both are probably contaminants.
Affiliation(s)
- Adrian J. Gibbs
  - Emeritus Faculty, Australian National University, Canberra, ACT 2601, Australia
- Mohammad Hajizadeh
  - Department of Plant Protection, Faculty of Agriculture, University of Kurdistan, P.O. Box 416, Sanandaj, Iran
- Kazusato Ohshima
  - Laboratory of Plant Virology, Department of Applied Biological Sciences, Faculty of Agriculture, Saga University, 1-banchi, Honjo-machi, Saga 840-8502, Japan
  - The United Graduate School of Agricultural Sciences, Kagoshima University, 1-21-2410 Korimoto, Kagoshima 890-0065, Japan
- Roger A.C. Jones
  - Institute of Agriculture, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
38
DeRisi JL, Huber G, Kistler A, Retallack H, Wilkinson M, Yllanes D. An exploration of ambigrammatic sequences in narnaviruses. Sci Rep 2019; 9:17982. [PMID: 31784609 PMCID: PMC6884476 DOI: 10.1038/s41598-019-54181-3] [Received: 09/26/2019] [Accepted: 11/11/2019]
Abstract
Narnaviruses have been described as positive-sense RNA viruses with a remarkably simple genome of ~3 kb, encoding only a highly conserved RNA-dependent RNA polymerase (RdRp). Many narnaviruses, however, are 'ambigrammatic' and harbour an additional uninterrupted open reading frame (ORF) covering almost the entire length of the reverse complement strand. No function has been described for this ORF, yet the absence of stops is conserved across diverse narnaviruses, and in every case the codons in the reverse ORF and the RdRp are aligned. The >3 kb ORF overlap on opposite strands, unprecedented among RNA viruses, motivates an exploration of the constraints imposed or alleviated by the codon alignment. Here, we show that only when the codon frames are aligned can all stop codons be eliminated from the reverse strand by synonymous single-nucleotide substitutions in the RdRp gene, suggesting a mechanism for de novo gene creation within a strongly conserved amino-acid sequence. It will be fascinating to explore what implications this coding strategy has for other aspects of narnavirus biology. Beyond narnaviruses, our rapidly expanding catalogue of viral diversity may yet reveal additional examples of this broadly-extensible principle for ambigrammatic-sequence development.
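The codon-alignment argument can be made concrete with a toy sketch. It assumes the standard genetic code and a forward coding sequence whose length is a multiple of three. Under that assumption, each reverse-strand codon is the reverse complement of one forward codon. With this alignment, only the forward codons TTA, CTA (Leu) and TCA (Ser) read as stops on the reverse strand (TAA, TAG, TGA), and each can be replaced by a synonymous codon through a single substitution. The function names and the example sequence are hypothetical illustrations, not code from the study.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq):
    if len(seq) % 3:
        raise ValueError("length must be a multiple of 3 so the frames are codon-aligned")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    return "".join(CODE[c] for c in codons(seq))


def reverse_frame_stops(seq):
    """Indices of forward codons whose reverse complement is a stop codon."""
    return [i for i, c in enumerate(codons(seq)) if CODE[revcomp(c)] == "*"]


def synonymous_fix(codon):
    """Single-nucleotide synonymous variant whose reverse complement is not a stop, or None."""
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            cand = codon[:pos] + base + codon[pos + 1:]
            if CODE[cand] == CODE[codon] and CODE[revcomp(cand)] != "*":
                return cand
    return None


def remove_reverse_stops(seq):
    """Clear all reverse-strand stops without changing the forward protein, if possible."""
    cs = codons(seq)
    for idx in reverse_frame_stops(seq):
        fix = synonymous_fix(cs[idx])
        if fix is None:
            return None
        cs[idx] = fix
    return "".join(cs)


seq = "ATGTTACTATCAGGA"           # hypothetical: M L L S G
print(reverse_frame_stops(seq))  # [1, 2, 3]
fixed = remove_reverse_stops(seq)
print(fixed, translate(fixed), translate(revcomp(fixed)))
```

The sketch only covers the aligned case. It does not attempt to show that other frame offsets fail, which is the part of the argument the paper develops.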
Affiliation(s)
- Joseph L DeRisi
  - Chan Zuckerberg Biohub, 499 Illinois Street, San Francisco, CA, 94158, USA
  - Department of Biochemistry and Biophysics, University of California, San Francisco, California, USA
- Greg Huber
  - Chan Zuckerberg Biohub, 499 Illinois Street, San Francisco, CA, 94158, USA
- Amy Kistler
  - Chan Zuckerberg Biohub, 499 Illinois Street, San Francisco, CA, 94158, USA
- Hanna Retallack
  - Department of Biochemistry and Biophysics, University of California, San Francisco, California, USA
- Michael Wilkinson
  - Chan Zuckerberg Biohub, 499 Illinois Street, San Francisco, CA, 94158, USA
  - School of Mathematics and Statistics, The Open University, Walton Hall, Milton Keynes, MK7 6AA, England
- David Yllanes
  - Chan Zuckerberg Biohub, 499 Illinois Street, San Francisco, CA, 94158, USA
39
Affram Y, Zapata JC, Gholizadeh Z, Tolbert WD, Zhou W, Iglesias-Ussel MD, Pazgier M, Ray K, Latinovic OS, Romerio F. The HIV-1 Antisense Protein ASP Is a Transmembrane Protein of the Cell Surface and an Integral Protein of the Viral Envelope. J Virol 2019; 93:e00574-19. [PMID: 31434734 PMCID: PMC6803264 DOI: 10.1128/jvi.00574-19] [Received: 04/05/2019] [Accepted: 08/14/2019]
Abstract
The negative strand of HIV-1 encodes a highly hydrophobic antisense protein (ASP) with no known homologs. The presence of humoral and cellular immune responses to ASP in HIV-1 patients indicates that ASP is expressed in vivo, but its role in HIV-1 replication remains unknown. We investigated ASP expression in multiple chronically infected myeloid and lymphoid cell lines using an anti-ASP monoclonal antibody (324.6) in combination with flow cytometry and microscopy approaches. At baseline and in the absence of stimuli, ASP shows polarized subnuclear distribution, preferentially in areas with low content of suppressive epigenetic marks. However, following treatment with phorbol 12-myristate 13-acetate (PMA), ASP translocates to the cytoplasm and is detectable on the cell surface, even in the absence of membrane permeabilization, indicating that 324.6 recognizes an ASP epitope that is exposed extracellularly. Further, surface staining with 324.6 and anti-gp120 antibodies showed that ASP and gp120 colocalize, suggesting that ASP might become incorporated in the membranes of budding virions. Indeed, fluorescence correlation spectroscopy studies showed binding of 324.6 to cell-free HIV-1 particles. Moreover, 324.6 was able to capture and retain HIV-1 virions with efficiency similar to that of the anti-gp120 antibody VRC01. Our studies indicate that ASP is an integral protein of the plasma membranes of chronically infected cells stimulated with PMA, and upon viral budding, ASP becomes a structural protein of the HIV-1 envelope. These results may provide leads to investigate the possible role of ASP in the virus replication cycle and suggest that ASP may represent a new therapeutic or vaccine target.
IMPORTANCE The HIV-1 genome contains a gene expressed in the opposite, or antisense, direction to all other genes. The protein product of this antisense gene, called ASP, is poorly characterized, and its role in viral replication remains unknown. We provide evidence that the antisense protein, ASP, of HIV-1 is found within the cell nucleus in unstimulated cells. In addition, we show that after PMA treatment, ASP exits the nucleus and localizes on the cell membrane. Moreover, we demonstrate that ASP is present on the surfaces of viral particles. Altogether, our studies identify ASP as a new structural component of HIV-1 and show that ASP is an accessory protein that promotes viral replication. The presence of ASP on the surfaces of both infected cells and viral particles might be exploited therapeutically.
Affiliation(s)
- Yvonne Affram
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Juan C Zapata
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Zahra Gholizadeh
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- William D Tolbert
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Wei Zhou
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Maria D Iglesias-Ussel
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Marzena Pazgier
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Krishanu Ray
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Olga S Latinovic
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Fabio Romerio
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
40
Douglas GM, Langille MGI. Current and Promising Approaches to Identify Horizontal Gene Transfer Events in Metagenomes. Genome Biol Evol 2019; 11:2750-2766. [PMID: 31504488 PMCID: PMC6777429 DOI: 10.1093/gbe/evz184] [Accepted: 08/19/2019] [Indexed: 12/16/2022]
Abstract
High-throughput shotgun metagenomics sequencing has enabled the profiling of myriad natural communities. These data are commonly used to identify gene families and pathways that were potentially gained or lost in an environment and which may be involved in microbial adaptation. Despite the widespread interest in these events, there are no established best practices for identifying gene gain and loss in metagenomics data. Horizontal gene transfer (HGT) represents several mechanisms of gene gain that are especially of interest in clinical microbiology due to the rapid spread of antibiotic resistance genes in natural communities. Several additional mechanisms of gene gain and loss, including gene duplication, gene loss-of-function events, and de novo gene birth, are also important to consider in the context of metagenomes but have been less studied. This review is largely focused on detecting HGT in prokaryotic metagenomes, but methods for detecting these other mechanisms are first discussed. For this article to be self-contained, we provide a general background on HGT and the different possible signatures of this process. Lastly, we discuss how improved assembly of genomes from metagenomes would be the most straightforward approach for improving the inference of gene gain and loss events. Several recent technological advances could help improve metagenome assemblies: long-read sequencing, determining the physical proximity of contigs, optical mapping of short sequences along chromosomes, and single-cell metagenomics. The benefits and limitations of these advances are discussed, and open questions in this area are highlighted.
Affiliation(s)
- Gavin M Douglas
- Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
- Morgan G I Langille
- Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
41
Arendsee Z, Li J, Singh U, Bhandary P, Seetharam A, Wurtele ES. fagin: synteny-based phylostratigraphy and finer classification of young genes. BMC Bioinformatics 2019; 20:440. [PMID: 31455236 PMCID: PMC6712868 DOI: 10.1186/s12859-019-3023-y] [Received: 03/14/2019] [Accepted: 08/08/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS We have developed fagin, a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared with homologous sequences in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred that over half of the "Unknown" A. thaliana query genes, and about 20% of those in S. cerevisiae, lack a syntenic homolog because of local indels or scrambled synteny. CONCLUSIONS fagin augments standard phylostratigraphy and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis.
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation.
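The abstract contrasts standard phylostratigraphy, which dates a gene by the oldest clade in which a homolog is detected, with fagin's synteny-based evidence. As a minimal sketch of the standard approach only (not fagin's synteny pipeline, which is written in R), the Python below assigns a gene to a phylostratum from an ordered list of clades. The function name, the stratum layout, and the species sets are illustrative assumptions. The result depends entirely on which species returned a hit, so a missed homolog silently makes a gene look younger: this is the negative-evidence problem described above.

```python
from typing import Optional, Sequence, Set


def assign_phylostratum(
    strata: Sequence[Set[str]], species_with_hits: Set[str]
) -> Optional[int]:
    """Return the index of the oldest stratum containing a detected homolog.

    Hypothetical layout: strata[0] holds the species that diverged earliest
    from the focal lineage, and strata[-1] holds the focal species itself.
    Returns None if no stratum contains a hit.
    """
    for age_index, stratum_species in enumerate(strata):
        # A homolog in this (older) stratum means the gene predates it.
        if stratum_species & species_with_hits:
            return age_index
    return None


# Illustrative, simplified strata for a yeast focal species (not from the paper).
STRATA = [
    {"Schizosaccharomyces pombe"},
    {"Candida albicans"},
    {"Saccharomyces paradoxus", "Saccharomyces mikatae"},
    {"Saccharomyces cerevisiae"},
]

if __name__ == "__main__":
    hits = {"Saccharomyces cerevisiae", "Saccharomyces paradoxus"}
    print(assign_phylostratum(STRATA, hits))  # 2
    # Only a self-hit places the gene in the youngest stratum (an orphan).
    print(assign_phylostratum(STRATA, {"Saccharomyces cerevisiae"}))  # 3
```

A synteny-based approach such as fagin's would, in addition, look at the syntenic region of each target genome before accepting the absence of a hit as evidence of absence.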
Affiliation(s)
- Zebulun Arendsee
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Jing Li
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Urminder Singh
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Priyanka Bhandary
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
- Arun Seetharam
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA, 50011, USA
- Eve Syrkin Wurtele
- Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA, 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA, 50011, USA
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50011, USA
42
Prabh N, Rödelsperger C. De Novo, Divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes. G3 (Bethesda) 2019; 9:2277-2286. [PMID: 31088903 PMCID: PMC6643871 DOI: 10.1534/g3.119.400326] [Received: 02/04/2019] [Accepted: 05/11/2019] [Indexed: 12/30/2022]
Abstract
Homology is a fundamental concept in comparative biology. It is extensively used at the sequence level to make phylogenetic hypotheses and functional inferences. Nonetheless, the majority of eukaryotic genomes contain large numbers of orphan genes lacking homologs in other taxa. Generally, the fraction of orphan genes is higher in genomically undersampled clades, and in the absence of closely related genomes any hypothesis about their origin and evolution remains untestable. Previously, we sequenced ten genomes with an underlying ladder-like phylogeny to establish a phylogenomic framework for studying genome evolution in diplogastrid nematodes. Here, we use this deeply sampled data set to understand the processes that generate orphan genes in our focal species Pristionchus pacificus. Based on phylostratigraphic analysis and additional bioinformatic filters, we obtained 29 high-confidence candidate genes for which mechanisms of orphan origin were proposed based on manual inspection. This revealed diverse mechanisms, including annotation artifacts, chimeric origin, alternative reading frame usage, and gene splitting with subsequent gain of de novo exons. In addition, we present two cases of complete de novo origination from non-coding regions, which represents one of the first reports of de novo genes in nematodes. Thus, we conclude that de novo emergence, divergence, and mixed mechanisms contribute to novel gene formation in Pristionchus nematodes.
Affiliation(s)
- Neel Prabh
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
- Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August Thienemann Str. 2, 24306 Plön, Germany
- Christian Rödelsperger
- Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany
43
Schlub TE, Buchmann JP, Holmes EC. A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 2019; 35:2572-2581. [PMID: 30099499 PMCID: PMC6188560 DOI: 10.1093/molbev/msy155] [Indexed: 12/13/2022]
Abstract
Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and found that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction, suggesting that there are limitations in our current understanding of RNA virus replication.
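The core idea of the method above, comparing an observed ORF length with lengths expected under randomization, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published method: it measures the longest stop-to-stop stretch in one forward frame and uses a plain nucleotide shuffle as the null model, whereas the authors' randomization scheme may differ (for example, in how it preserves the composition of the known gene). The function names and the add-one p-value estimate are choices made for this sketch.

```python
import random
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_stretch(seq: str, frame: int) -> int:
    """Length, in codons, of the longest stop-free run in forward frame 0, 1 or 2."""
    seq = seq.upper()
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0  # a stop codon ends the current open stretch
        else:
            current += 1
            best = max(best, current)
    return best


def orf_length_pvalue(
    seq: str, frame: int, n_shuffles: int = 1000, seed: int = 0
) -> Tuple[int, float]:
    """Observed longest stretch and an empirical p-value under nucleotide shuffling.

    The p-value is the add-one corrected fraction of shuffled sequences whose
    longest stretch in the same frame is at least as long as the observed one.
    """
    rng = random.Random(seed)
    observed = longest_open_stretch(seq, frame)
    bases = list(seq.upper())
    at_least_as_long = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # keeps base composition, destroys order
        if longest_open_stretch("".join(bases), frame) >= observed:
            at_least_as_long += 1
    return observed, (at_least_as_long + 1) / (n_shuffles + 1)
```

For a candidate overlapping gene, `frame` would be the offset of the alternative frame within the annotated gene's region; a small p-value flags an open stretch longer than this null model predicts.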
Affiliation(s)
- Timothy E Schlub
- Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
44
Affiliation(s)
- Stephen Branden Van Oss
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
45
Pavesi A. Asymmetric evolution in viral overlapping genes is a source of selective protein adaptation. Virology 2019; 532:39-47. [PMID: 31004987 PMCID: PMC7125799 DOI: 10.1016/j.virol.2019.03.017] [Received: 01/03/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/29/2022]
Abstract
Overlapping genes represent an intriguing puzzle, as they encode two proteins whose ability to evolve is constrained by each other. Overlapping genes can undergo “symmetric evolution” (similar selection pressures on the two proteins) or “asymmetric evolution” (significantly different selection pressures on the two proteins). By sequence analysis of 75 pairs of homologous viral overlapping genes, I evaluated their accordance with one or the other model. Analysis of nucleotide and amino acid sequences revealed that half of the overlaps undergo asymmetric evolution, as the protein from one frame shows a number of substitutions significantly higher than that of the protein from the other frame. Interestingly, the most variable protein (often known to interact with host proteins) appeared to be encoded by the de novo frame in all cases examined. These findings suggest that overlapping genes, besides increasing the coding ability of viruses, are also a source of selective protein adaptation. A dataset of 80 pairs of homologous overlapping genes from viruses is examined. Its analysis reveals that half of overlapping genes undergo asymmetric evolution. The most variable gene product is that encoded by the de novo overlapping gene. Overlapping genes evolving asymmetrically are a source of selective protein adaptation.
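The symmetric/asymmetric distinction above rests on counting amino acid changes separately in each reading frame of the same nucleotide region. A minimal Python sketch is shown below, assuming two already-aligned, gap-free homologous sequences and the standard genetic code; `translate` and `aa_differences` are hypothetical helper names. A real analysis would use proper codon alignments, normalize by length, and test whether the difference between frames is significant.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int) -> str:
    """Translate the forward strand from offset `frame`; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def aa_differences(seq_a: str, seq_b: str, frame: int) -> int:
    """Count amino acid differences between two aligned homologs in one frame."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    protein_a, protein_b = translate(seq_a, frame), translate(seq_b, frame)
    return sum(a != b for a, b in zip(protein_a, protein_b))


if __name__ == "__main__":
    # Toy homologs: GCT -> GCC is synonymous in frame 0 (Ala -> Ala)
    # but turns CTA (Leu) into CCA (Pro) in frame +1.
    ancestral, derived = "ATGGCTAAA", "ATGGCCAAA"
    print(aa_differences(ancestral, derived, 0))  # 0
    print(aa_differences(ancestral, derived, 1))  # 1
```

The toy example shows how a single nucleotide substitution can be silent in one frame yet change the protein in the other, which is what allows the two proteins of an overlap to evolve at different rates.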
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11/A, I-43124, Parma, Italy.
46
Puustusmaa M, Abroi A. cRegions: a tool for detecting conserved cis-elements in multiple sequence alignment of diverged coding sequences. PeerJ 2019; 6:e6176. [PMID: 30647994 PMCID: PMC6330207 DOI: 10.7717/peerj.6176] [Received: 07/18/2018] [Accepted: 11/27/2018] [Indexed: 12/31/2022]
Abstract
Identifying cis-acting elements and understanding the regulatory mechanisms of a gene are crucial to fully understanding the molecular biology of an organism. In general, it is difficult to identify previously uncharacterised cis-acting elements with an unknown consensus sequence. The task is especially problematic with viruses containing regions of limited or no similarity to other previously characterised sequences. Fortunately, the rapid increase in the number of sequenced genomes allows us to detect some of these elusive cis-elements. In this work, we introduce a web-based tool called cRegions. It was developed to identify regions within a protein-coding sequence where the conservation in the amino acid sequence is caused by the conservation in the nucleotide sequence. cRegions can be the first step in discovering novel cis-acting sequences from diverged protein-coding genes. The results can be used as a basis for future experimental analysis. We applied cRegions to the non-structural and structural polyproteins of alphaviruses as an example and successfully detected all known cis-acting elements. In this publication and in previous work, we have shown that cRegions is able to detect a wide variety of functional elements in DNA and RNA viruses. These functional elements include splice sites, stem-loops, overlapping reading frames, internal promoters, ribosome frameshifting signals and other embedded elements of as yet unknown function. The cRegions web tool is available at http://bioinfo.ut.ee/cRegions/.
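cRegions looks for stretches where amino acid conservation is accompanied by nucleotide conservation, that is, where even synonymous changes are avoided. The Python below is a deliberately crude proxy for that idea, not the cRegions algorithm: over codon-aligned, gap-free sequences it marks codon columns whose amino acid is conserved, records whether the codon itself is also identical, and reports a sliding-window fraction. The window size, the function name, and the scoring rule are assumptions made for this sketch.

```python
from itertools import product
from typing import List, Optional, Sequence

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_conservation_profile(
    aligned: Sequence[str], window: int = 10
) -> List[Optional[float]]:
    """Sliding-window fraction of amino-acid-conserved codon columns that are also
    identical at the nucleotide level (None if a window has no such column)."""
    flags: List[Optional[bool]] = []
    for i in range(0, len(aligned[0]) - 2, 3):
        column = [seq[i:i + 3] for seq in aligned]
        amino_acids = {CODON_TABLE.get(codon, "X") for codon in column}
        if len(amino_acids) != 1 or "X" in amino_acids:
            flags.append(None)  # amino acid not conserved, or unknown codon
        else:
            flags.append(len(set(column)) == 1)  # True if no synonymous change
    profile: List[Optional[float]] = []
    for start in range(len(flags) - window + 1):
        informative = [f for f in flags[start:start + window] if f is not None]
        profile.append(sum(informative) / len(informative) if informative else None)
    return profile
```

High values are only informative when the sequences are otherwise diverged; in closely related sequences nearly every window would score high regardless of any cis-acting element.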
Affiliation(s)
- Mikk Puustusmaa
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Aare Abroi
- Institute of Technology, University of Tartu, Tartu, Estonia
47
Abstract
Despite the central role of bacterial noncoding small RNAs (sRNAs) in posttranscriptional regulation, little is understood about their evolution. Here we compile what has been studied to date and trace a life cycle of sRNAs, from their mechanisms of emergence, through processes of change and frequent neofunctionalization, to their loss from bacterial lineages. We find that, because sRNAs have relatively unrestrictive structural requirements, their origins are varied and include de novo emergence as well as formation from preexisting genetic elements via duplication events and horizontal gene transfer. The need for only partial complementarity to their mRNA targets facilitates apparently rapid change, which also contributes to significant challenges in tracing sRNAs across broad evolutionary distances. We document that recently emerged sRNAs in particular evolve quickly, mirroring dynamics observed in microRNAs, their functional analogs in eukaryotes. Mutations in mRNA-binding regions, transcriptional regulator or sigma factor binding sites, and protein-binding regions are all likely sources of shifting regulatory roles of sRNAs. Finally, using examples from the few evolutionary studies available, we examine cases of sRNA loss and describe how these may result from adaptive as well as neutral processes. We highlight the need for more comprehensive analyses of sRNA evolutionary patterns as a means to improve novel sRNA detection, enhance genome annotation, and deepen our understanding of regulatory networks in bacteria.
48
Ulrich K, Wehner S, Bekaert M, Di Paola N, Dilcher M, Muir KF, Taggart JB, Matejusova I, Weidmann M. Molecular epidemiological study on Infectious Pancreatic Necrosis Virus isolates from aquafarms in Scotland over three decades. J Gen Virol 2018; 99:1567-1581. [PMID: 30358526 DOI: 10.1099/jgv.0.001155] [Indexed: 01/08/2023]
Abstract
To gain insight into genomic changes and the associated evolution and adaptation of Infectious Pancreatic Necrosis Virus (IPNV), the complete coding genomes of 57 IPNV isolates collected from Scottish aquafarms from 1982 to 2014 were sequenced and analysed. Phylogenetic analysis of the sequenced IPNV strains showed separate clustering of genogroups I, II, III and V. IPNV isolates with genetic reassortment of segment A/B of genogroup III/II were identified. About 59 % of the IPNV isolates belonged to the persistent type and 32 % to the low-virulent type, and only one highly pathogenic strain (1.79 %) was identified. Codon adaptation index calculations indicated that the IPNV major capsid protein VP2 has adapted to its salmonid host. Under-representation of CpG dinucleotides in the IPNV genome, which would minimize detection by innate immune receptors, and the observed positive selection at the virulence-determining sites of VP2 embedded in the variable region of the main antigenic region, suggest an immune escape mechanism driving virulence evolution. The prevalence of mostly persistent genotypes, together with the assumption of adaptation and immune escape, indicates that IPNV is evolving with the host.
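CpG under-representation of the kind described above is usually quantified as an observed/expected ratio: the CpG count divided by the count expected from the C and G frequencies alone. A minimal Python sketch is shown below, assuming the common approximation CpG × N / (C × G) over a single sequence; the exact statistic used in the study may differ. Values well below 1 indicate CpG depletion.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    Returns NaN when the sequence has no C or no G, because the expected
    count is then zero.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return float("nan")
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_observed_expected("CCGG"))   # 1.0: one CpG, as many as expected
    print(cpg_observed_expected("CCAGG"))  # 0.0: C and G present, no CpG
```

Genome-scale values are usually reported per segment or per coding region, since a single short sequence gives a noisy ratio.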
Affiliation(s)
- Kristina Ulrich
- Institute of Aquaculture, University of Stirling, Stirling, UK
- Michaël Bekaert
- Institute of Aquaculture, University of Stirling, Stirling, UK
- Nicholas Di Paola
- Biomedical Sciences Institute, University of Sao Paulo, Sao Paulo, Brazil
- Meik Dilcher
- Canterbury Health Laboratories, Christchurch, New Zealand
- John B Taggart
- Institute of Aquaculture, University of Stirling, Stirling, UK
- Manfred Weidmann
- Institute of Aquaculture, University of Stirling, Stirling, UK
49
Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, Belshaw R, Firth A, Karlin D. Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes. PLoS One 2018; 13:e0202513. [PMID: 30339683 PMCID: PMC6195259 DOI: 10.1371/journal.pone.0202513] [Received: 01/22/2018] [Accepted: 08/03/2018] [Indexed: 11/19/2022]
Abstract
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be also abundant in eukaryotes. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that, overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may favour the birth of overlapping genes and/or result from selection pressure acting on them.
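The composition bias described above can be measured from the standard genetic code, where an amino acid's degeneracy is the number of codons that encode it. The Python sketch below derives those counts and reports, for a protein sequence, the fraction of residues with high degeneracy (assumed here to mean 4 or 6 codons) and with low degeneracy (assumed to mean 1 or 2 codons), leaving isoleucine's 3 codons in neither class. These cutoffs are assumptions for illustration and may not match the groupings used in the paper.

```python
from collections import Counter
from typing import Tuple

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of sense codons per amino acid, e.g. L -> 6, I -> 3, M -> 1.
DEGENERACY = Counter(aa for aa in AMINO_ACIDS if aa != "*")
HIGH_DEGENERACY = {aa for aa, n in DEGENERACY.items() if n >= 4}
LOW_DEGENERACY = {aa for aa, n in DEGENERACY.items() if n <= 2}


def degeneracy_fractions(protein: str) -> Tuple[float, float]:
    """Fractions of residues in the (assumed) high- and low-degeneracy classes."""
    residues = [aa for aa in protein.upper() if aa in DEGENERACY]
    if not residues:
        return 0.0, 0.0
    high = sum(aa in HIGH_DEGENERACY for aa in residues) / len(residues)
    low = sum(aa in LOW_DEGENERACY for aa in residues) / len(residues)
    return high, low


if __name__ == "__main__":
    print(sorted(HIGH_DEGENERACY))        # ['A', 'G', 'L', 'P', 'R', 'S', 'T', 'V']
    print(degeneracy_fractions("LLMI"))   # (0.5, 0.25)
```

Comparing these fractions between the proteins of overlapping and non-overlapping genes gives a first-pass view of the enrichment reported above; the study itself used a broader compositional analysis, including principal component analysis.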
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Alberto Vianelli
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Nicola Chirico
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Yiming Bao
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Olga Blinkova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
- Robert Belshaw
- School of Biomedical & Healthcare Sciences, Plymouth University Peninsula Schools of Medicine and Dentistry (PUPSMD), Plymouth, United Kingdom
- Andrew Firth
- Department of Pathology, Division of Virology, University of Cambridge, Cambridge, United Kingdom
- David Karlin
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Division of Structural Biology, University of Oxford, Oxford, United Kingdom
50
Willis S, Masel J. Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes. Genetics 2018; 210:303-313. [PMID: 30026186 PMCID: PMC6116962 DOI: 10.1534/genetics.118.301249] [Received: 06/11/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022]
Abstract
The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed to be due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
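The closing explanation above turns on how many start and stop codons each reading frame offers. The Python sketch below simply counts ATG and stop codons in each of the three forward frames of a sequence; in a same-strand overlap, frame 0 would be the ancestral gene's frame and frames 1 and 2 the two possible positions of a new gene. It illustrates the quantities involved, not the authors' analysis, and the function name is hypothetical.

```python
from typing import Dict

STOP_CODONS = {"TAA", "TAG", "TGA"}


def start_stop_counts(seq: str) -> Dict[int, Dict[str, int]]:
    """Count ATG start codons and stop codons in forward frames 0, 1 and 2."""
    seq = seq.upper()
    counts: Dict[int, Dict[str, int]] = {}
    for frame in (0, 1, 2):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        counts[frame] = {
            "starts": codons.count("ATG"),
            "stops": sum(codon in STOP_CODONS for codon in codons),
        }
    return counts


if __name__ == "__main__":
    print(start_stop_counts("AATGAAATGA"))
    # {0: {'starts': 1, 'stops': 0}, 1: {'starts': 1, 'stops': 1}, 2: {'starts': 0, 'stops': 1}}
```

Summed over many ancestral genes, a frame with more ATGs and fewer stops offers more opportunities for a long ORF to arise by mutation, which is the bias the abstract invokes.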
Affiliation(s)
- Sara Willis
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721