1
Vakirlis N, Kupczok A. Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins. Genome Res 2024; 34:888-903. [PMID: 38977308 DOI: 10.1101/gr.278977.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 06/12/2024] [Indexed: 07/10/2024]
Abstract
Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, contrasting an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.
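The GC-content comparison described in this abstract can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical and are not taken from the study; the authors' actual pipeline is not described here.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases to count
    return (seq.count("G") + seq.count("C")) / counted


def mean_gc(seqs: Iterable[str]) -> float:
    """Average GC content over a collection of gene sequences."""
    values = [gc_content(s) for s in seqs]
    return sum(values) / len(values)


# Toy sequences (assumed, for illustration only).
conserved_genes = ["ATGGCGCCGTAA", "ATGCGCGGCTGA"]
orphan_genes = ["ATGGCGCAGTAA", "ATGAAATTTTAA"]

# A small difference would indicate matching base composition.
gc_difference = mean_gc(orphan_genes) - mean_gc(conserved_genes)
```

Computed per species, such a difference close to zero for most orphans, but negative for the virus-like subset, would correspond to the pattern the abstract reports.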
Affiliation(s)
- Nikolaos Vakirlis
  - Institute For Fundamental Biomedical Research, B.S.R.C. "Alexander Fleming," Vari 166 72, Greece
  - Institute for General Microbiology, Kiel University, 24118 Kiel, Germany
- Anne Kupczok
  - Bioinformatics Group, Wageningen University, 6700 PB Wageningen, The Netherlands
2
Pavesi A, Romerio F. Creation of the HIV-1 antisense gene asp coincided with the emergence of the pandemic group M and is associated with faster disease progression. Microbiol Spectr 2024; 12:e0380223. [PMID: 38230940 PMCID: PMC10846101 DOI: 10.1128/spectrum.03802-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/19/2023] [Indexed: 01/18/2024] Open
Abstract
Despite being first identified more than three decades ago, the antisense gene asp of HIV-1 remains an enigma. asp is present uniquely in pandemic (group M) HIV-1 strains, and it is absent in all non-pandemic (out-of-M) HIV-1 strains and virtually all non-human primate lentiviruses. This suggests that the creation of asp may have contributed to HIV-1 fitness or worldwide spread. It also raises the question of which evolutionary processes were at play in the creation of asp. Here, we show that HIV-1 genomes containing an intact asp gene are associated with faster HIV-1 disease progression. Furthermore, we demonstrate that the creation of a full-length asp gene occurred via the evolution of codon usage in env overlapping asp on the opposite strand. This involved differential use of synonymous codons or conservative amino acid substitution in env that eliminated internal stop codons in asp, and redistribution of synonymous codons in env that minimized the likelihood of new premature stops arising in asp. Nevertheless, the creation of a full-length asp gene reduced the genetic diversity of env. The Luria-Delbruck fluctuation test suggests that the interrupted asp open reading frame (ORF) is the progenitor of the intact ORF, rather than a descendant under random genetic drift. Therefore, the existence of group-M isolates with a truncated asp ORF indicates an incomplete transition process. For the first time, our study links the presence of a full-length asp ORF to faster disease progression, thus warranting further investigation into the cellular processes and molecular mechanisms through which the ASP protein impacts HIV-1 replication, transmission, and pathogenesis.
IMPORTANCE Overlapping genes engage in a tug-of-war, constraining each other's evolution. The creation of a new gene overlapping an existing one comes at an evolutionary cost. Thus, its conservation must be advantageous, or it will be lost, especially if the pre-existing gene is essential for the viability of the virus or cell. We found that the creation and conservation of the HIV-1 antisense gene asp occurred through differential use of synonymous codons or conservative amino acid substitutions within the overlapping gene, env. This process did not involve amino acid changes in ENV that benefited its function, but rather it constrained the evolution of ENV. Nonetheless, the creation of asp brought a net selective advantage to HIV-1 because asp is conserved especially among high-prevalence strains. The association between the presence of an intact asp gene and faster HIV-1 disease progression supports that conclusion and warrants further investigation.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Fabio Romerio
  - Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
3
Olo Ndela E, Roux S, Henke C, Sczyrba A, Sime Ngando T, Varsani A, Enault F. Reekeekee- and roodoodooviruses, two different Microviridae clades constituted by the smallest DNA phages. Virus Evol 2022; 9:veac123. [PMID: 36694818 PMCID: PMC9865509 DOI: 10.1093/ve/veac123] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/15/2022] [Revised: 10/19/2022] [Accepted: 12/22/2022] [Indexed: 12/25/2022] Open
Abstract
Small circular single-stranded DNA viruses of the Microviridae family are both prevalent and diverse in all ecosystems. They usually harbor a genome between 4.3 and 6.3 kb, with a microvirus recently isolated from a marine Alphaproteobacteria being the smallest known genome of a DNA phage (4.248 kb). A subfamily, Amoyvirinae, has been proposed to classify this virus and other related small Alphaproteobacteria-infecting phages. Here, we report the discovery, in meta-omics data sets from various aquatic ecosystems, of sixteen complete microvirus genomes significantly smaller (2.991-3.692 kb) than known ones. Phylogenetic analysis reveals that these sixteen genomes represent two related, yet distinct and diverse, novel groups of microviruses, with amoyviruses being their closest known relatives. We propose that these small microviruses are members of two tentatively named subfamilies Reekeekeevirinae and Roodoodoovirinae. As known microvirus genomes encode many overlapping and overprinted genes that are not identified by gene prediction software, we developed a new methodology to identify all genes based on protein conservation, amino acid composition, and selection pressure estimations. Surprisingly, only four to five genes could be identified per genome, with the number of overprinted genes lower than that in phiX174. These small genomes thus tend to have both a lower number of genes and a shorter length for each gene, leaving no place for variable gene regions that could harbor overprinted genes. Even more surprisingly, these two Microviridae groups had specific and different gene content, and major differences in their conserved protein sequences, highlighting that these two related groups of small genome microviruses use very different strategies to fulfill their lifecycle with such a small number of genes. 
The discovery of these genomes and the detailed prediction and annotation of their genome content expand our understanding of ssDNA phages in nature and are further evidence that these viruses have explored a wide range of possibilities during their long evolution.
Affiliation(s)
- Christian Henke
  - Computational Metagenomics, Bielefeld University, Universitätsstraße 27, Bielefeld 30501, Germany
  - Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
- Alexander Sczyrba
  - Computational Metagenomics, Bielefeld University, Universitätsstraße 27, Bielefeld 30501, Germany
  - Center for Biotechnology, Bielefeld University, Universitätsstraße 27, Bielefeld 33615, Germany
- Télesphore Sime Ngando
  - Université Clermont Auvergne, CNRS, Laboratoire Microorganismes: Genome et Environnement, Clermont-Ferrand F-63000, France
4
Jayaraman V, Toledo‐Patiño S, Noda‐García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022] [Indexed: 11/06/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo‐Patiño
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda‐García
  - Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
5
Pley C, Lourenço J, McNaughton AL, Matthews PC. Spacer Domain in Hepatitis B Virus Polymerase: Plugging a Hole or Performing a Role? J Virol 2022; 96:e0005122. [PMID: 35412348 PMCID: PMC9093120 DOI: 10.1128/jvi.00051-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 03/14/2022] [Indexed: 11/25/2022] Open
Abstract
Hepatitis B virus (HBV) polymerase is divided into terminal protein, spacer, reverse transcriptase, and RNase domains. Spacer has previously been considered dispensable, merely acting as a tether between other domains or providing plasticity to accommodate deletions and mutations. We explore evidence for the role of spacer sequence, structure, and function in HBV evolution and lineage, consider its associations with escape from drugs, vaccines, and immune responses, and review its potential impacts on disease outcomes.
Affiliation(s)
- Caitlin Pley
  - School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
  - Guy’s and St Thomas’ NHS Foundation Trust, London, United Kingdom
- José Lourenço
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
  - Biosystems and Integrative Sciences Institute, University of Lisbon, Lisbon, Portugal
- Anna L. McNaughton
  - Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
  - Nuffield Department of Medicine, University of Oxford Medawar Building, Oxford, United Kingdom
- Philippa C. Matthews
  - Nuffield Department of Medicine, University of Oxford Medawar Building, Oxford, United Kingdom
  - The Francis Crick Institute, London, United Kingdom
  - Division of Infection and Immunity, University College London, London, United Kingdom
6
Pavesi A, Romerio F. Extending the Coding Potential of Viral Genomes with Overlapping Antisense ORFs: A Case for the De Novo Creation of the Gene Encoding the Antisense Protein ASP of HIV-1. Viruses 2022; 14:v14010146. [PMID: 35062351 PMCID: PMC8781085 DOI: 10.3390/v14010146] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 02/04/2023] Open
Abstract
Gene overprinting occurs when point mutations within a genomic region with an existing coding sequence create a new one in another reading frame. This process is quite frequent in viral genomes either to maximize the amount of information that they encode or in response to strong selective pressure. The most frequent scenario involves two different reading frames in the same DNA strand (sense overlap). Much less frequent are cases of overlapping genes that are encoded on opposite DNA strands (antisense overlap). One such example is the antisense ORF, asp in the minus strand of the HIV-1 genome overlapping the env gene. The asp gene is highly conserved in pandemic HIV-1 strains of group M, and it is absent in non-pandemic HIV-1 groups, HIV-2, and lentiviruses infecting non-human primates, suggesting that the ~190-amino acid protein that is expressed from this gene (ASP) may play a role in virus spread. While the function of ASP in the virus life cycle remains to be elucidated, mounting evidence from several research groups indicates that ASP is expressed in vivo. There are two alternative hypotheses that could be envisioned to explain the origin of the asp ORF. On one hand, asp may have originally been present in the ancestor of contemporary lentiviruses, and subsequently lost in all descendants except for most HIV-1 strains of group M due to selective advantage. Alternatively, the asp ORF may have originated very recently with the emergence of group M HIV-1 strains from SIVcpz. Here, we used a combination of computational and statistical approaches to study the genomic region of env in primate lentiviruses to shed light on the origin, structure, and sequence evolution of the asp ORF. 
The results emerging from our studies support the hypothesis of a recent de novo addition of the antisense ORF to the HIV-1 genome through a process that entailed progressive removal of existing internal stop codons from SIV strains to HIV-1 strains of group M, and fine tuning of the codon sequence in env that reduced the chances of new stop codons occurring in asp. Altogether, the study supports the notion that the HIV-1 asp gene encodes an accessory protein, providing a selective advantage to the virus.
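To make the idea of an antisense ORF concrete, the sketch below reverse-complements a sense-strand sequence and reports, for each of the three minus-strand frames, the longest run of codons uninterrupted by a stop codon. It is only an illustration under simple assumptions (standard stop codons, unambiguous bases, hypothetical function names) and is not the computational approach used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq: str) -> str:
    """Return the minus-strand sequence read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_stop_free_run(seq: str, frame: int) -> int:
    """Longest run of consecutive non-stop codons (in codons) in one frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0  # a stop codon interrupts the open stretch
        else:
            run += 1
            best = max(best, run)
    return best


def antisense_open_runs(sense_seq: str) -> dict:
    """Map each minus-strand frame (0, 1, 2) to its longest stop-free run."""
    minus = reverse_complement(sense_seq)
    return {frame: longest_stop_free_run(minus, frame) for frame in range(3)}


# Hypothetical toy input; a real analysis would use an env coding sequence.
runs = antisense_open_runs("TTATTTCAT")
```

Under the hypothesis favored in the abstract, sequences closer to group M would show progressively longer stop-free runs in the antisense frame that carries asp.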
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
- Fabio Romerio
  - Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205-2196, USA
7
Abstract
Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.
8
Wichmann S, Scherer S, Ardern Z. Biological factors in the synthetic construction of overlapping genes. BMC Genomics 2021; 22:888. [PMID: 34895142 PMCID: PMC8665328 DOI: 10.1186/s12864-021-08181-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Accepted: 11/17/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life's ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from other cellular organisms. In this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps.
RESULTS After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids.
CONCLUSIONS Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered, such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and opens up exciting possibilities for synthetic biology.
Affiliation(s)
- Stefan Wichmann
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Zachary Ardern
  - Chair of Microbial Ecology, Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
9
Castro JF, Tautz D. The Effects of Sequence Length and Composition of Random Sequence Peptides on the Growth of E. coli Cells. Genes (Basel) 2021; 12:1913. [PMID: 34946861 PMCID: PMC8702183 DOI: 10.3390/genes12121913] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/17/2021] [Revised: 11/22/2021] [Accepted: 11/26/2021] [Indexed: 12/21/2022] Open
Abstract
We study the potential for the de novo evolution of genes from random nucleotide sequences using libraries of E. coli expressing random sequence peptides. We assess the effects of such peptides on cell growth by monitoring frequency changes in individual clones in a complex library through four serial passages. Using a new analysis pipeline that allows the tracing of peptides of all lengths, we find that over half of the peptides have consistent effects on cell growth. Across nine different experiments, around 16% of clones increase in frequency and 36% decrease, with some variation between individual experiments. Shorter peptides (8-20 residues) are more likely to increase in frequency, whereas longer ones are more likely to decrease. GC content, amino acid composition, intrinsic disorder, and aggregation propensity show slightly different patterns between peptide groups. Sequences that increase in frequency tend to be more disordered with lower aggregation propensity. This coincides with the observation that young genes with more disordered structures are better tolerated in genomes. Our data indicate that random sequences can be a source of evolutionary innovation, since a large fraction of them are well tolerated by the cells or can provide a growth advantage.
Affiliation(s)
- Diethard Tautz
  - Max Planck Institute for Evolutionary Biology, August-Thienemann Strasse 2, 24306 Plön, Germany
10
Computational methods for inferring location and genealogy of overlapping genes in virus genomes: approaches and applications. Curr Opin Virol 2021; 52:1-8. [PMID: 34798370 PMCID: PMC8594276 DOI: 10.1016/j.coviro.2021.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/02/2022]
Abstract
Viruses may evolve to increase the amount of encoded genetic information by means of overlapping genes, which utilize several reading frames. Such overlapping genes may be especially impactful for genomes of small size, often serving as a source of novel accessory proteins, some of which play a crucial role in viral pathogenicity or in promoting the systemic spread of the virus. Diverse genome-based metrics were proposed to facilitate recognition of overlapping genes that otherwise may be overlooked during genome annotation. They can detect the atypical codon bias associated with the overlap (e.g. a statistically significant reduction in variability at synonymous sites) or other sequence-composition features peculiar to overlapping genes. In this review, I compare nine computational methods, discuss their strengths and limitations, and survey how they were applied to detect candidate overlapping genes in the genome of SARS-CoV-2, the etiological agent of the COVID-19 pandemic.
11
Pavesi A. Prediction of two novel overlapping ORFs in the genome of SARS-CoV-2. Virology 2021; 562:149-157. [PMID: 34339929 PMCID: PMC8317007 DOI: 10.1016/j.virol.2021.07.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 07/21/2021] [Accepted: 07/21/2021] [Indexed: 10/25/2022]
Abstract
Six candidate overlapping genes have been detected in SARS-CoV-2, yet current methods struggle to detect overlapping genes that recently originated. However, such genes might encode proteins beneficial to the virus, and provide a model system to understand gene birth. To complement existing detection methods, I first demonstrated that selection pressure to avoid stop codons in alternative reading frames is a driving force in the origin and retention of overlapping genes. I then built a detection method, CodScr, based on this selection pressure. Finally, I combined CodScr with methods that detect other properties of overlapping genes, such as a biased nucleotide and amino acid composition. I detected two novel ORFs (ORF-Sh and ORF-Mh), overlapping the spike and membrane genes respectively, which are under selection pressure and may be beneficial to SARS-CoV-2. ORF-Sh and ORF-Mh are present, as ORF uninterrupted by stop codons, in 100% and 95% of the SARS-CoV-2 genomes, respectively.
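The signal this abstract relies on, avoidance of stop codons in alternative reading frames, can be illustrated by counting stop codons in the two shifted frames of a coding sequence. The sketch below is not CodScr, whose scoring is not described here; the function names and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def stop_count(seq: str, offset: int) -> int:
    """Count stop codons when reading seq in the frame starting at offset."""
    seq = seq.upper()
    return sum(
        1
        for i in range(offset, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    )


def alternative_frame_stops(cds: str) -> dict:
    """Stop-codon counts in the +1 and +2 frames relative to the annotated frame."""
    return {"+1": stop_count(cds, 1), "+2": stop_count(cds, 2)}


# Hypothetical toy coding sequence, read in frame 0 as ATG ATA GCT GAA.
toy_cds = "ATGATAGCTGAA"
shifted_stops = alternative_frame_stops(toy_cds)
```

A region whose shifted frame contains fewer stops than expected from its codon composition would be a candidate for an overlapping ORF; establishing that expectation requires a null model, which this sketch does not attempt.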
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy
12
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022] Open
Abstract
During their long evolutionary history viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. Origin of overlapping genes, what factors favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
13
Gholizadeh Z, Iqbal MS, Li R, Romerio F. The HIV-1 Antisense Gene ASP: The New Kid on the Block. Vaccines (Basel) 2021; 9:vaccines9050513. [PMID: 34067514 PMCID: PMC8156140 DOI: 10.3390/vaccines9050513] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Revised: 05/04/2021] [Accepted: 05/13/2021] [Indexed: 01/14/2023] Open
Abstract
Viruses have developed incredibly creative ways of making a virtue out of necessity, including taking full advantage of their small genomes. Indeed, viruses often encode multiple proteins within the same genomic region by using two or more reading frames in both orientations through a process called overprinting. Complex retroviruses provide compelling examples of that. The human immunodeficiency virus type 1 (HIV-1) genome expresses sixteen proteins from nine genes that are encoded in the three positive-sense reading frames. In addition, the genome of some HIV-1 strains contains a tenth gene in one of the negative-sense reading frames. The so-called Antisense Protein (ASP) gene overlaps the HIV-1 Rev Response Element (RRE) and the envelope glycoprotein gene, and encodes a highly hydrophobic protein of ~190 amino acids. Despite being identified over thirty years ago, relatively few studies have investigated the role that ASP may play in the virus lifecycle, and its expression in vivo is still questioned. Here we review the current knowledge about ASP, and we discuss some of the many unanswered questions.
14
Carter CW. Simultaneous codon usage, the origin of the proteome, and the emergence of de-novo proteins. Curr Opin Struct Biol 2021; 68:142-148. [PMID: 33529785 DOI: 10.1016/j.sbi.2021.01.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/04/2020] [Accepted: 01/05/2021] [Indexed: 12/21/2022]
Abstract
Genetic coding generally uses only one of a gene's two strands; its complement serving as template for replication. Aminoacyl-tRNA synthetases, aaRS, apparently first emerged as pairs on bidirectional genes, in which anticodons in the template strand served as codons for an entirely different protein. Interpreting both strands in frame constrained such genes sufficiently that it was rapidly superseded, leaving only traces in the elevated pairing between codon middle bases in antiparallel alignments. Codon assignments actually promote using information from both strands in multiple reading frames. Related phenomena, known as overprinting, are widely associated with viruses. In-frame bidirectional coding and overprinting nevertheless imply different structural and functional relationships, and different roles in generating folded proteins throughout the evolution of the proteome.
Affiliation(s)
- Charles W Carter
  - Department of Biochemistry, Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, United States
15
Seitz S, Habjanič J, Schütz AK, Bartenschlager R. The Hepatitis B Virus Envelope Proteins: Molecular Gymnastics Throughout the Viral Life Cycle. Annu Rev Virol 2020; 7:263-288. [PMID: 32600157 DOI: 10.1146/annurev-virology-092818-015508] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
New hepatitis B virions released from infected hepatocytes are the result of an intricate maturation process that starts with the formation of the nucleocapsid providing a confined space where the viral DNA genome is synthesized via reverse transcription. Virion assembly is finalized by the enclosure of the icosahedral nucleocapsid within a heterogeneous envelope. The latter contains integral membrane proteins of three sizes, collectively known as hepatitis B surface antigen, and adopts multiple conformations in the course of the viral life cycle. The nucleocapsid conformation depends on the reverse transcription status of the genome, which in turn controls nucleocapsid interaction with the envelope proteins for virus exit. In addition, after secretion the virions undergo a distinct maturation step during which a topological switch of the large envelope protein confers infectivity. Here we review molecular determinants for envelopment and models that postulate molecular signals encoded in the capsid scaffold conducive or adverse to the recruitment of envelope proteins.
Affiliation(s)
- Stefan Seitz
- Department of Infectious Diseases, University of Heidelberg, 69120 Heidelberg, Germany
- Jelena Habjanič
- Bavarian NMR Center, Department of Chemistry, Technical University of Munich, 85748 Garching, Germany; Institute of Structural Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Anne K Schütz
- Bavarian NMR Center, Department of Chemistry, Technical University of Munich, 85748 Garching, Germany; Institute of Structural Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Ralf Bartenschlager
- Department of Infectious Diseases, University of Heidelberg, 69120 Heidelberg, Germany; Division of Virus-Associated Carcinogenesis, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
16
Pavesi A. New insights into the evolutionary features of viral overlapping genes by discriminant analysis. Virology 2020; 546:51-66. [PMID: 32452417 PMCID: PMC7157939 DOI: 10.1016/j.virol.2020.03.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/20/2020] [Accepted: 03/29/2020] [Indexed: 12/18/2022]
Abstract
Overlapping genes originate by a mechanism of overprinting, in which nucleotide substitutions in a pre-existing frame induce the expression of a de novo protein from an alternative frame. In this study, I assembled a dataset of 319 viral overlapping genes, which included 82 overlaps whose expression is experimentally known and the respective 237 homologs. Principal component analysis revealed that overlapping genes have a common pattern of nucleotide and amino acid composition. Discriminant analysis separated overlapping from non-overlapping genes with an accuracy of 97%. When applied to overlapping genes with known genealogy, it separated ancestral from de novo frames with an accuracy close to 100%. This high discriminant power was crucial to computationally design variants of de novo viral proteins known to possess selective anticancer toxicity (apoptin) or protection against neurodegeneration (X protein), as well as to detect two new potential overlapping genes in the genome of the new coronavirus SARS-CoV-2.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy.
17
Mikhailov KV, Efeykin BD, Panchin AY, Knorre DA, Logacheva MD, Penin AA, Muntyan MS, Nikitin MA, Popova OV, Zanegina ON, Vyssokikh MY, Spiridonov SE, Aleoshin VV, Panchin YV. Coding palindromes in mitochondrial genes of Nematomorpha. Nucleic Acids Res 2020; 47:6858-6870. [PMID: 31194871 PMCID: PMC6649704 DOI: 10.1093/nar/gkz517] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/28/2019] [Revised: 05/29/2019] [Accepted: 06/01/2019] [Indexed: 12/11/2022]
Abstract
Inverted repeats are common DNA elements, but they rarely overlap with protein-coding sequences due to the ensuing conflict with the structure and function of the encoded protein. We discovered numerous perfect inverted repeats of considerable length (up to 284 bp) embedded within the protein-coding genes in mitochondrial genomes of four Nematomorpha species. Strikingly, both arms of the inverted repeats encode conserved regions of the amino acid sequence. We confirmed enzymatic activity of the respiratory complex I encoded by inverted repeat-containing genes. The nucleotide composition of inverted repeats suggests strong selection at the amino acid level in these regions. We conclude that the inverted repeat-containing genes are transcribed and translated into functional proteins. The survey of available mitochondrial genomes reveals that several other organisms possess similar albeit shorter embedded repeats. Mitochondrial genomes of Nematomorpha demonstrate an extraordinary evolutionary compromise where protein function and stringent secondary structure elements within the coding regions are preserved simultaneously.
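As a rough illustration of what a perfect inverted repeat means at the sequence level, the Python sketch below finds the longest stretch of DNA that equals its own reverse complement. The function name `longest_inverted_repeat`, the restriction to repeats with no spacer between the two arms, and the toy sequence are assumptions made for illustration; this is not the detection procedure used in the study.

```python
# Watson-Crick complements for an unambiguous DNA alphabet.
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def longest_inverted_repeat(seq):
    """Return (start, end) of the longest perfect reverse-complement
    palindrome in seq, assuming no spacer between the two arms.
    The slice seq[start:end] is the repeat; (0, 0) means none was found."""
    best = (0, 0)
    for center in range(1, len(seq)):
        left, right = center - 1, center
        # Grow outward while the flanking bases are complementary.
        while left >= 0 and right < len(seq) and COMP.get(seq[left]) == seq[right]:
            left -= 1
            right += 1
        if right - (left + 1) > best[1] - best[0]:
            best = (left + 1, right)
    return best

if __name__ == "__main__":
    toy = "CCGAATTCAA"  # hypothetical sequence containing GAATTC
    start, end = longest_inverted_repeat(toy)
    print(toy[start:end])
```

Repeats whose arms are separated by a loop would need an extra spacer parameter, which this sketch deliberately leaves out.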
Affiliation(s)
- Kirill V Mikhailov
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation
- Boris D Efeykin
- Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation; Severtsov Institute of Ecology and Evolution, Moscow 119071, Russian Federation
- Alexander Y Panchin
- Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation
- Dmitry A Knorre
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Institute of Molecular Medicine, Sechenov First Moscow State Medical University, Moscow 119991, Russian Federation
- Maria D Logacheva
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation; Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143028, Russian Federation
- Aleksey A Penin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation
- Maria S Muntyan
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation
- Mikhail A Nikitin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation
- Olga V Popova
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation
- Olga N Zanegina
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation
- Mikhail Y Vyssokikh
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation
- Sergei E Spiridonov
- Severtsov Institute of Ecology and Evolution, Moscow 119071, Russian Federation
- Vladimir V Aleoshin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation
- Yuri V Panchin
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Leninskiye Gory 1-40, Moscow 119991, Russian Federation; Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127994, Russian Federation
18
Affram Y, Zapata JC, Gholizadeh Z, Tolbert WD, Zhou W, Iglesias-Ussel MD, Pazgier M, Ray K, Latinovic OS, Romerio F. The HIV-1 Antisense Protein ASP Is a Transmembrane Protein of the Cell Surface and an Integral Protein of the Viral Envelope. J Virol 2019; 93:e00574-19. [PMID: 31434734 PMCID: PMC6803264 DOI: 10.1128/jvi.00574-19] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/05/2019] [Accepted: 08/14/2019] [Indexed: 12/13/2022]
Abstract
The negative strand of HIV-1 encodes a highly hydrophobic antisense protein (ASP) with no known homologs. The presence of humoral and cellular immune responses to ASP in HIV-1 patients indicates that ASP is expressed in vivo, but its role in HIV-1 replication remains unknown. We investigated ASP expression in multiple chronically infected myeloid and lymphoid cell lines using an anti-ASP monoclonal antibody (324.6) in combination with flow cytometry and microscopy approaches. At baseline and in the absence of stimuli, ASP shows polarized subnuclear distribution, preferentially in areas with low content of suppressive epigenetic marks. However, following treatment with phorbol 12-myristate 13-acetate (PMA), ASP translocates to the cytoplasm and is detectable on the cell surface, even in the absence of membrane permeabilization, indicating that 324.6 recognizes an ASP epitope that is exposed extracellularly. Further, surface staining with 324.6 and anti-gp120 antibodies showed that ASP and gp120 colocalize, suggesting that ASP might become incorporated in the membranes of budding virions. Indeed, fluorescence correlation spectroscopy studies showed binding of 324.6 to cell-free HIV-1 particles. Moreover, 324.6 was able to capture and retain HIV-1 virions with efficiency similar to that of the anti-gp120 antibody VRC01. Our studies indicate that ASP is an integral protein of the plasma membranes of chronically infected cells stimulated with PMA, and upon viral budding, ASP becomes a structural protein of the HIV-1 envelope. These results may provide leads to investigate the possible role of ASP in the virus replication cycle and suggest that ASP may represent a new therapeutic or vaccine target.
IMPORTANCE: The HIV-1 genome contains a gene expressed in the opposite, or antisense, direction to all other genes. The protein product of this antisense gene, called ASP, is poorly characterized, and its role in viral replication remains unknown. We provide evidence that the antisense protein, ASP, of HIV-1 is found within the cell nucleus in unstimulated cells. In addition, we show that after PMA treatment, ASP exits the nucleus and localizes on the cell membrane. Moreover, we demonstrate that ASP is present on the surfaces of viral particles. Altogether, our studies identify ASP as a new structural component of HIV-1 and show that ASP is an accessory protein that promotes viral replication. The presence of ASP on the surfaces of both infected cells and viral particles might be exploited therapeutically.
Affiliation(s)
- Yvonne Affram
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Juan C Zapata
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Zahra Gholizadeh
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- William D Tolbert
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Wei Zhou
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Maria D Iglesias-Ussel
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Marzena Pazgier
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Krishanu Ray
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Olga S Latinovic
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Fabio Romerio
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
19
Minarovits J, Niller HH. Truncated oncoproteins of retroviruses and hepatitis B virus: A lesson in contrasts. Infect Genet Evol 2019; 73:342-357. [DOI: 10.1016/j.meegid.2019.05.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/18/2019] [Revised: 05/14/2019] [Accepted: 05/27/2019] [Indexed: 02/07/2023]
20
Schlub TE, Buchmann JP, Holmes EC. A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 2019; 35:2572-2581. [PMID: 30099499 PMCID: PMC6188560 DOI: 10.1093/molbev/msy155] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of RNA virus replication.
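The general idea of a randomization test on open reading frame length can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `longest_orf_codons` and `orf_length_pvalue` are hypothetical, an ORF is simplified to a stop-free run of codons, and the null model is a plain nucleotide shuffle of the whole sequence, which is not necessarily the randomization scheme the authors used.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}

def longest_orf_codons(seq, frame):
    """Length in codons of the longest stop-free run in a forward frame (0, 1 or 2).
    Simplification: start codons are ignored."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best

def orf_length_pvalue(seq, frame, n_shuffles=1000, seed=0):
    """Empirical p-value for the observed longest stop-free run.
    Null model (an assumption of this sketch): shuffle all nucleotides,
    which keeps base composition but destroys codon structure."""
    rng = random.Random(seed)
    observed = longest_orf_codons(seq, frame)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if longest_orf_codons("".join(bases), frame) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)

if __name__ == "__main__":
    toy = "ATGGCTGCAGCTGCCGCAGCTGCTTAAGG"  # hypothetical sequence
    print(orf_length_pvalue(toy, frame=0, n_shuffles=200))
```

A small p-value for an alternative frame would flag it as a candidate only; a realistic test would also need to preserve the amino acid sequence of the known gene while shuffling, which this sketch does not do.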
Affiliation(s)
- Timothy E Schlub
- Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
21
Pavesi A. Asymmetric evolution in viral overlapping genes is a source of selective protein adaptation. Virology 2019; 532:39-47. [PMID: 31004987 PMCID: PMC7125799 DOI: 10.1016/j.virol.2019.03.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/03/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/29/2022]
Abstract
Overlapping genes represent an intriguing puzzle, as they encode two proteins whose ability to evolve is constrained by each other. Overlapping genes can undergo “symmetric evolution” (similar selection pressures on the two proteins) or “asymmetric evolution” (significantly different selection pressures on the two proteins). By sequence analysis of 75 pairs of homologous viral overlapping genes, I evaluated their accordance with one or the other model. Analysis of nucleotide and amino acid sequences revealed that half of the overlaps undergo asymmetric evolution, as the protein from one frame shows a significantly higher number of substitutions than the protein from the other frame. Interestingly, the most variable protein (often known to interact with the host proteins) appeared to be encoded by the de novo frame in all cases examined. These findings suggest that overlapping genes, besides increasing the coding ability of viruses, are also a source of selective protein adaptation. A dataset of 80 pairs of homologous overlapping genes from viruses is examined. Its analysis reveals that half of overlapping genes undergo asymmetric evolution. The most variable gene product is that encoded by the de novo overlapping gene. Overlapping genes evolving asymmetrically are a source of selective protein adaptation.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11/A, I-43124, Parma, Italy.
22
Casola C. From De Novo to "De Nono": The Majority of Novel Protein-Coding Genes Identified with Phylostratigraphy Are Old Genes or Recent Duplicates. Genome Biol Evol 2018; 10:2906-2918. [PMID: 30346517 PMCID: PMC6239577 DOI: 10.1093/gbe/evy231] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 10/10/2018] [Indexed: 12/11/2022]
Abstract
The evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling pieces of evidence for genetic innovations in nature. One popular approach to identify de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys, based on phylostratigraphy, remains unknown. Here, I reanalyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes adopting a phylostratigraphy-centered approach. Most putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson BA, Foy SG, Neme R, Masel J. 2017. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 1:0146), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
Affiliation(s)
- Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University
23
Pavesi A, Vianelli A, Chirico N, Bao Y, Blinkova O, Belshaw R, Firth A, Karlin D. Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes. PLoS One 2018; 13:e0202513. [PMID: 30339683 PMCID: PMC6195259 DOI: 10.1371/journal.pone.0202513] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/22/2018] [Accepted: 08/03/2018] [Indexed: 11/19/2022]
Abstract
Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be abundant also in eukaryotes. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may either favour the birth of overlapping genes, or/and result from selection pressure acting on them.
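To make the notion of amino acid degeneracy concrete, the Python sketch below counts how many codons of the standard genetic code specify each amino acid and averages that count over a protein sequence. The summary statistic `mean_degeneracy` and the example peptides are assumptions chosen for illustration; the study's own composition analysis may define and group degeneracy classes differently.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# Degeneracy = number of codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

def mean_degeneracy(protein):
    """Average codon degeneracy over the residues of a protein sequence.
    Hypothetical summary statistic, used here only for illustration."""
    return sum(DEGENERACY[aa] for aa in protein) / len(protein)

if __name__ == "__main__":
    # Leu, Ser and Arg have six codons each; Met and Trp have one.
    print(mean_degeneracy("LSRLSR"), mean_degeneracy("MWMW"))
```

Under this measure, a protein rich in leucine, serine, and arginine scores higher than one rich in methionine and tryptophan, which is the direction of the bias the abstract describes for overlapping genes.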
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Alberto Vianelli
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Nicola Chirico
- Department of Theoretical and Applied Sciences, University of Insubria, Varese, Italy
- Yiming Bao
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Olga Blinkova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
- Robert Belshaw
- School of Biomedical & Healthcare Sciences, Plymouth University Peninsula Schools of Medicine and Dentistry (PUPSMD), Plymouth, United Kingdom
- Andrew Firth
- Department of Pathology, Division of Virology, University of Cambridge, Cambridge, United Kingdom
- David Karlin
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Division of Structural Biology, University of Oxford, Oxford, United Kingdom
24
Willis S, Masel J. Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes. Genetics 2018; 210:303-313. [PMID: 30026186 PMCID: PMC6116962 DOI: 10.1534/genetics.118.301249] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/11/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022]
Abstract
The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
Affiliation(s)
- Sara Willis
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
25
Lu TC, Leu JY, Lin WC. A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts. Mol Biol Evol 2018; 34:2823-2838. [PMID: 28981695 PMCID: PMC5850716 DOI: 10.1093/molbev/msx210] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/16/2022]
Abstract
Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. We further combined information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen identified de novo genes with high transcript levels were chosen to verify their protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and have the potential to be quickly integrated into ancient cellular networks.
Affiliation(s)
- Tzu-Chiao Lu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jun-Yi Leu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Wen-Chang Lin
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
26
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Wecko R, Simon S, Scherer S, Neuhaus K. A novel short L-arginine responsive protein-coding gene (laoB) antiparallel overlapping to a CadC-like transcriptional regulator in Escherichia coli O157:H7 Sakai originated by overprinting. BMC Evol Biol 2018; 18:21. [PMID: 29433444 PMCID: PMC5810103 DOI: 10.1186/s12862-018-1134-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/22/2017] [Accepted: 01/31/2018] [Indexed: 11/10/2022]
Abstract
Background: Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originating from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail.
Results: A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115.
Conclusions: Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.
Affiliation(s)
- Sarah M Hücker
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Fraunhofer ITEM-R, Am Biopark 9, 93053, Regensburg, Germany
- Sonja Vanderhaeghen
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Isabel Abellan-Schneyder
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Romy Wecko
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Svenja Simon
- Department of Computer and Information Science, University of Konstanz, Box 78, 78457, Konstanz, Germany
- Siegfried Scherer
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Klaus Neuhaus
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
| |
Collapse
|
27
|
Opuu V, Silvert M, Simonson T. Computational design of fully overlapping coding schemes for protein pairs and triplets. Sci Rep 2017; 7:15873. [PMID: 29158504 PMCID: PMC5696523 DOI: 10.1038/s41598-017-16221-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/25/2017] [Accepted: 11/09/2017] [Indexed: 11/26/2022]
Abstract
Gene pairs that overlap in their coding regions are rare except in viruses. They may occur transiently in gene creation and are of biotechnological interest. We have examined the possibility to encode an arbitrary pair of protein domains as a dual gene, with the shorter coding sequence completely embedded in the longer one. For 500 × 500 domain pairs (X, Y), we computationally designed homologous pairs (X', Y') coded this way, using an algorithm that provably maximizes the sequence similarity between (X', Y') and (X, Y). Three schemes were considered, with X' and Y' coded on the same or complementary strands. For 16% of the pairs, an overlapping coding exists where the level of homology of X', Y' to the natural proteins represents an E-value of 10⁻¹⁰ or better. Thus, for an arbitrary domain pair, it is surprisingly easy to design homologous sequences that can be encoded as a fully-overlapping gene pair. The algorithm is general and was used to design 200 triple genes, with three proteins encoded by the same DNA segment. The ease of design suggests overlapping genes may have occurred frequently in evolution and could be readily used to compress or constrain artificial genomes.
Affiliation(s)
- Vaitea Opuu
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Martin Silvert
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
|
28
|
Jayasena AS, Fisher MF, Panero JL, Secco D, Bernath-Levin K, Berkowitz O, Taylor NL, Schilling EE, Whelan J, Mylne JS. Stepwise Evolution of a Buried Inhibitor Peptide over 45 My. Mol Biol Evol 2017; 34:1505-1516. [PMID: 28333296 DOI: 10.1093/molbev/msx104] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 12/27/2022]
Abstract
The de novo evolution of genes and the novel proteins they encode has stimulated much interest in the contribution such innovations make to the diversity of life. Most research on this de novo evolution focuses on transcripts, so studies on the biochemical steps that can enable completely new proteins to evolve and the time required to do so have been lacking. Sunflower Preproalbumin with SFTI-1 (PawS1) is an unusual albumin precursor because in addition to producing albumin it also yields a potent, bicyclic protease-inhibitor called SunFlower Trypsin Inhibitor-1 (SFTI-1). Here, we show how this inhibitor peptide evolved stepwise over tens of millions of years. To trace the origin of the inhibitor peptide SFTI-1, we assembled seed transcriptomes for 110 sunflower relatives whose evolution could be resolved by a chronogram, which allowed dates to be estimated for the various stages of molecular evolution. A genetic insertion event in an albumin precursor gene ∼45 Ma introduced two additional cleavage sites for protein maturation and conferred duality upon PawS1-Like genes such that they also encode a small buried macrocycle. Expansion of this region, including two Cys residues, enlarged the peptide ∼34 Ma and made the buried peptides bicyclic. Functional specialization into a protease inhibitor occurred ∼23 Ma. These findings document the evolution of a novel peptide inside a benign region of a pre-existing protein. We illustrate how a novel peptide can evolve without de novo gene evolution and, critically, without affecting the function of what becomes the protein host.
Affiliation(s)
- Achala S Jayasena
  - School of Molecular Sciences & ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Mark F Fisher
  - School of Molecular Sciences & ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Jose L Panero
  - Department of Integrative Biology, University of Texas, Austin, TX
- David Secco
  - School of Molecular Sciences & ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Kalia Bernath-Levin
  - School of Molecular Sciences & ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Oliver Berkowitz
  - Department of Animal, Plant and Soil Sciences, School of Life Sciences & ARC Centre of Excellence in Plant Energy Biology, AgriBio, The Centre for AgriBioscience, La Trobe University, Bundoora, Victoria, Australia
- Nicolas L Taylor
  - School of Molecular Sciences & ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Edward E Schilling
  - Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- James Whelan
  - Department of Animal, Plant and Soil Sciences, School of Life Sciences & ARC Centre of Excellence in Plant Energy Biology, AgriBio, The Centre for AgriBioscience, La Trobe University, Bundoora, Victoria, Australia
- Joshua S Mylne
  - School of Molecular Sciences & ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
|
29
|
Foldability of a Natural De Novo Evolved Protein. Structure 2017; 25:1687-1696.e4. [PMID: 29033289 DOI: 10.1016/j.str.2017.09.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 03/25/2017] [Revised: 07/22/2017] [Accepted: 09/15/2017] [Indexed: 12/13/2022]
Abstract
The de novo evolution of protein-coding genes from noncoding DNA is emerging as a source of molecular innovation in biology. Studies of random sequence libraries, however, suggest that young de novo proteins will not fold into compact, specific structures typical of native globular proteins. Here we show that Bsc4, a functional, natural de novo protein encoded by a gene that evolved recently from noncoding DNA in the yeast S. cerevisiae, folds to a partially specific three-dimensional structure. Bsc4 forms soluble, compact oligomers with high β sheet content and a hydrophobic core, and undergoes cooperative, reversible denaturation. Bsc4 lacks a specific quaternary state, however, existing instead as a continuous distribution of oligomer sizes, and binds dyes indicative of amyloid oligomers or molten globules. The combination of native-like and non-native-like properties suggests a rudimentary fold that could potentially act as a functional intermediate in the emergence of new folded proteins de novo.
|
30
|
Lauber C, Seitz S, Mattei S, Suh A, Beck J, Herstein J, Börold J, Salzburger W, Kaderali L, Briggs JAG, Bartenschlager R. Deciphering the Origin and Evolution of Hepatitis B Viruses by Means of a Family of Non-enveloped Fish Viruses. Cell Host Microbe 2017; 22:387-399.e6. [PMID: 28867387 PMCID: PMC5604429 DOI: 10.1016/j.chom.2017.07.019] [Citation(s) in RCA: 109] [Impact Index Per Article: 15.6] [Received: 03/22/2017] [Revised: 07/10/2017] [Accepted: 07/29/2017] [Indexed: 02/07/2023]
Abstract
Hepatitis B viruses (HBVs), which are enveloped viruses with reverse-transcribed DNA genomes, constitute the family Hepadnaviridae. An outstanding feature of HBVs is their streamlined genome organization with extensive gene overlap. Remarkably, the ∼1,100 bp open reading frame (ORF) encoding the envelope proteins is fully nested within the ORF of the viral replicase P. Here, we report the discovery of a diversified family of fish viruses, designated nackednaviruses, which lack the envelope protein gene, but otherwise exhibit key characteristics of HBVs including genome replication via protein-primed reverse-transcription and utilization of structurally related capsids. Phylogenetic reconstruction indicates that these two virus families separated more than 400 million years ago before the rise of tetrapods. We show that HBVs are of ancient origin, descending from non-enveloped progenitors in fishes. Their envelope protein gene emerged de novo, leading to a major transition in viral lifestyle, followed by co-evolution with their hosts over geologic eras.
Highlights
- Nackednaviruses are non-enveloped fish viruses related to hepadnaviruses
- Both virus families separated from a common ancestor >400 million years ago
- The envelope protein gene of hepadnaviruses emerged through two distinct processes
- Hepadnaviruses mainly co-evolve with hosts while nackednaviruses jump between hosts
Affiliation(s)
- Chris Lauber
  - Institute for Medical Informatics and Biometry, Technische Universität Dresden, 01307 Dresden, Germany
- Stefan Seitz
  - University of Heidelberg, Department of Infectious Diseases, Molecular Virology, 69120 Heidelberg, Germany
- Simone Mattei
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Alexander Suh
  - Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, 75236 Uppsala, Sweden
- Jürgen Beck
  - Department of Internal Medicine 2/Molecular Biology, University Hospital Freiburg, 79106 Freiburg, Germany
- Jennifer Herstein
  - Department of Psychiatry and the Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Jacob Börold
  - University of Heidelberg, Department of Infectious Diseases, Molecular Virology, 69120 Heidelberg, Germany
- Lars Kaderali
  - Institute for Medical Informatics and Biometry, Technische Universität Dresden, 01307 Dresden, Germany; Institute for Bioinformatics, University Medicine Greifswald, 17487 Greifswald, Germany
- John A G Briggs
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Ralf Bartenschlager
  - University of Heidelberg, Department of Infectious Diseases, Molecular Virology, 69120 Heidelberg, Germany; Division of Virus-Associated Carcinogenesis, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
|
31
|
Dorokhov YL, Sheshukova EV, Komarova TV. Tobamovirus 3'-Terminal Gene Overlap May be a Mechanism for within-Host Fitness Improvement. Front Microbiol 2017; 8:851. [PMID: 28553276 PMCID: PMC5425575 DOI: 10.3389/fmicb.2017.00851] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/27/2017] [Accepted: 04/25/2017] [Indexed: 12/13/2022]
Abstract
Overlapping genes (OGs) are a universal phenomenon in all kingdoms, and viruses display a high content of OGs combined with a high rate of evolution. It is believed that the mechanism of gene overlap is based on overprinting of an existing gene. OGs help virus genes compress a maximum amount of information into short sequences, conferring viral proteins with novel features and thereby increasing their within-host fitness. Analysis of tobamovirus 3′-terminal genes reveals at least two modes of OG organization and mechanisms of interaction with the host. Originally isolated from Solanaceae species, viruses (referred to as Solanaceae-infecting) such as tobacco mosaic virus do not show 3′-terminal overlap between movement protein (MP) and coat protein (CP) genes but do contain open reading frame 6 (ORF6), which overlaps with both genes. Conversely, tobamoviruses, originally isolated from Brassicaceae species (referred to as Brassicaceae-infecting) and also able to infect Solanaceae plants, have no ORF6 but are characterized by overlapping MP and CP genes. Our analysis showed that the MP/CP overlap of Brassicaceae-infecting tobamoviruses results in the following: (i) genome compression and strengthening of subgenomic promoters; (ii) CP gene early expression directly from genomic and dicistronic MP subgenomic mRNA using an internal ribosome entry site (IRES) and a stable hairpin structure in the overlapping region; (iii) loss of ORF6, which influences the symptomatology of Solanaceae-infecting tobamoviruses; and (iv) acquisition of an IRES polypurine-rich region encoding an MP nuclear localization signal. We believe that MP/CP gene overlap may constitute a mechanism for host range expansion and virus adjustment to Brassicaceae plants.
Affiliation(s)
- Yuri L Dorokhov
  - N.I. Vavilov Institute of General Genetics, Russian Academy of Science, Moscow, Russia; A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
- Tatiana V Komarova
  - N.I. Vavilov Institute of General Genetics, Russian Academy of Science, Moscow, Russia; A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
|
32
|
Fernandes JD, Faust TB, Strauli NB, Smith C, Crosby DC, Nakamura RL, Hernandez RD, Frankel AD. Functional Segregation of Overlapping Genes in HIV. Cell 2016; 167:1762-1773.e12. [PMID: 27984726 DOI: 10.1016/j.cell.2016.11.031] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 05/22/2016] [Revised: 09/29/2016] [Accepted: 11/15/2016] [Indexed: 11/28/2022]
Abstract
Overlapping genes pose an evolutionary dilemma as one DNA sequence evolves under the selection pressures of multiple proteins. Here, we perform systematic statistical and mutational analyses of the overlapping HIV-1 genes tat and rev and engineer exhaustive libraries of non-overlapped viruses to perform deep mutational scanning of each gene independently. We find a "segregated" organization in which overlapped sites encode functional residues of one gene or the other, but never both. Furthermore, this organization eliminates unfit genotypes, providing a fitness advantage to the population. Our comprehensive analysis reveals the extraordinary manner in which HIV minimizes the constraint of overlapping genes and repurposes that constraint to its own advantage. Thus, overlaps are not just consequences of evolutionary constraints, but rather can provide population fitness advantages.
Affiliation(s)
- Jason D Fernandes
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA; Program in Pharmaceutical Sciences and Pharmacogenomics, University of California San Francisco, San Francisco, CA 94158, USA
- Tyler B Faust
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA; Tetrad Program, Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Nicolas B Strauli
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA; Biomedical Sciences Graduate Program, University of California San Francisco, San Francisco, CA 94158, USA
- Cynthia Smith
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- David C Crosby
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Robert L Nakamura
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Ryan D Hernandez
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Alan D Frankel
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
|
33
|
Discovery of an endogenous Deltaretrovirus in the genome of long-fingered bats (Chiroptera: Miniopteridae). Proc Natl Acad Sci U S A 2017; 114:3145-3150. [PMID: 28280099 DOI: 10.1073/pnas.1621224114] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
Retroviruses can create endogenous forms on infiltration into the germline cells of their hosts. These forms are then vertically transmitted and can be considered as genetic fossils of ancient viruses. All retrovirus genera, with the exception of deltaretroviruses, have had their representation identified in the host genome as a virus fossil record. Here we describe an endogenous Deltaretrovirus, identified in the germline of long-fingered bats (Miniopteridae). A single, heavily deleted copy of this retrovirus has been found in the genome of miniopterid species, but not in the genomes of the phylogenetically closest bat families, Vespertilionidae and Cistugonidae. Therefore, the endogenization occurred in a time interval between 20 and 45 million years ago. This discovery closes the last major gap in the retroviral fossil record and provides important insights into the history of deltaretroviruses in mammals.
|
34
|
Taylor LJ, Strebel K. Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes. BMC Microbiol 2017; 17:12. [PMID: 28061810 PMCID: PMC5219722 DOI: 10.1186/s12866-016-0920-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/27/2016] [Accepted: 12/20/2016] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Gene knockouts are a common tool used to study gene function in various organisms. However, designing gene knockouts is complicated in viruses, which frequently contain sequences that code for multiple overlapping genes. Designing mutants that can be traced by the creation of new or elimination of existing restriction sites further compounds the difficulty in experimental design of knockouts of overlapping genes. While software is available to rapidly identify restriction sites in a given nucleotide sequence, no existing software addresses experimental design of mutations involving multiple overlapping amino acid sequences in generating gene knockouts.
RESULTS: Pyviko performed well on a test set of over 240,000 gene pairs collected from viral genomes deposited in the National Center for Biotechnology Information Nucleotide database, identifying a point mutation which added a premature stop codon within the first 20 codons of the target gene in 93.2% of all tested gene-overprinted gene pairs. This shows that Pyviko can be used successfully in a wide variety of contexts to facilitate the molecular cloning and study of viral overprinted genes.
CONCLUSIONS: Pyviko is an extensible and intuitive Python tool for designing knockouts of overlapping genes. Freely available as both a Python package and a web-based interface (http://louiejtaylor.github.io/pyViKO/), Pyviko simplifies the experimental design of gene knockouts in complex viruses with overlapping genes.
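The kind of search described in this abstract can be illustrated with a minimal Python sketch. This is not Pyviko's code or API: the names `translate` and `find_knockouts`, the restriction to a same-strand overlap shifted by one base, and the toy sequence are all assumptions made for illustration. The sketch looks for single-base changes that create a premature stop codon in the target gene while leaving the protein encoded in the overlapping frame unchanged.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate seq starting at offset, ignoring trailing partial codons."""
    return "".join(CODON[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def find_knockouts(seq, target_offset=0, overlap_offset=1, max_codons=20):
    """Hypothetical helper: list (position, ref_base, alt_base) point mutations
    that add a stop codon to the target frame within its first max_codons
    codons and are silent in the overlapping frame."""
    hits = []
    original_overlap = translate(seq, overlap_offset)
    n_codons = (len(seq) - target_offset) // 3
    for k in range(min(max_codons, n_codons)):
        start = target_offset + 3 * k
        if CODON[seq[start:start + 3]] == "*":
            break  # natural end of the target gene
        for pos in range(start, start + 3):
            for alt in "ACGT":
                if alt == seq[pos]:
                    continue
                mutant = seq[:pos] + alt + seq[pos + 1:]
                makes_stop = CODON[mutant[start:start + 3]] == "*"
                silent = translate(mutant, overlap_offset) == original_overlap
                if makes_stop and silent:
                    hits.append((pos, seq[pos], alt))
    return hits


# Toy example (assumed sequence): target frame reads M Q G S,
# overlapping frame (shifted by one base) reads C K D.
toy = "ATGCAAGGATCC"
knockouts = find_knockouts(toy)
```

In the toy sequence, changing the C at index 3 to T turns the target codon CAA into the stop codon TAA, while the overlapping codon TGC becomes TGT, which still encodes cysteine. Antisense overlaps and restriction-site tracking, both mentioned in the abstract, are left out of this sketch.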
Affiliation(s)
- Louis J. Taylor
  - Viral Biochemistry Section, Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD USA
  - Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania Philadelphia, Pennsylvania, USA
- Klaus Strebel
  - Viral Biochemistry Section, Laboratory of Molecular Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD USA
|
35
|
The combinatorics of overlapping genes. J Theor Biol 2016; 415:90-101. [PMID: 27737786 DOI: 10.1016/j.jtbi.2016.09.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/15/2016] [Revised: 08/31/2016] [Accepted: 09/22/2016] [Indexed: 11/23/2022]
Abstract
Overlapping genes exist in all domains of life and are much more abundant than expected upon their first discovery in the late 1970s. Assuming that the reference gene is read in frame +0, an overlapping gene can be encoded in two reading frames in the sense strand, denoted by +1 and +2, and in three reading frames in the opposite strand, denoted by -0, -1, and -2. This motivated numerous researchers to study the constraints induced by the genetic code on the various overlapping frames, mostly based on information theory. Our focus in this paper is on the constraints induced on two overlapping genes in terms of amino acids, as well as polypeptides. We show that simple linear constraints bind the amino-acid composition of two proteins encoded by overlapping genes. Novel constraints are revealed when polypeptides are considered, and not just single amino acids. For example, in double-coding sequences with an overlapping reading frame -2, each Tyrosine (denoted as Tyr or Y) in the overlapping frame overlaps a Tyrosine in the reference frame +0 (and reciprocally), whereas specific words (e.g. YY) never occur. We thus distinguish between null constraints (YY = 0 in frame -2) and non-null constraints (Y in frame +0 ⇔ Y in frame -2). Our equivalence-based constraints are symmetrical and thus enable the characterization of the joint composition of overlapping proteins. We describe several formal frameworks and a graph algorithm to characterize and compute these constraints. As expected, the degrees of freedom left by these constraints vary drastically among the different overlapping frames. Interestingly, the biological meaning of constraints induced on two overlapping proteins (hydropathy, forbidden di-peptides, expected overlap length …) is also specific to the reading frame. 
We study the combinatorics of these constraints for overlapping polypeptides of length n, pointing out that, (i) except for frame -2, non-null constraints are deduced from the amino-acid (length = 1) constraints and (ii) null constraints are deduced from the di-peptide (length = 2) constraints. These results yield support for understanding the mechanisms and evolution of overlapping genes, and for developing novel overlapping gene detection methods.
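The tyrosine example in this abstract can be checked by brute force. The Python sketch below is an illustration, not the paper's formal framework or graph algorithm. It assumes that frame -2 is the antisense frame whose codon spans the last base of the previous reference codon plus the first two bases of the current one; the function names are hypothetical. Under that assumption it verifies that, once stop codons are excluded from both frames, a reference codon is Tyr exactly when the overlapping antisense codon is Tyr, and that the dipeptide YY in the reference frame always places a stop codon on the antisense strand.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TYR = {"TAT", "TAC"}
STOP = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def antisense_codon(prev_base, ref_codon):
    """Antisense codon covering prev_base plus the first two bases of
    ref_codon (the frame assumed here to correspond to -2)."""
    return revcomp(prev_base + ref_codon[:2])


def tyr_equivalence_holds():
    """Check Y in frame +0 <=> Y in the overlapping antisense codon."""
    for prev_base, bases in product("ACGT", product("ACGT", repeat=3)):
        ref = "".join(bases)
        anti = antisense_codon(prev_base, ref)
        if ref in STOP or anti in STOP:
            continue  # stop codons cannot occur inside either coding frame
        if (ref in TYR) != (anti in TYR):
            return False
    return True


def yy_forces_antisense_stop():
    """Check that reference dipeptide YY always yields an antisense stop."""
    for first, second in product(sorted(TYR), repeat=2):
        window = first[2] + second[:2]  # bases under the antisense codon
        if revcomp(window) not in STOP:
            return False
    return True
```

Both checks pass under the assumed frame definition, which matches the non-null constraint (Y in frame +0 ⇔ Y in frame -2) and the null constraint (YY = 0) stated above.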
|
36
|
Chi PB, Liberles DA. Selection on protein structure, interaction, and sequence. Protein Sci 2016; 25:1168-78. [PMID: 26808055 PMCID: PMC4918422 DOI: 10.1002/pro.2886] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/19/2016] [Indexed: 11/10/2022]
Abstract
Characterizing the probabilities of observing amino acid substitutions at specific sites in a protein over evolutionary time is a major goal in the field of molecular evolution. While purely statistical approaches at different levels of complexity exist, approaches rooted in underlying biological processes are necessary to characterize both the context-dependence of sequence changes (epistasis) and to extrapolate to sequences not observed in biological databases. To develop such approaches, an understanding of the different selective forces that act on amino acid substitution is necessary. Here, an overview of selection on and corresponding modeling of folding stability, folding specificity, binding affinity and specificity for ligands, the evolution of new binding sites on protein surfaces, protein dynamics, intrinsic disorder, and protein aggregation as well as the interplay with protein expression level (concentration) and biased mutational processes are presented.
Affiliation(s)
- Peter B Chi
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
  - Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
|
37
|
Brandes N, Linial M. Gene overlapping and size constraints in the viral world. Biol Direct 2016; 11:26. [PMID: 27209091 PMCID: PMC4875738 DOI: 10.1186/s13062-016-0128-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 02/04/2016] [Accepted: 05/06/2016] [Indexed: 12/19/2022]
Abstract
Background: Viruses are the simplest replicating units, characterized by a limited number of coding genes and an exceptionally high rate of overlapping genes. We sought a unified evolutionary explanation that accounts for their genome sizes, gene overlapping and capsid properties.
Results: We performed an unbiased statistical analysis of ~100 families within ~400 genera that comprise the currently known viral world. We found that the volume utilization of capsids is often low, and greatly varies among viral families. Furthermore, although viruses span three orders of magnitude in genome length, they almost never have over 1500 overlapping nucleotides, or over four significantly overlapping genes per virus.
Conclusions: Our findings undermine the generality of the compression theory, which emphasizes optimal packing and length dependency to explain overlapping genes and capsid size in viral genomes. Instead, we propose that gene novelty and evolution exploration offer better explanations to size constraints and gene overlapping in all viruses.
Reviewers: This article was reviewed by Arne Elofsson and David Kreil.
Electronic supplementary material: The online version of this article (doi:10.1186/s13062-016-0128-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nadav Brandes
  - Einstein Institute of Mathematics, The Edmond J. Safra Campus, The Hebrew University of Jerusalem, Jerusalem, Israel
- Michal Linial
  - Department of Biological Chemistry, Room A-530, Institute of Life Sciences, The Edmond J. Safra Campus, The Hebrew University of Jerusalem, 91904, Jerusalem, Israel
|
38
|
Abstract
Comparative genomics have brought much insight into the de novo emergence of genes. Two new studies in Drosophila explore the dynamics of gene gain and loss at the population and species levels, extending our view on the life cycle of genes.
Affiliation(s)
- Rafik Neme
  - Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
|
39
|
Fellner L, Simon S, Scherling C, Witting M, Schober S, Polte C, Schmitt-Kopplin P, Keim DA, Scherer S, Neuhaus K. Evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting. BMC Evol Biol 2015; 15:283. [PMID: 26677845 PMCID: PMC4683798 DOI: 10.1186/s12862-015-0558-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 08/19/2015] [Accepted: 12/06/2015] [Indexed: 01/18/2023]
Abstract
BACKGROUND: Gene duplication is believed to be the classical way to form novel genes, but overprinting may be an important alternative. Overprinting allows entirely novel proteins to evolve de novo, i.e., formerly non-coding open reading frames within functional genes become expressed. Only three cases have been described for Escherichia coli. Here, a fourth example is presented.
RESULTS: RNA sequencing revealed an open reading frame weakly transcribed in cow dung, coding for 101 residues and embedded completely in the -2 reading frame of citC in enterohemorrhagic E. coli. This gene is designated novel overlapping gene, nog1. The promoter region fused to gfp exhibits specific activities and 5' rapid amplification of cDNA ends indicated the transcriptional start 40-bp upstream of the start codon. nog1 was strand-specifically arrested in translation by a nonsense mutation silent in citC. This Nog1-mutant showed a phenotype in competitive growth against wild type in the presence of MgCl2. Small differences in metabolite concentrations were also found. Bioinformatic analyses propose Nog1 to be inner membrane-bound and to possess at least one membrane-spanning domain. A phylogenetic analysis suggests that the orphan gene nog1 arose by overprinting after Escherichia/Shigella separated from the other γ-proteobacteria.
CONCLUSIONS: Since nog1 is of recent origin, non-essential, short, weakly expressed and only marginally involved in E. coli's central metabolism, we propose that this gene is in an initial stage of evolution. While we present specific experimental evidence for the existence of a fourth overlapping gene in enterohemorrhagic E. coli, we believe that this may be an initial finding only and overlapping genes in bacteria may be more common than is currently assumed by microbiologists.
Affiliation(s)
- Lea Fellner
  - Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85350, Freising, Germany
- Svenja Simon
  - Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Constance, Germany
- Christian Scherling
  - Lehrstuhl für Ernährungsphysiologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Gregor-Mendel-Straße 2, D-85354, Freising, Germany
- Michael Witting
  - Research Unit Analytical BioGeoChemistry, Deutsches Forschungszentrum für Gesundheit und Umwelt GmbH, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85754, Neuherberg, Germany
- Steffen Schober
  - Institute of Communications Engineering, Universität Ulm, Albert-Einstein-Allee 43, 89081, Ulm, Germany; Present address: Blue Yonder GmbH, Ohiostraße 8, Karlsruhe, Germany
- Christine Polte
  - Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85350, Freising, Germany; Present address: Institut für Biochemie und Molekularbiologie, Universität Hamburg, Martin-Luther-King Platz 6, 20146, Hamburg, Germany
- Philippe Schmitt-Kopplin
  - Research Unit Analytical BioGeoChemistry, Deutsches Forschungszentrum für Gesundheit und Umwelt GmbH, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85754, Neuherberg, Germany
- Daniel A Keim
  - Lehrstuhl für Datenanalyse und Visualisierung, Fachbereich Informatik und Informationswissenschaft, Universität Konstanz, Box 78, 78457, Constance, Germany
- Siegfried Scherer
  - Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85350, Freising, Germany
- Klaus Neuhaus
  - Lehrstuhl für Mikrobielle Ökologie, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85350, Freising, Germany
|
40
|
Zhang YC, Lin K. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions. Evol Bioinform Online 2015; 11:1-9. [PMID: 26715828 PMCID: PMC4686347 DOI: 10.4137/ebo.s33491] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/23/2015] [Revised: 11/10/2015] [Accepted: 11/16/2015] [Indexed: 11/25/2022] Open
Abstract
Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms.
Affiliation(s)
- Yan-Cong Zhang
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing, China; MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Kui Lin
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Beijing Normal University, Beijing, China; MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
|
41
|
Different patterns of codon usage in the overlapping polymerase and surface genes of hepatitis B virus suggest a de novo origin by modular evolution. J Gen Virol 2015; 96:3577-3586. [DOI: 10.1099/jgv.0.000307] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/05/2023] Open
Abstract
The polymerase (P) and surface (S) genes of hepatitis B virus (HBV) show the longest gene overlap in animal viruses. Gene overlaps originate by the overprinting of a novel frame onto an ancestral pre-existing frame. Identifying which frame is ancestral and which frame is de novo (the genealogy of the overlap) is an appealing topic. However, the P/S overlap of HBV is an intriguing paradox, because both genes are indispensable for virus survival. Thus, the hypothesis of a primordial virus without the surface protein or without the polymerase makes no biological sense. With the aim to determine the genealogy of the overlap, the codon usage of the overlapping frames P and S was compared to that of the non-overlapping region. It was found that the overlap of human HBV had two patterns of codon usage. One was localized in the 5′ one-third of the overlap and the other in the 3′ two-thirds. By extending the analysis to non-human HBVs, it was found that this feature occurred in all hepadnaviruses. Under the assumption that the ancestral frame has a codon usage significantly closer to that of the non-overlapping region than the de novo frame, the ancestral frames in the 5′ and 3′ region of the overlap could be predicted. They were, respectively, frame S and frame P. These results suggest that the spacer domain of the polymerase and the S domain of the surface protein originated de novo by overprinting. They support a modular evolution hypothesis for the origin of the overlap.
|
42
|
Andersson DI, Jerlström-Hultqvist J, Näsvall J. Evolution of new functions de novo and from preexisting genes. Cold Spring Harb Perspect Biol 2015; 7:7/6/a017996. [PMID: 26032716 DOI: 10.1101/cshperspect.a017996] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Indexed: 11/25/2022]
Abstract
How the enormous structural and functional diversity of new genes and proteins was generated (estimated to be 10^10-10^12 different proteins in all organisms on earth [Choi I-G, Kim S-H. 2006. Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci 103: 14056-14061]) is a central biological question that has a long and rich history. Extensive work during the last 80 years has shown that new genes that play important roles in lineage-specific phenotypes and adaptation can originate through a multitude of different mechanisms, including duplication, lateral gene transfer, gene fusion/fission, and de novo origination. In this review, we focus on two main processes as generators of new functions: evolution of new genes by duplication and divergence of pre-existing genes and de novo gene origination in which a whole protein-coding gene evolves from a noncoding sequence.
Affiliation(s)
- Dan I Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
- Jon Jerlström-Hultqvist
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
- Joakim Näsvall
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
|
43
|
Morozov SY, Solovyev AG. Phylogenetic relationship of some "accessory" helicases of plant positive-stranded RNA viruses: toward understanding the evolution of triple gene block. Front Microbiol 2015; 6:508. [PMID: 26042118 PMCID: PMC4436898 DOI: 10.3389/fmicb.2015.00508] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 02/11/2015] [Accepted: 05/08/2015] [Indexed: 12/16/2022] Open
Abstract
Recently, we hypothesized that silencing suppression activity gained by a viral replicative helicase led to the emergence of the second helicase possessing activity of the viral silencing suppressor and/or movement protein (MP). Our hypothesis accounted for the evolutionary origin of the specialized ‘triple gene block’ (TGB) in plant virus genomes encoding the MPs TGB1, TGB2, and TGB3 required for viral cell-to-cell transport through plasmodesmata. Here, we used public transcriptome databases to identify previously unrecognized viruses. The analysis of novel viral genomes further supported the previously proposed scenario of TGB origin and evolution, which included the following steps. First, the accessory helicase gene could have been acquired by horizontal gene transfer (HGT) that presumably occurred independently in different virus groups. Second, the TGB2 gene evolved by HGT or autonomization of the C-terminal transmembrane domain found in at least one TGB1 helicase. Third, the TGB3 gene has most likely emerged in the genomic block consisting of the TGB1 and TGB2 genes.
Affiliation(s)
- Sergey Y Morozov
- A. N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow, Russia
- Andrey G Solovyev
- A. N. Belozersky Institute of Physico-Chemical Biology, Moscow State University, Moscow, Russia
|
44
|
Casane D, Laurenti P. Syllogomanie moléculaire : l’ADN non codant enrichit le jeu des possibles. Med Sci (Paris) 2014; 30:1177-83. [DOI: 10.1051/medsci/20143012022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/21/2023] Open
|
45
|
Overlapping genes: a new strategy of thermophilic stress tolerance in prokaryotes. Extremophiles 2014; 19:345-53. [PMID: 25503326 DOI: 10.1007/s00792-014-0720-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/09/2014] [Accepted: 12/01/2014] [Indexed: 12/29/2022]
Abstract
Overlapping genes (OGs) have drawn considerable attention in recent research. However, the significance of OGs in prokaryotic genomes has remained unexplored. As an adaptation to high temperature, thermophiles were shown to eliminate their intergenic regions. Therefore, it could be possible that prokaryotes would increase their OG content to adapt to high temperature. To test this hypothesis, we carried out a comparative study on OG frequency of 256 prokaryotic genomes comprising both thermophiles and non-thermophiles. It was found that thermophiles exhibit a higher frequency of overlapping genes than non-thermophiles. Moreover, overlap frequency was found to correlate with optimal growth temperature (OGT) in prokaryotes. Long overlap frequency was found to hold a positive correlation with OGT, resulting in an abundance of long overlaps in thermophiles compared to non-thermophiles. On the other hand, short overlap (1-4 nucleotides) frequency (SOF) did not yield any direct correlation with OGT. However, the correlations of SOF with CAIavg (extent of variation of codon usage bias measured as the mean of codon adaptation index of all genes in a given genome) and IG% (proportion of intergenic regions) indicate that short overlaps might upregulate the aforementioned factors (CAIavg and IG%), which are already known to be vital forces for thermophilic adaptation. From this evidence, we propose that the OG content bears a strong link to thermophily. Long overlaps are important for genome compaction and short overlaps are important to uphold high CAIavg. Our findings should help in better understanding the significance of overlapping gene content in prokaryotic genomes.
|
46
|
Suh A, Weber CC, Kehlmaier C, Braun EL, Green RE, Fritz U, Ray DA, Ellegren H. Early Mesozoic coexistence of amniotes and Hepadnaviridae. PLoS Genet 2014; 10:e1004559. [PMID: 25501991 PMCID: PMC4263362 DOI: 10.1371/journal.pgen.1004559] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 03/18/2014] [Accepted: 06/24/2014] [Indexed: 12/16/2022] Open
Abstract
Hepadnaviridae are double-stranded DNA viruses that infect some species of birds and mammals. This includes humans, where hepatitis B viruses (HBVs) are prevalent pathogens in considerable parts of the global population. Recently, endogenized sequences of HBVs (eHBVs) have been discovered in bird genomes where they constitute direct evidence for the coexistence of these viruses and their hosts from the late Mesozoic until present. Nevertheless, virtually nothing is known about the ancient host range of this virus family in other animals. Here we report the first eHBVs from crocodilian, snake, and turtle genomes, including a turtle eHBV that endogenized >207 million years ago. This genomic “fossil” is >125 million years older than the oldest avian eHBV and provides the first direct evidence that Hepadnaviridae already existed during the Early Mesozoic. This implies that the Mesozoic fossil record of HBV infection spans three of the five major groups of land vertebrates, namely birds, crocodilians, and turtles. We show that the deep phylogenetic relationships of HBVs are largely congruent with the deep phylogeny of their amniote hosts, which suggests an ancient amniote–HBV coexistence and codivergence, at least since the Early Mesozoic. Notably, the organization of overlapping genes as well as the structure of elements involved in viral replication has remained highly conserved among HBVs along that time span, except for the presence of the X gene. We provide multiple lines of evidence that the tumor-promoting X protein of mammalian HBVs lacks a homolog in all other hepadnaviruses and propose a novel scenario for the emergence of X via segmental duplication and overprinting of pre-existing reading frames in the ancestor of mammalian HBVs. Our study reveals an unforeseen host range of prehistoric HBVs and provides novel insights into the genome evolution of hepadnaviruses throughout their long-lasting association with amniote hosts. 
Viruses are not known to leave physical fossil traces, which makes our understanding of their evolutionary prehistory crucially dependent on the detection of endogenous viruses. Ancient endogenous viruses, also known as paleoviruses, are relics of viral genomes or fragments thereof that once infiltrated their host's germline and then remained as molecular “fossils” within the host genome. The massive genome sequencing of recent years has unearthed vast numbers of paleoviruses from various animal genomes, including the first endogenous hepatitis B viruses (eHBVs) in bird genomes. We screened genomes of land vertebrates (amniotes) for the presence of paleoviruses and identified ancient eHBVs in the recently sequenced genomes of crocodilians, snakes, and turtles. We report an eHBV that is >207 million years old, making it the oldest endogenous virus currently known. Furthermore, our results provide direct evidence that the Hepadnaviridae virus family infected birds, crocodilians and turtles during the Mesozoic Era, and suggest a long-lasting coexistence of these viruses and their amniote hosts at least since the Early Mesozoic. We challenge previous views on the origin of the oncogenic X gene and provide an evolutionary explanation as to why only mammalian hepatitis B infection leads to hepatocellular carcinoma.
Affiliation(s)
- Alexander Suh
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
- Claudia C. Weber
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
- Christian Kehlmaier
- Museum of Zoology, Senckenberg Research Institute and Natural History Museum, Dresden, Germany
- Edward L. Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, Florida, United States of America
- Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
- Uwe Fritz
- Museum of Zoology, Senckenberg Research Institute and Natural History Museum, Dresden, Germany
- David A. Ray
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, Mississippi, United States of America
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America
- Hans Ellegren
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
|
47
|
Shukla A, Hilgenfeld R. Acquisition of new protein domains by coronaviruses: analysis of overlapping genes coding for proteins N and 9b in SARS coronavirus. Virus Genes 2014; 50:29-38. [PMID: 25410051 PMCID: PMC7089080 DOI: 10.1007/s11262-014-1139-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 07/17/2014] [Accepted: 10/25/2014] [Indexed: 12/02/2022]
Abstract
Acquisition of new proteins by viruses usually occurs through horizontal gene transfer or through gene duplication, but another, less common mechanism is the usage of completely or partially overlapping reading frames. A case of acquisition of a completely new protein through introduction of a start codon in an alternative reading frame is the protein encoded by open reading frame (orf) 9b of SARS coronavirus. This gene completely overlaps with the nucleocapsid (N) gene (orf9a). Our findings indicate that the orf9b gene features a discordant codon-usage pattern. We analyzed the evolution of orf9b in concert with orf9a using sequence data of betacoronavirus-lineage b and found that orf9b, which encodes the overprinting protein, evolved largely independently of the overprinted orf9a. We also examined the protein products of these genomic sequences for their structural flexibility and found that it is not necessary for a newly acquired, overlapping protein product to be intrinsically disordered, in contrast to earlier suggestions. Our findings contribute to characterizing sequence properties of newly acquired genes making use of overlapping reading frames.
Affiliation(s)
- Aditi Shukla
- Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- Graduate School for Computing in Medicine & Life Sciences, University of Lübeck, Lübeck, Germany
- Rolf Hilgenfeld
- Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- German Center for Infection Research (DZIF), University of Lübeck, Lübeck, Germany
|
48
|
Discovery and characterization of auxiliary proteins encoded by type 3 simian T-cell lymphotropic viruses. J Virol 2014; 89:931-51. [PMID: 25355890 DOI: 10.1128/jvi.02150-14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/02/2023] Open
Abstract
Human T-cell lymphotropic virus type 1 (HTLV-1) and HTLV-2 encode auxiliary proteins that play important roles in viral replication, viral latency, and immune escape. The presence of auxiliary protein-encoding open reading frames (ORFs) in HTLV-3, the latest HTLV to be discovered, is unknown. Simian T-cell lymphotropic virus type 3 (STLV-3) is almost identical to HTLV-3. Given the lack of HTLV-3-infected cell lines, we took advantage of STLV-3-infected cells and of an STLV-3 molecular clone to search for the presence of auxiliary transcripts. Using reverse transcriptase PCR (RT-PCR), we first uncovered the presence of three unknown viral mRNAs encoding putative proteins of 5, 8, and 9 kDa and confirmed the presence of the previously reported RorfII transcript. The existence of these viral mRNAs was confirmed by using splice site-specific RT-PCR with ex vivo samples. We showed that p5 is distributed throughout the cell and does not colocalize with a specific organelle. The p9 localization is similar to that of HTLV-1 p12 and induced a strong decrease in the calreticulin signal, similarly to HTLV-1 p12. Although p8, RorfII, and Rex-3 share an N-terminal sequence that is predicted to contain a nucleolar localization signal (NoLS), only p8 is found in the nucleolus. The p8 location in the nucleolus is linked to a bipartite NoLS. p8 and, to a lesser extent, p9 repressed viral expression but did not alter Rex-3-dependent mRNA export. Using a transformation assay, we finally showed that none of the STLV-3 auxiliary proteins had the ability to induce colony formation, while both Tax-3 and antisense protein of HTLV-3 (APH-3) promoted cellular transformation. Altogether, these results complete the characterization of the newly described primate T-lymphotropic virus type 3 (PTLV-3).
IMPORTANCE: Together with their simian counterparts, HTLVs form the primate T-lymphotropic viruses. HTLVs arose from interspecies transmission between nonhuman primates and humans. HTLV-1 and HTLV-2 encode auxiliary proteins that play important roles in viral replication, viral latency, and immune escape. The presence of ORFs encoding auxiliary proteins in HTLV-3 or STLV-3 genomes was unknown. Using in silico analyses, ex vivo samples, or in vitro experiments, we have uncovered the presence of 3 previously unknown viral mRNAs encoding putative proteins and confirmed the presence of a previously reported viral transcript. We characterized the intracellular localization of the four proteins. We showed that two of these proteins repress viral expression but that none of them have the ability to induce colony formation. However, both Tax and the antisense protein APH-3 promote cell transformation. Our results allowed us to characterize 4 new retroviral proteins for the first time.
|
49
|
Light S, Basile W, Elofsson A. Orphans and new gene origination, a structural and evolutionary perspective. Curr Opin Struct Biol 2014; 26:73-83. [DOI: 10.1016/j.sbi.2014.05.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/20/2014] [Revised: 05/07/2014] [Accepted: 05/16/2014] [Indexed: 12/28/2022]
|
50
|
Xue B, Blocquel D, Habchi J, Uversky AV, Kurgan L, Uversky VN, Longhi S. Structural disorder in viral proteins. Chem Rev 2014; 114:6880-911. [PMID: 24823319 DOI: 10.1021/cr4005692] [Citation(s) in RCA: 140] [Impact Index Per Article: 14.0] [Indexed: 02/07/2023]
Affiliation(s)
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, College of Fine Arts and Sciences, and Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33620, United States
|