1
Legarda EG, Elena SF, Mushegian AR. Emergence of two distinct spatial folds in a pair of plant virus proteins encoded by nested genes. J Biol Chem 2024; 300:107218. [PMID: 38522515 PMCID: PMC11044054 DOI: 10.1016/j.jbc.2024.107218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 03/26/2024] Open
Abstract
Virus genomes may encode overlapping or nested open reading frames that increase their coding capacity. It is not known whether the constraints on spatial structures of the two encoded proteins limit the evolvability of nested genes. We examine the evolution of a pair of proteins, p22 and p19, encoded by nested genes in plant viruses from the genus Tombusvirus. The known structure of p19, a suppressor of RNA silencing, belongs to the RAGNYA fold from the alpha+beta class. The structure of p22, the cell-to-cell movement protein from the 30K family widespread in plant viruses, is predicted with the AlphaFold approach, suggesting a single jelly-roll fold core from the all-beta class, structurally similar to capsid proteins from plant and animal viruses. The nucleotide and codon preferences impose modest constraints on the types of secondary structures encoded in the alternative reading frames, nonetheless allowing for compact, well-ordered folds from different structural classes in two similarly-sized nested proteins. Tombusvirus p22 emerged through radiation of the widespread 30K family, which evolved by duplication of a virus capsid protein early in the evolution of plant viruses, whereas lineage-specific p19 may have emerged by a stepwise increase in the length of the overprinted gene and incremental acquisition of functionally active secondary structure elements by the protein product. This evolution of p19 toward the RAGNYA fold represents one of the first documented examples of protein structure convergence in naturally occurring proteins.
Affiliation(s)
- Esmeralda G Legarda
- Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, Paterna, València, Spain
- Santiago F Elena
- Instituto de Biología Integrativa de Sistemas (I2SysBio), CSIC-Universitat de València, Paterna, València, Spain; The Santa Fe Institute, Santa Fe, New Mexico, USA
- Arcady R Mushegian
- Division of Molecular and Cellular Biosciences, National Science Foundation, Arlington, Virginia, USA.
2
uz-Zaman MH, D’Alton S, Barrick JE, Ochman H. Promoter recruitment drives the emergence of proto-genes in a long-term evolution experiment with Escherichia coli. PLoS Biol 2024; 22:e3002418. [PMID: 38713714 PMCID: PMC11101190 DOI: 10.1371/journal.pbio.3002418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 05/17/2024] [Accepted: 04/18/2024] [Indexed: 05/09/2024] Open
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli long-term evolution experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, with levels of transcription across low-expressed regions increasing in later generations of the experiment. Proto-genes formed downstream of new mutations result either from insertion element activity or chromosomal translocations that fused preexisting regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter, although such cases were rare compared to those caused by recruitment of preexisting promoters. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, can persist stably, and can serve as potential substrates for new gene formation.
Affiliation(s)
- Md. Hassan uz-Zaman
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Simon D’Alton
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Jeffrey E. Barrick
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
- Howard Ochman
- Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America
3
Karlin DG. Parvovirus B19 and Human Parvovirus 4 Encode Similar Proteins in a Reading Frame Overlapping the VP1 Capsid Gene. Viruses 2024; 16:191. [PMID: 38399966 PMCID: PMC10891878 DOI: 10.3390/v16020191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/24/2024] [Indexed: 02/25/2024] Open
Abstract
Viruses frequently contain overlapping genes, which encode functionally unrelated proteins from the same DNA or RNA region but in different reading frames. Yet, overlapping genes are often overlooked during genome annotation, in particular in DNA viruses. Here we looked for the presence of overlapping genes likely to encode a functional protein in human parvovirus B19 (genus Erythroparvovirus), using an experimentally validated software, Synplot2. Synplot2 detected an open reading frame, X, conserved in all erythroparvoviruses, which overlaps the VP1 capsid gene and is under highly significant selection pressure. In a related virus, human parvovirus 4 (genus Tetraparvovirus), Synplot2 also detected an open reading frame under highly significant selection pressure, ARF1, which overlaps the VP1 gene and is conserved in all tetraparvoviruses. These findings provide compelling evidence that the X and ARF1 proteins must be expressed and functional. X and ARF1 have the exact same location (they overlap the region of the VP1 gene encoding the phospholipase A2 domain), are both in the same frame (+1) with respect to the VP1 frame, and encode proteins with similar predicted properties, including a central transmembrane region. Further studies will be needed to determine whether they have a common origin and similar function. X and ARF1 are probably translated either from a polycistronic mRNA by a non-canonical mechanism, or from an unmapped monocistronic mRNA. Finally, we also discovered proteins predicted to be expressed from a frame overlapping VP1 in other species related to parvovirus B19: porcine parvovirus 2 (Z protein) and bovine parvovirus 3 (X-like protein).
Affiliation(s)
- David G. Karlin
- Division Phytomedicine, Thaer-Institute of Agricultural and Horticultural Sciences, Humboldt-Universität zu Berlin, Lentzeallee 55/57, D-14195 Berlin, Germany;
- Independent Researcher, 13000 Marseille, France
4
López-Pérez M, Aguirre-Garrido F, Herrera-Zúñiga L, Fernández FJ. Gene as a dynamical notion: An extensive and integrative vision. Redefining the gene concept, from traditional to genic-interaction, as a new dynamical version. Biosystems 2023; 234:105060. [PMID: 37844827 DOI: 10.1016/j.biosystems.2023.105060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 09/08/2023] [Accepted: 10/10/2023] [Indexed: 10/18/2023]
Abstract
The current concept of the gene has been very useful during the 20th and 21st centuries. However, recent advances in molecular biology and bioinformatics, which have further diversified the functional and adaptive profile of genetic information and its integration with cell physiology and environmental response, have drawn attention to gene properties beyond the traditional definition. Given the inherent complexity of gene expression, its adaptive objective should be understood in terms of the Tortoise-Hare model, in which two tendencies converge: one focused on rapid adaptation to achieve survival, and another that prevents an over-adaptation effect. In this context, the gene concept must be revised to include these new mechanisms and approaches. In this paper, we propose a new conception of the gene that moves from a static and defined version of hereditary information to a dynamic idea that gives primacy to gene interaction (circumscribed to that established between protein-protein, protein-nucleic acid, and nucleic acid-nucleic acid) and the selection it exerts, as the irreducible element that works in a coordinated way in a genomic regulatory network (GRN).
Affiliation(s)
- Marcos López-Pérez
- Environmental Sciences Department, Universidad Autónoma Metropolitana (Lerma Unit) Av. de las Garzas N° 10, Col. El Panteón, Municipio de Lerma de Villada, Estado de México, C.P. 52005, Mexico.
- Félix Aguirre-Garrido
- Environmental Sciences Department, Universidad Autónoma Metropolitana (Lerma Unit) Av. de las Garzas N° 10, Col. El Panteón, Municipio de Lerma de Villada, Estado de México, C.P. 52005, Mexico
- Leonardo Herrera-Zúñiga
- Chemistry Department, Universidad Autónoma Metropolitana (Iztapalapa Unit), C.P. 09340, Mexico City, Mexico
- Francisco J Fernández
- Biotechnology Department, Universidad Autónoma Metropolitana (Iztapalapa Unit), C.P. 09340, Mexico City, Mexico.
5
Uz-Zaman MH, D'Alton S, Barrick JE, Ochman H. Promoter capture drives the emergence of proto-genes in Escherichia coli. bioRxiv 2023:2023.11.15.567300. [PMID: 38013999 PMCID: PMC10680751 DOI: 10.1101/2023.11.15.567300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
The phenomenon of de novo gene birth, the emergence of genes from non-genic sequences, has received considerable attention due to the widespread occurrence of genes that are unique to particular species or genomes. Most instances of de novo gene birth have been recognized through comparative analyses of genome sequences in eukaryotes, despite the abundance of novel, lineage-specific genes in bacteria and the relative ease with which bacteria can be studied in an experimental context. Here, we explore the genetic record of the Escherichia coli Long-Term Evolution Experiment (LTEE) for changes indicative of "proto-genic" phases of new gene birth in which non-genic sequences evolve stable transcription and/or translation. Over the time-span of the LTEE, non-genic regions are frequently transcribed, translated and differentially expressed, thereby serving as raw material for new gene emergence. Most proto-genes result either from insertion element activity or chromosomal translocations that fused pre-existing regulatory sequences to regions that were not expressed in the LTEE ancestor. Additionally, we identified instances of proto-gene emergence in which a previously unexpressed sequence was transcribed after formation of an upstream promoter. Tracing the origin of the causative mutations, we discovered that most occurred early in the history of the LTEE, often within the first 20,000 generations, and became fixed soon after emergence. Our findings show that proto-genes emerge frequently within evolving populations, persist stably, and can serve as potential substrates for new gene formation.
6
Simoens L, Fijalkowski I, Van Damme P. Exposing the small protein load of bacterial life. FEMS Microbiol Rev 2023; 47:fuad063. [PMID: 38012116 PMCID: PMC10723866 DOI: 10.1093/femsre/fuad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 11/10/2023] [Accepted: 11/24/2023] [Indexed: 11/29/2023] Open
Abstract
The ever-growing repertoire of genomic techniques continues to expand our understanding of the true diversity and richness of prokaryotic genomes. Riboproteogenomics laid the foundation for dynamic studies of previously overlooked genomic elements. Most strikingly, bacterial genomes were revealed to harbor robust repertoires of small open reading frames (sORFs) encoding a diverse and broadly expressed range of small proteins, or sORF-encoded polypeptides (SEPs). In recent years, continuous efforts led to great improvements in the annotation and characterization of such proteins, yet many challenges remain to fully comprehend the pervasive nature of small proteins and their impact on bacterial biology. In this work, we review the recent developments in the dynamic field of bacterial genome reannotation, catalog the important biological roles carried out by small proteins and identify challenges obstructing the way to full understanding of these elusive proteins.
Affiliation(s)
- Laure Simoens
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Igor Fijalkowski
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
- Petra Van Damme
- iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K. L. Ledeganckstraat 35, 9000 Ghent, Belgium
7
Chao KH, Mao A, Salzberg SL, Pertea M. Splam: a deep-learning-based splice site predictor that improves spliced alignments. bioRxiv 2023:2023.07.27.550754. [PMID: 37546880 PMCID: PMC10402160 DOI: 10.1101/2023.07.27.550754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant Arabidopsis thaliana. Finally, we demonstrate the use of Splam on a novel application: processing the spliced alignments of RNA-seq data to identify and eliminate errors. We show that when used in this manner, Splam yields substantial improvements in the accuracy of downstream transcriptome analysis of both poly(A) and ribo-depleted RNA-seq libraries. Overall, Splam offers a faster and more accurate approach to detecting splice junctions, while also providing a reliable and efficient solution for cleaning up erroneous spliced alignments.
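The windowing idea in this abstract can be illustrated with a minimal Python sketch. It only shows how a fixed flanking window around a donor site and an acceptor site might be cut from a sequence and joined into one paired input. The function names `flank` and `junction_input`, the split of the 400-bp window into 200 bases on each side, the `N` padding, and the donor-then-acceptor ordering are all assumptions made for illustration; this is not Splam's actual preprocessing code.

```python
def flank(seq, site, half=200, pad="N"):
    """Return 2*half bases centred on `site` (0-based), padded at sequence ends.

    `half=200` is an assumed split of the 400-bp window named in the abstract.
    """
    start, end = site - half, site + half
    left = pad * max(0, -start)            # padding when the window runs off the 5' end
    right = pad * max(0, end - len(seq))   # padding when it runs off the 3' end
    return left + seq[max(0, start):min(len(seq), end)] + right


def junction_input(seq, donor, acceptor, half=200):
    """Concatenate donor and acceptor windows so both intron ends are seen together.

    The donor-then-acceptor order is a hypothetical choice for this sketch.
    """
    return flank(seq, donor, half) + flank(seq, acceptor, half)


if __name__ == "__main__":
    # Toy sequence: exon, intron starting with GT and ending with AG, exon.
    toy = "A" * 10 + "GT" + "C" * 20 + "AG" + "T" * 10
    print(junction_input(toy, donor=10, acceptor=34, half=4))
```

With the default `half=200`, each site yields 400 characters and each junction yields 800, regardless of how close the site lies to the sequence ends.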
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
- Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
8
Leblanc S, Brunet MA, Jacques JF, Lekehal AM, Duclos A, Tremblay A, Bruggeman-Gascon A, Samandi S, Brunelle M, Cohen AA, Scott MS, Roucou X. Newfound Coding Potential of Transcripts Unveils Missing Members of Human Protein Communities. Genomics Proteomics Bioinformatics 2023; 21:515-534. [PMID: 36183975 PMCID: PMC10787177 DOI: 10.1016/j.gpb.2022.09.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/01/2021] [Revised: 08/10/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Recent proteogenomic approaches have led to the discovery that regions of the transcriptome previously annotated as non-coding regions [i.e., untranslated regions (UTRs), open reading frames overlapping annotated coding sequences in a different reading frame, and non-coding RNAs] frequently encode proteins, termed alternative proteins (altProts). This suggests that previously identified protein-protein interaction (PPI) networks are partially incomplete because altProts are not present in conventional protein databases. Here, we used the proteogenomic resource OpenProt and a combined spectrum- and peptide-centric analysis for the re-analysis of a high-throughput human network proteomics dataset, thereby revealing the presence of 261 altProts in the network. We found 19 genes encoding both an annotated (reference) and an alternative protein interacting with each other. Of the 117 altProts encoded by pseudogenes, 38 are direct interactors of reference proteins encoded by their respective parental genes. Finally, we experimentally validate several interactions involving altProts. These data improve the blueprints of the human PPI network and suggest functional roles for hundreds of altProts.
Affiliation(s)
- Sébastien Leblanc
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Marie A Brunet
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Jean-François Jacques
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Amina M Lekehal
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Andréa Duclos
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Alexia Tremblay
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Alexis Bruggeman-Gascon
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Sondos Samandi
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Mylène Brunelle
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada
- Alan A Cohen
- Department of Family Medicine, Université de Sherbrooke, Sherbrooke, QC J1H 5N4, Canada
- Michelle S Scott
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
- Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada; PROTEO, Quebec Network for Research on Protein Function, Structure, and Engineering, Quebec City, QC G1V 0A6, Canada.
9
Zhang Y, Liang X, Zhao M, Qi T, Guo H, Zhao J, Zhao J, Zhan G, Kang Z, Zheng L. A novel ambigrammatic mycovirus, PsV5, works hand in glove with wheat stripe rust fungus to facilitate infection. Plant Commun 2023; 4:100505. [PMID: 36527233 DOI: 10.1016/j.xplc.2022.100505] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/14/2022] [Revised: 11/16/2022] [Accepted: 12/14/2022] [Indexed: 05/11/2023]
Abstract
Here we describe a novel narnavirus, Puccinia striiformis virus 5 (PsV5), from the devastating wheat stripe rust fungus P. striiformis f. sp. tritici (Pst). The genome of PsV5 contains two predicted open reading frames (ORFs) that largely overlap on reverse strands: an RNA-dependent RNA polymerase (RdRp) and a reverse-frame ORF (rORF) with unknown function. Protein translations of both ORFs were demonstrated by immune technology. Transgenic wheat lines overexpressing PsV5 (RdRp-rORF), RdRp ORF, or rORF were more susceptible to Pst infection, whereas PsV5-RNA interference (RNAi) lines were more resistant. Overexpression of PsV5 (RdRp-rORF), RdRp ORF, or rORF in Fusarium graminearum also boosted fungal virulence. We thus report a novel ambigrammatic mycovirus that promotes the virulence of its fungal host. The results are a significant addition to our understanding of virosphere diversity and offer insights for sustainable wheat rust disease control.
Affiliation(s)
- Yanhui Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xiaofei Liang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Mengxin Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Tuo Qi
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, State Key Laboratory of Hybrid Rice, Key Laboratory of Major Crop Diseases & Collaborative Innovation Center for Hybrid Rice in Yangtze River Basin, Rice Research Institute, Sichuan Agricultural University at Wenjiang, Chengdu, Sichuan 611130, China
- Hualong Guo
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jing Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jie Zhao
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Gangming Zhan
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhensheng Kang
- State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China.
- Li Zheng
- Sanya Nanfan Research Institute of Hainan University, Hainan Yazhou Bay Seed Laboratory, Sanya 572025, China; Key Laboratory of Green Prevention and Control of Tropical Plant Diseases and Pests, Ministry of Education and School of Plant Protection, Hainan University, Haikou, Hainan 570228, China.
10
Sathiyamani B, Daniel EA, Ansar S, Esakialraj BH, Hassan S, Revanasiddappa PD, Keshavamurthy A, Roy S, Vetrivel U, Hanna LE. Structural analysis and molecular dynamics simulation studies of HIV-1 antisense protein predict its potential role in HIV replication and pathogenesis. Front Microbiol 2023; 14:1152206. [PMID: 37020719 PMCID: PMC10067880 DOI: 10.3389/fmicb.2023.1152206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/01/2023] [Indexed: 04/07/2023] Open
Abstract
The functional significance of the HIV-1 Antisense Protein (ASP) has been a paradox since its discovery. The expression of this protein in HIV-1-infected cells and its involvement in autophagy, transcriptional regulation, and viral latency have sporadically been reported in various studies. Yet, the definite role of this protein in HIV-1 infection remains unclear. Deciphering the 3D structure of HIV-1 ASP would throw light on its potential role in the HIV life cycle and host-virus interaction. Hence, using extensive molecular modeling and dynamics simulation for 200 ns, we predicted the plausible 3D structures of ASP from two reference strains of HIV-1, namely Indie-C1 (subtype C) and NL4-3 (subtype B), so as to derive its functional implication through structural domain analysis. In spite of sequence and structural differences between subtype B and C ASP, both structures appear to share common domains like the Von Willebrand Factor Domain-A (VWFA), Integrin subunit alpha-X (ITGSX), and ETV6-Transcriptional repressor, thereby reiterating the potential role of HIV-1 ASP in transcriptional repression and autophagy, as reported in earlier studies. Gromos-based cluster analysis of the centroid structures also supported the accuracy of the prediction. This is the first study to elucidate a highly plausible structure for HIV-1 ASP which could serve as a feeder for further experimental validation studies.
Affiliation(s)
- Balakumaran Sathiyamani
- Department of Virology and Biotechnology, National Institute for Research in Tuberculosis, Chennai, Tamil Nadu, India
- University of Madras, Chennai, India
- Evangeline Ann Daniel
- Department of Virology and Biotechnology, National Institute for Research in Tuberculosis, Chennai, Tamil Nadu, India
- University of Madras, Chennai, India
- Samdani Ansar
- Center for Bioinformatics, Vision Research Foundation, Sankara Nethralaya, Chennai, Tamil Nadu, India
- Bennett Henzeler Esakialraj
- Department of Virology and Biotechnology, National Institute for Research in Tuberculosis, Chennai, Tamil Nadu, India
- Sameer Hassan
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Amrutha Keshavamurthy
- Department of Biotechnology, Siddaganga Institute of Technology, Tumakuru, Karnataka, India
- Sujata Roy
- Department of Biotechnology, Rajalakshmi Engineering College, Chennai, Tamil Nadu, India
- Umashankar Vetrivel
- Department of Virology and Biotechnology, National Institute for Research in Tuberculosis, Chennai, Tamil Nadu, India
- *Correspondence: Luke Elizabeth Hanna; Umashankar Vetrivel
- Luke Elizabeth Hanna
- Department of Virology and Biotechnology, National Institute for Research in Tuberculosis, Chennai, Tamil Nadu, India
- *Correspondence: Luke Elizabeth Hanna; Umashankar Vetrivel
11
Vasu K, Khan D, Ramachandiran I, Blankenberg D, Fox P. Analysis of nested alternate open reading frames and their encoded proteins. NAR Genom Bioinform 2022; 4:lqac076. [PMID: 36267124 PMCID: PMC9580016 DOI: 10.1093/nargab/lqac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 08/14/2022] [Accepted: 09/27/2022] [Indexed: 11/22/2022] Open
Abstract
Transcriptional and post-transcriptional mechanisms diversify the proteome beyond gene number, while maintaining a sequence relationship between original and altered proteins. A new mechanism breaks this paradigm, generating novel proteins by translating alternative open reading frames (Alt-ORFs) within canonical host mRNAs. Uniquely, ‘alt-proteins’ lack sequence homology with host ORF-derived proteins. We show global amino acid frequencies, and consequent biochemical characteristics of Alt-ORFs nested within host ORFs (nAlt-ORFs), are genetically-driven, and predicted by summation of frequencies of hundreds of encompassing host codon-pairs. Analysis of 101 human nAlt-ORFs of length ≥150 codons confirms the theoretical predictions, revealing an extraordinarily high median isoelectric point (pI) of 11.68, due to anomalous charged amino acid levels. Also, nAlt-ORF proteins exhibit a >2-fold preference for reading frame 2 versus 3, predicted mitochondrial and nuclear localization, and elevated codon adaptation index indicative of natural selection. Our results provide a theoretical and conceptual framework for exploration of these largely unannotated, but potentially significant, alternative ORFs and their encoded proteins.
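To make the notion of a nested alternative ORF concrete, the following Python sketch scans the two alternative forward frames of a host coding sequence for ATG-to-stop ORFs that lie entirely inside it. It is a simplified illustration, not the authors' pipeline. The function name `nested_alt_orfs` is hypothetical, the numbering of frames 2 and 3 as offsets of one and two nucleotides from the host frame is an assumption, and only the standard stop codons are considered. The 150-codon default echoes the length cutoff quoted in the abstract.

```python
STOPS = {"TAA", "TAG", "TGA"}


def nested_alt_orfs(cds, min_codons=150):
    """Return (frame, start, end) for ATG-to-stop ORFs in frames 2 and 3 of `cds`.

    `cds` is the host ORF as an uppercase DNA string read in frame 1.
    `start` is 0-based inclusive, `end` is exclusive and includes the stop codon.
    Frame numbering (offset 1 -> frame 2, offset 2 -> frame 3) is an assumed convention.
    """
    hits = []
    for offset in (1, 2):
        start = None
        for i in range(offset, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:   # sense codons, stop excluded
                    hits.append((offset + 1, start, i + 3))
                start = None                   # resume scanning after the stop
    return hits


if __name__ == "__main__":
    # Toy host sequence carrying a 4-codon ORF in frame 2.
    toy = "G" + "ATG" + "GCC" * 3 + "TAA" + "GG"
    print(nested_alt_orfs(toy, min_codons=4))
```

Counting the hits per frame over a set of host coding sequences would give the kind of frame 2 versus frame 3 comparison the abstract mentions, under the frame convention assumed here.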
Affiliation(s)
- Kommireddy Vasu
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Debjit Khan
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Iyappan Ramachandiran
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Daniel Blankenberg
- Correspondence may also be addressed to Daniel Blankenberg. Tel: +1 216 444 4336
- Paul L Fox
- To whom correspondence should be addressed. Tel: +1 216 444 8053; Fax: +1 216 444 9404
12
Muñoz-Baena L, Poon AFY. Using networks to analyze and visualize the distribution of overlapping genes in virus genomes. PLoS Pathog 2022; 18:e1010331. [PMID: 35202429 PMCID: PMC8903798 DOI: 10.1371/journal.ppat.1010331] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/02/2021] [Revised: 03/08/2022] [Accepted: 02/02/2022] [Indexed: 11/19/2022] Open
Abstract
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
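As a rough illustration of the quantities described above, overlap length and relative frame can be derived from ORF coordinates alone. The sketch below is an assumption-laden simplification, not the authors' code: the `Orf` class and function names are hypothetical, coordinates are taken as 1-based and inclusive, only same-strand pairs are handled, and the paper's exact sign convention for frameshifts (including antisense cases such as −0) is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Orf:
    start: int   # 1-based, inclusive, on the forward coordinate system
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_length(a: Orf, b: Orf) -> int:
    """Number of nucleotide positions shared by the two ORFs."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def same_strand_frameshift(a: Orf, b: Orf) -> Optional[int]:
    """Frame offset (0, 1 or 2) of the downstream ORF relative to the upstream one.

    Returns None when the ORFs do not overlap. Minus-strand genes are read
    from their higher coordinate, so their frame is measured from `end`.
    """
    if a.strand != b.strand:
        raise ValueError("antisense pairs need a separate convention")
    if overlap_length(a, b) == 0:
        return None
    if a.strand == "+":
        first, second = sorted((a, b), key=lambda o: o.start)
        return (second.start - first.start) % 3
    first, second = sorted((a, b), key=lambda o: o.end, reverse=True)
    return (first.end - second.end) % 3


# Hypothetical coordinates, for illustration only.
print(overlap_length(Orf(1, 300, "+"), Orf(251, 400, "+")))
print(same_strand_frameshift(Orf(1, 300, "+"), Orf(251, 400, "+")))
```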
Affiliation(s)
- Laura Muñoz-Baena
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
- Art F. Y. Poon
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
  - Department of Pathology and Laboratory Medicine, Western University, London, ON, Canada
13
Riegger RJ, Caliskan N. Thinking Outside the Frame: Impacting Genomes Capacity by Programmed Ribosomal Frameshifting. Front Mol Biosci 2022; 9:842261. [PMID: 35281266 PMCID: PMC8915115 DOI: 10.3389/fmolb.2022.842261] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/23/2021] [Accepted: 01/26/2022] [Indexed: 01/08/2023] Open
Abstract
Translation facilitates the transfer of the genetic information stored in the genome via messenger RNAs to a functional protein and is therefore one of the most fundamental cellular processes. Programmed ribosomal frameshifting is a ubiquitous alternative translation event that is extensively used by viruses to regulate gene expression from overlapping open reading frames in a controlled manner. Recent technical advances in the translation field enabled the identification of precise mechanisms as to how and when ribosomes change the reading frame on mRNAs containing cis-acting signals. Several studies began also to illustrate that trans-acting RNA modulators can adjust the timing and efficiency of frameshifting illuminating that frameshifting can be a dynamically regulated process in cells. Here, we intend to summarize these new findings and emphasize how it fits in our current understanding of PRF mechanisms as previously described.
Affiliation(s)
- Ricarda J. Riegger
  - Helmholtz Centre for Infection Research (HZI), Helmholtz Institute for RNA-Based Infection Research (HIRI), Würzburg, Germany
  - Graduate School of Life Sciences (GSLS), University of Würzburg, Würzburg, Germany
- Neva Caliskan
  - Helmholtz Centre for Infection Research (HZI), Helmholtz Institute for RNA-Based Infection Research (HIRI), Würzburg, Germany
  - Medical Faculty, University of Würzburg, Würzburg, Germany
  - Correspondence: Neva Caliskan
14
Pavesi A, Romerio F. Extending the Coding Potential of Viral Genomes with Overlapping Antisense ORFs: A Case for the De Novo Creation of the Gene Encoding the Antisense Protein ASP of HIV-1. Viruses 2022; 14:v14010146. [PMID: 35062351 PMCID: PMC8781085 DOI: 10.3390/v14010146] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 02/04/2023] Open
Abstract
Gene overprinting occurs when point mutations within a genomic region with an existing coding sequence create a new one in another reading frame. This process is quite frequent in viral genomes either to maximize the amount of information that they encode or in response to strong selective pressure. The most frequent scenario involves two different reading frames in the same DNA strand (sense overlap). Much less frequent are cases of overlapping genes that are encoded on opposite DNA strands (antisense overlap). One such example is the antisense ORF, asp in the minus strand of the HIV-1 genome overlapping the env gene. The asp gene is highly conserved in pandemic HIV-1 strains of group M, and it is absent in non-pandemic HIV-1 groups, HIV-2, and lentiviruses infecting non-human primates, suggesting that the ~190-amino acid protein that is expressed from this gene (ASP) may play a role in virus spread. While the function of ASP in the virus life cycle remains to be elucidated, mounting evidence from several research groups indicates that ASP is expressed in vivo. There are two alternative hypotheses that could be envisioned to explain the origin of the asp ORF. On one hand, asp may have originally been present in the ancestor of contemporary lentiviruses, and subsequently lost in all descendants except for most HIV-1 strains of group M due to selective advantage. Alternatively, the asp ORF may have originated very recently with the emergence of group M HIV-1 strains from SIVcpz. Here, we used a combination of computational and statistical approaches to study the genomic region of env in primate lentiviruses to shed light on the origin, structure, and sequence evolution of the asp ORF. 
The results emerging from our studies support the hypothesis of a recent de novo addition of the antisense ORF to the HIV-1 genome through a process that entailed progressive removal of existing internal stop codons from SIV strains to HIV-1 strains of group M, and fine tuning of the codon sequence in env that reduced the chances of new stop codons occurring in asp. Altogether, the study supports the notion that the HIV-1 asp gene encodes an accessory protein, providing a selective advantage to the virus.
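The role of internal stop codons in opening an antisense ORF can be sketched by counting stops in the three reading frames of the reverse complement. The sequences below are invented toy strings, not env or asp sequence, and the function names are hypothetical; the study's actual computational and statistical methods are not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_stop_counts(seq: str) -> dict:
    """Count stop codons in each of the three antisense frames (offsets 0, 1, 2)."""
    rc = reverse_complement(seq)
    counts = {}
    for offset in range(3):
        codons = (rc[i:i + 3] for i in range(offset, len(rc) - 2, 3))
        counts[offset] = sum(codon in STOP_CODONS for codon in codons)
    return counts


# Hypothetical toy sequences: the second differs by one sense-strand substitution
# that removes one stop codon from antisense frame 0.
ancestral = "TTATCACTA"
derived = "TTATCACTG"
print(antisense_stop_counts(ancestral))
print(antisense_stop_counts(derived))
```

An antisense frame becomes a candidate ORF only once its stop count over the region of interest reaches zero, which is why stepwise loss of stops along a lineage is informative.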
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
- Fabio Romerio
  - Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205-2196, USA
15
Abstract
Modern genome-scale methods that identify new genes, such as proteogenomics and ribosome profiling, have revealed, to the surprise of many, that overlap in genes, open reading frames and even coding sequences is widespread and functionally integrated into prokaryotic, eukaryotic and viral genomes. In parallel, the constraints that overlapping regions place on genome sequences and their evolution can be harnessed in bioengineering to build more robust synthetic strains and constructs. With a focus on overlapping protein-coding and RNA-coding genes, this Review examines their discovery, topology and biogenesis in the context of their genome biology. We highlight exciting new uses for sequence overlap to control translation, compress synthetic genetic constructs, and protect against mutation.
16
Computational methods for inferring location and genealogy of overlapping genes in virus genomes: approaches and applications. Curr Opin Virol 2021; 52:1-8. [PMID: 34798370 PMCID: PMC8594276 DOI: 10.1016/j.coviro.2021.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 10/21/2021] [Accepted: 10/22/2021] [Indexed: 12/02/2022]
Abstract
Viruses may evolve to increase the amount of encoded genetic information by means of overlapping genes, which utilize several reading frames. Such overlapping genes may be especially impactful for genomes of small size, often serving a source of novel accessory proteins, some of which play a crucial role in viral pathogenicity or in promoting the systemic spread of virus. Diverse genome-based metrics were proposed to facilitate recognition of overlapping genes that otherwise may be overlooked during genome annotation. They can detect the atypical codon bias associated with the overlap (e.g. a statistically significant reduction in variability at synonymous sites) or other sequence-composition features peculiar to overlapping genes. In this review, I compare nine computational methods, discuss their strengths and limitations, and survey how they were applied to detect candidate overlapping genes in the genome of SARS-CoV-2, the etiological agent of COVID-19 pandemic.
17
Chazal N. Coronavirus, the King Who Wanted More Than a Crown: From Common to the Highly Pathogenic SARS-CoV-2, Is the Key in the Accessory Genes? Front Microbiol 2021; 12:682603. [PMID: 34335504 PMCID: PMC8317507 DOI: 10.3389/fmicb.2021.682603] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/18/2021] [Accepted: 06/22/2021] [Indexed: 12/14/2022] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which emerged in late 2019, is the etiologic agent of the current "coronavirus disease 2019" (COVID-19) pandemic, which has serious health implications and a significant global economic impact. Of the seven human coronaviruses, all of which have a zoonotic origin, the pandemic SARS-CoV-2 is the third coronavirus to emerge in the 21st century that is highly pathogenic to the human population. Previous human coronavirus outbreaks (SARS-CoV-1 and MERS-CoV) have already provided valuable information on some of the common molecular and cellular mechanisms of coronavirus infections as well as their origin. However, to meet the new challenge caused by SARS-CoV-2, a detailed understanding of its biological specificities, as well as knowledge of its origin, is crucial to provide information on viral pathogenicity, transmission and epidemiology, and to enable strategies for therapeutic interventions and drug discovery. Therefore, in this review, we summarize the current advances in knowledge of SARS-CoV-2, in light of pre-existing information on other recently emerging coronaviruses. We depict the specificity of the immune response of wild bats and discuss current knowledge of the genetic diversity of bat-hosted coronaviruses that promotes viral genome expansion (accessory gene acquisition). In addition, we describe the basic virology of coronaviruses with a special focus on SARS-CoV-2. Finally, we highlight, in detail, the current knowledge of genes and accessory proteins which we postulate to be the major keys to promote virus adaptation to specific hosts (bat and human), to contribute to the suppression of immune responses, as well as to pathogenicity.
Affiliation(s)
- Nathalie Chazal
  - Institut de Recherche en Infectiologie de Montpellier (IRIM), Université de Montpellier, CNRS, Montpellier, France
18
Guerra-Almeida D, Tschoeke DA, da-Fonseca RN. Understanding small ORF diversity through a comprehensive transcription feature classification. DNA Res 2021; 28:6317669. [PMID: 34240112 PMCID: PMC8435553 DOI: 10.1093/dnares/dsab007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/22/2020] [Indexed: 11/13/2022] Open
Abstract
Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in noncanonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into nonexpressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in noncoding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
Affiliation(s)
- Diego Guerra-Almeida
  - Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Diogo Antonio Tschoeke
  - Alberto Luiz Coimbra Institute of Graduate Studies and Engineering Research (COPPE), Biomedical Engineering Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Rodrigo Nunes-da-Fonseca
  - Institute of Biodiversity and Sustainability, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  - National Institute of Science and Technology in Molecular Entomology, Rio de Janeiro, Brazil
19
Abstract
Narnaviruses are RNA viruses detected in diverse fungi, plants, protists, arthropods, and nematodes. Though initially described as simple single-gene nonsegmented viruses encoding RNA-dependent RNA polymerase (RdRp), a subset of narnaviruses referred to as "ambigrammatic" harbor a unique genomic configuration consisting of overlapping open reading frames (ORFs) encoded on opposite strands. Phylogenetic analysis supports selection to maintain this unusual genome organization, but functional investigations are lacking. Here, we establish the mosquito-infecting Culex narnavirus 1 (CxNV1) as a model to investigate the functional role of overlapping ORFs in narnavirus replication. In CxNV1, a reverse ORF without homology to known proteins covers nearly the entire 3.2-kb segment encoding the RdRp. Additionally, two opposing and nearly completely overlapping novel ORFs are found on the second putative CxNV1 segment, the 0.8-kb "Robin" RNA. We developed a system to launch CxNV1 in a naive mosquito cell line and then showed that functional RdRp is required for persistence of both segments, and an intact reverse ORF is required on the RdRp segment for persistence. Mass spectrometry of persistently CxNV1-infected cells provided evidence for translation of this reverse ORF. Finally, ribosome profiling yielded a striking pattern of footprints for all four CxNV1 RNA strands that was distinct from actively translating ribosomes on host mRNA or coinfecting RNA viruses. Taken together, these data raise the possibility that the process of translation itself is important for persistence of ambigrammatic narnaviruses, potentially by protecting viral RNA with ribosomes, thus suggesting a heretofore undescribed viral tactic for replication and transmission. IMPORTANCE Fundamental to our understanding of RNA viruses is a description of which strand(s) of RNA are transmitted as the viral genome relative to which encode the viral proteins. Ambigrammatic narnaviruses break the mold. 
These viruses, found broadly in fungi, plants, and insects, have the unique feature of two overlapping genes encoded on opposite strands, comprising nearly the full length of the viral genome. Such extensive overlap is not seen in other RNA viruses and comes at the cost of reduced evolutionary flexibility in the sequence. The present study is motivated by investigating the benefits which balance that cost. We show for the first time a functional requirement for the ambigrammatic genome configuration in Culex narnavirus 1, which suggests a model for how translation of both strands might benefit this virus. Our work highlights a new blueprint for viral persistence, distinct from strategies defined by canonical definitions of the coding strand.
20
Pavesi A. Origin, Evolution and Stability of Overlapping Genes in Viruses: A Systematic Review. Genes (Basel) 2021; 12:genes12060809. [PMID: 34073395 PMCID: PMC8227390 DOI: 10.3390/genes12060809] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/06/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/11/2022] Open
Abstract
During their long evolutionary history viruses generated many proteins de novo by a mechanism called “overprinting”. Overprinting is a process in which critical nucleotide substitutions in a pre-existing gene can induce the expression of a novel protein by translation of an alternative open reading frame (ORF). Overlapping genes represent an intriguing example of adaptive conflict, because they simultaneously encode two proteins whose freedom to change is constrained by each other. However, overlapping genes are also a source of genetic novelties, as the constraints under which alternative ORFs evolve can give rise to proteins with unusual sequence properties, most importantly the potential for novel functions. Starting with the discovery of overlapping genes in phages infecting Escherichia coli, this review covers a range of studies dealing with detection of overlapping genes in small eukaryotic viruses (genomic length below 30 kb) and recognition of their critical role in the evolution of pathogenicity. Origin of overlapping genes, what factors favor their birth and retention, and how they manage their inherent adaptive conflict are extensively reviewed. Special attention is paid to the assembly of overlapping genes into ad hoc databases, suitable for future studies, and to the development of statistical methods for exploring viral genome sequences in search of undiscovered overlaps.
Affiliation(s)
- Angelo Pavesi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 23/A, I-43124 Parma, Italy
21
Gholizadeh Z, Iqbal MS, Li R, Romerio F. The HIV-1 Antisense Gene ASP: The New Kid on the Block. Vaccines (Basel) 2021; 9:vaccines9050513. [PMID: 34067514 PMCID: PMC8156140 DOI: 10.3390/vaccines9050513] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Revised: 05/04/2021] [Accepted: 05/13/2021] [Indexed: 01/14/2023] Open
Abstract
Viruses have developed incredibly creative ways of making a virtue out of necessity, including taking full advantage of their small genomes. Indeed, viruses often encode multiple proteins within the same genomic region by using two or more reading frames in both orientations through a process called overprinting. Complex retroviruses provide compelling examples of that. The human immunodeficiency virus type 1 (HIV-1) genome expresses sixteen proteins from nine genes that are encoded in the three positive-sense reading frames. In addition, the genome of some HIV-1 strains contains a tenth gene in one of the negative-sense reading frames. The so-called Antisense Protein (ASP) gene overlaps the HIV-1 Rev Response Element (RRE) and the envelope glycoprotein gene, and encodes a highly hydrophobic protein of ~190 amino acids. Despite being identified over thirty years ago, relatively few studies have investigated the role that ASP may play in the virus lifecycle, and its expression in vivo is still questioned. Here we review the current knowledge about ASP, and we discuss some of the many unanswered questions.
22
Nelson CW, Ardern Z, Wei X. OLGenie: Estimating Natural Selection to Predict Functional Overlapping Genes. Mol Biol Evol 2021; 37:2440-2449. [PMID: 32243542 PMCID: PMC7531306 DOI: 10.1093/molbev/msaa087] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022] Open
Abstract
Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous in the other, perturbing dN/dS. Thus, scalable methods are needed to estimate functional constraint specifically for overlapping genes (OLGs). We propose OLGenie, which implements a modification of the Wei–Zhang method. Assessment with simulations and controls from viral genomes (58 OLGs and 176 non-OLGs) demonstrates low false-positive rates and good discriminatory ability in differentiating true OLGs from non-OLGs. We also apply OLGenie to the unresolved case of HIV-1’s putative antisense protein gene, showing significant purifying selection. OLGenie can be used to study known OLGs and to predict new OLGs in genome annotation. Software and example data are freely available at https://github.com/chasewnelson/OLGenie (last accessed April 10, 2020).
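The problem described above, that one substitution can be synonymous in one frame and nonsynonymous in the overlapping frame, can be shown with a short sketch. This is not OLGenie and does not implement the Wei–Zhang method; the function name, the toy sequence, and the 0-based position convention are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(seq, pos, new_base, frame_offset):
    """Classify a single substitution at 0-based `pos` within one reading frame.

    `frame_offset` (0, 1 or 2) is the position of the first base of the frame's
    first codon. Returns None if `pos` falls outside any complete codon.
    """
    codon_start = pos - ((pos - frame_offset) % 3)
    if codon_start < 0 or codon_start + 3 > len(seq):
        return None
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = CODON_TABLE[seq[codon_start:codon_start + 3]]
    after = CODON_TABLE[mutated[codon_start:codon_start + 3]]
    return "synonymous" if before == after else "nonsynonymous"


# Hypothetical toy sequence: the G>A change at position 5 is GCG>GCA (Ala>Ala)
# in frame 0 but CGA>CAA (Arg>Gln) in the overlapping frame at offset 1.
seq = "ATGGCGAAACGT"
print(classify_substitution(seq, 5, "A", 0))
print(classify_substitution(seq, 5, "A", 1))
```

Because the same site contributes to a different class in each frame, a naive dN/dS computed for one gene is confounded by selection on the other, which is the motivation for overlap-aware estimators.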
Affiliation(s)
- Chase W Nelson
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Zachary Ardern
  - Microbial Ecology, ZIEL-Institute for Food & Health, Technische Universität München, Freising, Germany
- Xinzhu Wei
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
  - Department of Integrative Biology and Statistics, University of California, Berkeley, CA
23
Blanchard E, Longo G. From axiomatic systems to the Dogmatic gene and beyond. Biosystems 2021; 204:104396. [PMID: 33722644 DOI: 10.1016/j.biosystems.2021.104396] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/31/2020] [Revised: 02/25/2021] [Accepted: 02/26/2021] [Indexed: 02/06/2023]
Abstract
The positivistic views that dominated the early debate on the foundations of mathematics, at the beginning of the 20th century, survived the "negative results" that have shown the limits of the axiomatic approach since the 1930s. Rigour, abstraction and symbolism have been confused with formalism, based on finite strings of signs, pre-given axioms, and potentially mechanisable rewriting rules. This contributed to major clarifications in the mathematical praxes but obscured the limits of formalisms due to the exclusion of the historical creation of sense proper to any science. We expand on this sometimes fruitful confusion with some case studies. We then hint to the historical creation of sense as a component of an epistemology of mathematics. We continue with an analogy with genocentric approaches in biology, as similar positivistic views resurfaced there fifty years later. Finite sequences of letters in the DNA would completely determine ontogenesis and phylogenesis, according to the Central Dogma of molecular biology. Limits and "negative evidence" have been disregarded while searching for the "gene for" everything. Alternative perspectives require a reconstruction of the sense of history as locus for the constitution of any object of biological knowledge. In particular, the historicity of biological evolution will be understood in terms of changing phase spaces and of the role of rare events in all phylogenetic trajectories. The analysis of the evolutionary production of variability, adaptivity and ecosystemic diversity is a key component of the project we hint to, as part of a renewed relation to the biological environment.
Affiliation(s)
- Enka Blanchard
  - Digitrust Consortium, Loria, Université de Lorraine, Nancy, France
- Giuseppe Longo
  - Centre Cavaillés, République des Savoirs, CNRS and École Normale Supérieure, Paris, France
  - School of Medicine, Tufts University, Boston, MA, USA
24
Carter CW. Simultaneous codon usage, the origin of the proteome, and the emergence of de-novo proteins. Curr Opin Struct Biol 2021; 68:142-148. [PMID: 33529785 DOI: 10.1016/j.sbi.2021.01.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/04/2020] [Accepted: 01/05/2021] [Indexed: 12/21/2022]
Abstract
Genetic coding generally uses only one of a gene's two strands; its complement serving as template for replication. Aminoacyl-tRNA synthetases, aaRS, apparently first emerged as pairs on bidirectional genes, in which anticodons in the template strand served as codons for an entirely different protein. Interpreting both strands in frame constrained such genes sufficiently that it was rapidly superseded, leaving only traces in the elevated pairing between codon middle bases in antiparallel alignments. Codon assignments actually promote using information from both strands in multiple reading frames. Related phenomena, known as overprinting, are widely associated with viruses. In-frame bidirectional coding and overprinting nevertheless imply different structural and functional relationships, and different roles in generating folded proteins throughout the evolution of the proteome.
Affiliation(s)
- Charles W Carter
  - Department of Biochemistry, Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7260, United States
25
Wright BW, Ruan J, Molloy MP, Jaschke PR. Genome Modularization Reveals Overlapped Gene Topology Is Necessary for Efficient Viral Reproduction. ACS Synth Biol 2020; 9:3079-3090. [PMID: 33044064 DOI: 10.1021/acssynbio.0c00323] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 02/08/2023]
Abstract
Sequence overlap between two genes is common across all genomes, with viruses having high proportions of these gene overlaps. Genome modularization and refactoring is the process of disrupting natural gene overlaps to separate coding sequences to enable their individual manipulation. The biological function and fitness effects of gene overlaps are not fully understood, and their effects on gene cluster and genome-level refactoring are unknown. The bacteriophage φX174 genome has ∼26% of nucleotides involved in encoding more than one gene. In this study we use an engineered φX174 phage containing a genome with all gene overlaps removed to show that gene overlap is critical to maintaining optimal viral fecundity. Through detailed phenotypic measurements we reveal that genome modularization in φX174 causes virion replication, stability, and attachment deficiencies. Quantitation of the complete phage proteome across an infection cycle reveals 30% of proteins display abnormal expression patterns. Taken together, we have for the first time comprehensively demonstrated that gene modularization severely perturbs the coordinated functioning of a bacteriophage replication cycle. This work highlights the biological importance of gene overlap in natural genomes and that reducing gene overlap disruption should be an integral part of future genome engineering projects.
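A figure such as "∼26% of nucleotides involved in encoding more than one gene" is a per-nucleotide coverage statistic, and one way to compute it is sketched below. The coordinates are invented and do not correspond to the φX174 annotation, the function name is hypothetical, and genes that wrap around the origin of a circular genome are not handled.

```python
def multiply_encoded_fraction(genome_length, genes):
    """Fraction of positions covered by more than one gene.

    `genes` is a list of (start, end) pairs, 1-based and inclusive.
    """
    depth = [0] * genome_length
    for start, end in genes:
        for i in range(start - 1, end):
            depth[i] += 1
    return sum(d > 1 for d in depth) / genome_length


# Hypothetical 100-nt genome with three genes: positions 31-40 and 65-70
# are each covered twice, so 16 of 100 positions are multiply encoded.
toy_genes = [(1, 40), (31, 70), (65, 90)]
print(multiply_encoded_fraction(100, toy_genes))
```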
Affiliation(s)
- Bradley W. Wright
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Juanfang Ruan
  - Electron Microscope Unit, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
- Mark P. Molloy
  - Kolling Institute, Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
- Paul R. Jaschke
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
26
Michel CJ, Mayer C, Poch O, Thompson JD. Characterization of accessory genes in coronavirus genomes. Virol J 2020; 17:131. [PMID: 32854725 PMCID: PMC7450977 DOI: 10.1186/s12985-020-01402-1] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 06/03/2020] [Accepted: 08/16/2020] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND The Covid19 infection is caused by the SARS-CoV-2 virus, a novel member of the coronavirus (CoV) family. CoV genomes code for a ORF1a / ORF1ab polyprotein and four structural proteins widely studied as major drug targets. The genomes also contain a variable number of open reading frames (ORFs) coding for accessory proteins that are not essential for virus replication, but appear to have a role in pathogenesis. The accessory proteins have been less well characterized and are difficult to predict by classical bioinformatics methods. METHODS We propose a computational tool GOFIX to characterize potential ORFs in virus genomes. In particular, ORF coding potential is estimated by searching for enrichment in motifs of the X circular code, that is known to be over-represented in the reading frames of viral genes. RESULTS We applied GOFIX to study the SARS-CoV-2 and related genomes including SARS-CoV and SARS-like viruses from bat, civet and pangolin hosts, focusing on the accessory proteins. Our analysis provides evidence supporting the presence of overlapping ORFs 7b, 9b and 9c in all the genomes and thus helps to resolve some differences in current genome annotations. In contrast, we predict that ORF3b is not functional in all genomes. Novel putative ORFs were also predicted, including a truncated form of the ORF10 previously identified in SARS-CoV-2 and a little known ORF overlapping the Spike protein in Civet-CoV and SARS-CoV. CONCLUSIONS Our findings contribute to characterizing sequence properties of accessory genes of SARS coronaviruses, and especially the newly acquired genes making use of overlapping reading frames.
Collapse
Affiliation(s)
- Christian Jean Michel
- Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412, Strasbourg, France
- Claudine Mayer
- Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412, Strasbourg, France; Unité de Microbiologie Structurale, Institut Pasteur, CNRS UMR 3528, 75724, Paris Cedex 15, France; Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France
- Olivier Poch
- Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412, Strasbourg, France
- Julie Dawn Thompson
- Laboratoire ICube, Department of Computer Science, CNRS, University of Strasbourg, F-67412, Strasbourg, France
|
27
|
Ardern Z, Neuhaus K, Scherer S. Are Antisense Proteins in Prokaryotes Functional? Front Mol Biosci 2020; 7:187. [PMID: 32923454 PMCID: PMC7457138 DOI: 10.3389/fmolb.2020.00187] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/20/2020] [Accepted: 07/16/2020] [Indexed: 12/16/2022] Open
Abstract
Many prokaryotic RNAs are transcribed from loci outside of annotated protein coding genes. Across bacterial species hundreds of short open reading frames antisense to annotated genes show evidence of both transcription and translation, for instance in ribosome profiling data. Determining the functional fraction of these protein products awaits further research, including insights from studies of molecular interactions and detailed evolutionary analysis. There are multiple lines of evidence, however, that many of these newly discovered proteins are of use to the organism. Condition-specific phenotypes have been characterized for a few. These proteins should be added to genome annotations, and the methods for predicting them standardized. Evolutionary analysis of these typically young sequences also may provide important insights into gene evolution. This research should be prioritized for its exciting potential to uncover large numbers of novel proteins with extremely diverse potential practical uses, including applications in synthetic biology and responding to pathogens.
Affiliation(s)
- Zachary Ardern
- Chair for Microbial Ecology, Technical University of Munich, Munich, Germany
|
28
|
Pavesi A. New insights into the evolutionary features of viral overlapping genes by discriminant analysis. Virology 2020; 546:51-66. [PMID: 32452417 PMCID: PMC7157939 DOI: 10.1016/j.virol.2020.03.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/20/2020] [Accepted: 03/29/2020] [Indexed: 12/18/2022]
Abstract
Overlapping genes originate by a mechanism of overprinting, in which nucleotide substitutions in a pre-existing frame induce the expression of a de novo protein from an alternative frame. In this study, I assembled a dataset of 319 viral overlapping genes, which included 82 overlaps whose expression is experimentally known and the respective 237 homologs. Principal component analysis revealed that overlapping genes have a common pattern of nucleotide and amino acid composition. Discriminant analysis separated overlapping from non-overlapping genes with an accuracy of 97%. When applied to overlapping genes with known genealogy, it separated ancestral from de novo frames with an accuracy close to 100%. This high discriminant power was crucial to computationally design variants of de novo viral proteins known to possess selective anticancer toxicity (apoptin) or protection against neurodegeneration (X protein), as well as to detect two new potential overlapping genes in the genome of the new coronavirus SARS-CoV-2.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area Delle Scienze 23/A, I-43124, Parma, Italy.
|
29
|
Dinan AM, Lukhovitskaya NI, Olendraite I, Firth AE. A case for a negative-strand coding sequence in a group of positive-sense RNA viruses. Virus Evol 2020; 6:veaa007. [PMID: 32064120 PMCID: PMC7010960 DOI: 10.1093/ve/veaa007] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/13/2022] Open
Abstract
Positive-sense single-stranded RNA viruses form the largest and most diverse group of eukaryote-infecting viruses. Their genomes comprise one or more segments of coding-sense RNA that function directly as messenger RNAs upon release into the cytoplasm of infected cells. Positive-sense RNA viruses are generally accepted to encode proteins solely on the positive strand. However, we previously identified a surprisingly long (∼1,000-codon) open reading frame (ORF) on the negative strand of some members of the family Narnaviridae which, together with RNA bacteriophages of the family Leviviridae, form a sister group to all other positive-sense RNA viruses. Here, we completed the genomes of three mosquito-associated narnaviruses, all of which have the long reverse-frame ORF. We systematically identified narnaviral sequences in public data sets from a wide range of sources, including arthropod, fungal, and plant transcriptomic data sets. Long reverse-frame ORFs are widespread in one clade of narnaviruses, where they frequently occupy >95 per cent of the genome. The reverse-frame ORFs correspond to a specific avoidance of CUA, UUA, and UCA codons (i.e. stop codon reverse complements) in the forward-frame RNA-dependent RNA polymerase ORF. However, absence of these codons cannot be explained by other factors such as inability to decode these codons or GC3 bias. Together with other analyses, we provide the strongest evidence yet of coding capacity on the negative strand of a positive-sense RNA virus. As these ORFs comprise some of the longest known overlapping genes, their study may be of broad relevance to understanding overlapping gene evolution and de novo origin of genes.
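The codon avoidance described in this abstract follows from base pairing: a forward-frame codon that is the reverse complement of a stop codon would be read as a stop on the opposite strand. The short Python sketch below illustrates that relationship. It is an illustration only, not the authors' analysis pipeline; the function names and the toy sequence are hypothetical, and the count assumes the reverse-frame ORF is codon-aligned with the forward ORF.

```python
# RNA stop codons and a translation table for complementary bases.
STOP_CODONS = ("UAG", "UAA", "UGA")
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def avoided_forward_codons() -> list[str]:
    """Forward-strand codons that read as stop codons on the opposite strand."""
    return sorted(reverse_complement(stop) for stop in STOP_CODONS)


def count_reverse_stops(forward_orf: str) -> int:
    """Count in-frame forward codons that would introduce a stop in a
    codon-aligned reverse-frame ORF (hypothetical helper for illustration)."""
    avoided = set(avoided_forward_codons())
    codons = [forward_orf[i:i + 3] for i in range(0, len(forward_orf) - 2, 3)]
    return sum(codon in avoided for codon in codons)


if __name__ == "__main__":
    print(avoided_forward_codons())             # ['CUA', 'UCA', 'UUA']
    print(count_reverse_stops("AUGCUAGGGUUA"))  # 2 (CUA and UUA)
```

A forward ORF with a count of zero under this check leaves the aligned reverse frame free of stop codons, which is the pattern the abstract reports for one clade of narnaviruses.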
Affiliation(s)
- Adam M Dinan
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
- Nina I Lukhovitskaya
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
- Ingrida Olendraite
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
- Andrew E Firth
- Division of Virology, Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, UK
|
30
|
Gibbs AJ, Hajizadeh M, Ohshima K, Jones RA. The Potyviruses: An Evolutionary Synthesis Is Emerging. Viruses 2020; 12:E132. [PMID: 31979056 PMCID: PMC7077269 DOI: 10.3390/v12020132] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 12/31/2019] [Revised: 01/16/2020] [Accepted: 01/20/2020] [Indexed: 12/28/2022] Open
Abstract
In this review, encouraged by the dictum of Theodosius Dobzhansky that "Nothing in biology makes sense except in the light of evolution", we outline the likely evolutionary pathways that have resulted in the observed similarities and differences of the extant molecules, biology, distribution, etc. of the potyvirids and, especially, its largest genus, the potyviruses. The potyvirids are a family of plant-infecting RNA-genome viruses. They had a single polyphyletic origin, and all share at least three of their genes (i.e., the helicase region of their CI protein, the RdRp region of their NIb protein and their coat protein) with other viruses which are otherwise unrelated. Potyvirids fall into 11 genera of which the potyviruses, the largest, include more than 150 distinct viruses found worldwide. The first potyvirus probably originated 15,000-30,000 years ago, in a Eurasian grass host, by acquiring crucial changes to its coat protein and HC-Pro protein, which enabled it to be transmitted by migrating host-seeking aphids. All potyviruses are aphid-borne and, in nature, infect discrete sets of monocotyledonous or eudicotyledonous angiosperms. All potyvirus genomes are under negative selection; the HC-Pro, CP, NIa, and NIb genes are most strongly selected, and the PIPO gene least, but there are overriding virus specific differences; for example, all turnip mosaic virus genes are more strongly conserved than those of potato virus Y. Estimates of dN/dS (ω) indicate whether potyvirus populations have been evolving as one or more subpopulations and could be used to help define species boundaries. Recombinants are common in many potyvirus populations (20%-64% in five examined), but recombination seems to be an uncommon speciation mechanism as, of 149 distinct potyviruses, only two were clear recombinants.
Human activities, especially trade and farming, have fostered and spread both potyviruses and their aphid vectors throughout the world, especially over the past five centuries. The world distribution of potyviruses, especially those found on islands, indicates that potyviruses may be more frequently or effectively transmitted by seed than experimental tests suggest. Only two meta-genomic potyviruses have been recorded from animal samples, and both are probably contaminants.
Affiliation(s)
- Adrian J. Gibbs
- Emeritus Faculty, Australian National University, Canberra, ACT 2601, Australia
- Mohammad Hajizadeh
- Department of Plant Protection, Faculty of Agriculture, University of Kurdistan, P.O. Box 416, Sanandaj, Iran
- Kazusato Ohshima
- Laboratory of Plant Virology, Department of Applied Biological Sciences, Faculty of Agriculture, Saga University, 1-banchi, Honjo-machi, Saga 840-8502, Japan
- The United Graduate School of Agricultural Sciences, Kagoshima University, 1-21-2410 Korimoto, Kagoshima 890-0065, Japan
- Roger A.C. Jones
- Institute of Agriculture, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
|
31
|
Affram Y, Zapata JC, Gholizadeh Z, Tolbert WD, Zhou W, Iglesias-Ussel MD, Pazgier M, Ray K, Latinovic OS, Romerio F. The HIV-1 Antisense Protein ASP Is a Transmembrane Protein of the Cell Surface and an Integral Protein of the Viral Envelope. J Virol 2019; 93:e00574-19. [PMID: 31434734 PMCID: PMC6803264 DOI: 10.1128/jvi.00574-19] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/05/2019] [Accepted: 08/14/2019] [Indexed: 12/13/2022] Open
Abstract
The negative strand of HIV-1 encodes a highly hydrophobic antisense protein (ASP) with no known homologs. The presence of humoral and cellular immune responses to ASP in HIV-1 patients indicates that ASP is expressed in vivo, but its role in HIV-1 replication remains unknown. We investigated ASP expression in multiple chronically infected myeloid and lymphoid cell lines using an anti-ASP monoclonal antibody (324.6) in combination with flow cytometry and microscopy approaches. At baseline and in the absence of stimuli, ASP shows polarized subnuclear distribution, preferentially in areas with low content of suppressive epigenetic marks. However, following treatment with phorbol 12-myristate 13-acetate (PMA), ASP translocates to the cytoplasm and is detectable on the cell surface, even in the absence of membrane permeabilization, indicating that 324.6 recognizes an ASP epitope that is exposed extracellularly. Further, surface staining with 324.6 and anti-gp120 antibodies showed that ASP and gp120 colocalize, suggesting that ASP might become incorporated in the membranes of budding virions. Indeed, fluorescence correlation spectroscopy studies showed binding of 324.6 to cell-free HIV-1 particles. Moreover, 324.6 was able to capture and retain HIV-1 virions with efficiency similar to that of the anti-gp120 antibody VRC01. Our studies indicate that ASP is an integral protein of the plasma membranes of chronically infected cells stimulated with PMA, and upon viral budding, ASP becomes a structural protein of the HIV-1 envelope. These results may provide leads to investigate the possible role of ASP in the virus replication cycle and suggest that ASP may represent a new therapeutic or vaccine target. IMPORTANCE The HIV-1 genome contains a gene expressed in the opposite, or antisense, direction to all other genes. The protein product of this antisense gene, called ASP, is poorly characterized, and its role in viral replication remains unknown.
We provide evidence that the antisense protein, ASP, of HIV-1 is found within the cell nucleus in unstimulated cells. In addition, we show that after PMA treatment, ASP exits the nucleus and localizes on the cell membrane. Moreover, we demonstrate that ASP is present on the surfaces of viral particles. Altogether, our studies identify ASP as a new structural component of HIV-1 and show that ASP is an accessory protein that promotes viral replication. The presence of ASP on the surfaces of both infected cells and viral particles might be exploited therapeutically.
Affiliation(s)
- Yvonne Affram
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Juan C Zapata
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Zahra Gholizadeh
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- William D Tolbert
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Wei Zhou
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Maria D Iglesias-Ussel
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Marzena Pazgier
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Krishanu Ray
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Olga S Latinovic
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Fabio Romerio
- Institute of Human Virology, University of Maryland School of Medicine, Baltimore, Maryland, USA
|
32
|
Pavesi A. Asymmetric evolution in viral overlapping genes is a source of selective protein adaptation. Virology 2019; 532:39-47. [PMID: 31004987 PMCID: PMC7125799 DOI: 10.1016/j.virol.2019.03.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/03/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/29/2022]
Abstract
Overlapping genes represent an intriguing puzzle, as they encode two proteins whose ability to evolve is constrained by each other. Overlapping genes can undergo “symmetric evolution” (similar selection pressures on the two proteins) or “asymmetric evolution” (significantly different selection pressures on the two proteins). By sequence analysis of 75 pairs of homologous viral overlapping genes, I evaluated their accordance with one or the other model. Analysis of nucleotide and amino acid sequences revealed that half of the overlaps undergo asymmetric evolution, as the protein from one frame shows a significantly higher number of substitutions than the protein from the other frame. Interestingly, the most variable protein (often known to interact with the host proteins) appeared to be encoded by the de novo frame in all cases examined. These findings suggest that overlapping genes, in addition to increasing the coding ability of viruses, are also a source of selective protein adaptation. A dataset of 80 pairs of homologous overlapping genes from viruses is examined. Its analysis reveals that half of overlapping genes undergo asymmetric evolution. The most variable gene product is that encoded by the de novo overlapping gene. Overlapping genes evolving asymmetrically are a source of selective protein adaptation.
Affiliation(s)
- Angelo Pavesi
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11/A, I-43124, Parma, Italy.
|