1
Romerio F. Origin and functional role of antisense transcription in endogenous and exogenous retroviruses. Retrovirology 2023; 20:6. [PMID: 37194028 DOI: 10.1186/s12977-023-00622-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/31/2023] [Accepted: 04/30/2023] [Indexed: 05/18/2023]
Abstract
Most proteins expressed by endogenous and exogenous retroviruses are encoded in the sense (positive) strand of the genome and are under the control of regulatory elements within the 5' long terminal repeat (LTR). A number of retroviral genomes also encode genes in the antisense (negative) strand and their expression is under the control of negative sense promoters within the 3' LTR. In the case of the Human T-cell Lymphotropic Virus 1 (HTLV-1), the antisense protein HBZ has been shown to play a critical role in the virus lifecycle and in the pathogenic process, while the function of the Human Immunodeficiency Virus 1 (HIV-1) antisense protein ASP remains unknown. However, the expression of 3' LTR-driven antisense transcripts is not always demonstrably associated with the presence of an antisense open reading frame encoding a viral protein. Moreover, even in the case of retroviruses that do express an antisense protein, such as HTLV-1 and the pandemic strains of HIV-1, the 3' LTR-driven antisense transcript shows both protein-coding and noncoding activities. Indeed, the ability to express antisense transcripts appears to be phylogenetically more widespread among endogenous and exogenous retroviruses than the presence of a functional antisense open reading frame within these transcripts. This suggests that retroviral antisense transcripts may have originated as noncoding molecules with regulatory activity that in some cases later acquired protein-coding function. Here, we will review examples of endogenous and exogenous retroviral antisense transcripts, and the ways through which they benefit viral persistence in the host.
Affiliation(s)
- Fabio Romerio
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
2
Gholizadeh Z, Iqbal MS, Li R, Romerio F. The HIV-1 Antisense Gene ASP: The New Kid on the Block. Vaccines (Basel) 2021; 9:513. [PMID: 34067514 PMCID: PMC8156140 DOI: 10.3390/vaccines9050513] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/20/2021] [Revised: 05/04/2021] [Accepted: 05/13/2021] [Indexed: 01/14/2023]
Abstract
Viruses have developed incredibly creative ways of making a virtue out of necessity, including taking full advantage of their small genomes. Indeed, viruses often encode multiple proteins within the same genomic region by using two or more reading frames in both orientations through a process called overprinting. Complex retroviruses provide compelling examples of that. The human immunodeficiency virus type 1 (HIV-1) genome expresses sixteen proteins from nine genes that are encoded in the three positive-sense reading frames. In addition, the genome of some HIV-1 strains contains a tenth gene in one of the negative-sense reading frames. The so-called Antisense Protein (ASP) gene overlaps the HIV-1 Rev Response Element (RRE) and the envelope glycoprotein gene, and encodes a highly hydrophobic protein of ~190 amino acids. Despite being identified over thirty years ago, relatively few studies have investigated the role that ASP may play in the virus lifecycle, and its expression in vivo is still questioned. Here we review the current knowledge about ASP, and we discuss some of the many unanswered questions.
3
Li R, Sklutuis R, Groebner JL, Romerio F. HIV-1 Natural Antisense Transcription and Its Role in Viral Persistence. Viruses 2021; 13:795. [PMID: 33946840 PMCID: PMC8145503 DOI: 10.3390/v13050795] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/16/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 12/11/2022]
Abstract
Natural antisense transcripts (NATs) represent a class of RNA molecules that are transcribed from the opposite strand of a protein-coding gene, and that have the ability to regulate the expression of their cognate protein-coding gene via multiple mechanisms. NATs have been described in many prokaryotic and eukaryotic systems, as well as in the viruses that infect them. The human immunodeficiency virus (HIV-1) is no exception, and produces one or more NAT from a promoter within the 3’ long terminal repeat. HIV-1 antisense transcripts have been the focus of several studies spanning over 30 years. However, a complete appreciation of the role that these transcripts play in the virus lifecycle is still lacking. In this review, we cover the current knowledge about HIV-1 NATs, discuss some of the questions that are still open and identify possible areas of future research.
Affiliation(s)
- Rui Li
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Rachel Sklutuis
- HIV Dynamics and Replication Program, Host-Virus Interaction Branch, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Jennifer L. Groebner
- HIV Dynamics and Replication Program, Host-Virus Interaction Branch, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Fabio Romerio
- Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
4
Garcia-Mazcorro JF, Barcenas-Walls JR. Thinking beside the box: Should we care about the non-coding strand of the 16S rRNA gene? FEMS Microbiol Lett 2016; 363:fnw171. [DOI: 10.1093/femsle/fnw171] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Accepted: 07/08/2016] [Indexed: 12/22/2022]
5
Merino E, Balbás P, Puente JL, Bolívar F. Antisense overlapping open reading frames in genes from bacteria to humans. Nucleic Acids Res 1994; 22:1903-8. [PMID: 8208617 PMCID: PMC308092 DOI: 10.1093/nar/22.10.1903] [Citation(s) in RCA: 60] [Impact Index Per Article: 1.9] [Indexed: 01/29/2023]
Abstract
Long Open Reading Frames (ORFs) in antisense DNA strands have been reported in the literature as being rare events. However, an extensive analysis of the GenBank database revealed that a substantial number of genes from several species contain an in-phase ORF in the antisense strand, that overlaps entirely the coding sequence of the sense strand, or even extends beyond. The findings described in this paper show that this is a frequent, non-random phenomenon, which is primarily dependent on codon usage, and to a lesser extent on gene size and GC content. Examination of the sequence database for several prokaryotic and eukaryotic organisms, demonstrates that coding sequences with in-phase, 100% overlapping antisense ORFs are present in every genome studied so far.
Affiliation(s)
- E Merino
- Departamento de Biología Molecular, Universidad Nacional Autónoma de Mexico, Cuernavaca
6
Abstract
Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
Affiliation(s)
- P K Keese
- Commonwealth Scientific and Industrial Organisation, Division of Plant Industry, Australian National University, Canberra
7
Santoro M, Scarlato V, Franzé A, Grau O, Cipollaro M, Gargano S, Bova R, Micheli MR, Storlazzi A, Cascino A. Symmetric transcription of bacteriophage T4 base plate genes. Gene 1988; 72:241-5. [PMID: 2468563 DOI: 10.1016/0378-1119(88)90149-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 01/01/2023]
Abstract
Dot-blot and Northern-blot experiments, using strand-specific RNA probes, show that part of the bacteriophage T4 DNA that codes for six of the base plate structural genes (gp 51, 27, 28, 29, 48 and 54), is transcribed in vivo from both DNA strands. The r DNA strand transcripts contain sequences which are translated into structural proteins. Antisense l strand RNA is about 100 fold less abundant than RNA molecules transcribed from the r DNA strand.
Affiliation(s)
- M Santoro
- International Institute of Genetics and Biophysics, Naples, Italy
8
Abstract
The genome of the human immunodeficiency virus (HIV) is known to contain eight open reading frames (ORFs) on the minus strand of the double-stranded DNA replicative intermediate. Data presented here indicate that the DNA plus strand of HIV contains a previously unidentified ORF in a region complementary to the envelope gene sequence. This ORF could encode a protein of approximately 190 amino acid residues with a relative molecular mass of 20 kilodaltons if translation began from the first initiation codon. The predicted protein is highly hydrophobic and thus could be membrane associated. It is possible, therefore, that the HIV genome encodes a protein on antisense messenger RNA.
Affiliation(s)
- R H Miller
- Hepatitis Viruses Section, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892
9
Abstract
The hypothesis that DNA strands complementary to the coding strand contain in phase coding sequences has been investigated. Statistical analysis of the 50 genes of bacteriophage T7 shows no significant correlation between patterns of codon usage on the coding and non-coding strands. In Bacillus and yeast genes the correlation observed is not different from that expected with random synonymous codon usage, while a high correlation seen in 52 E. coli genes can be explained in terms of an excess of RNY codons. A deficiency of UUA, CUA and UCA codons (complementary to termination) seems to be restricted to the E. coli genes, and may be due to low abundance of the relevant cognate tRNA species. Thus the analysis shows that the non-coding strand has the properties expected of a sequence complementary to a coding strand, with no indications that it encodes, or may have encoded, proteins.
10
Tramontano A, Scarlato V, Barni N, Cipollaro M, Franzè A, Macchiato MF, Cascino A. Statistical evaluation of the coding capacity of complementary DNA strands. Nucleic Acids Res 1984; 12:5049-59. [PMID: 6547531 PMCID: PMC318899 DOI: 10.1093/nar/12.12.5049] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
Two independent methods are used to evaluate the protein-coding information content in different classes of DNA sequences. The first method allows to evaluate the statistical relevance of finding unidentified reading frames, longer than 100 codons, on both DNA strands of: a) 117 DNA sequences that code for 142 nuclear proteins; b) 39 stable RNA coding sequences and c) 36 other DNA sequences which include regulatory and as yet unknown function sequences. The finding of 50 reading frames longer than 100 codons (complementary inverted proteins or c.i.p. genes) located on the DNA strand complementary to the protein-coding one is drastically in excess of the number predicted by chance alone. An independent method (testcode) applied to c.i.p. gene sequences, which assigns the probability of coding to a given sequence, predicts that more than 50% of these genes are translated in a functional product. These analyses indicate the existence of a new class of protein-coding genes, located on the DNA sequences complementary to the protein-coding DNA strand.
11
Pierno G, Barni N, Candurro M, Cipollaro M, Franzè A, Juliano L, Macchiato MF, Mastrocinque G, Moscatelli C, Scarlato V. Computer programs for the characterization of protein coding genes. Nucleic Acids Res 1984; 12:281-5. [PMID: 6546420 PMCID: PMC321004 DOI: 10.1093/nar/12.1part1.281] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Computer programs, implemented on a Univac 1100/80 computer system, for the identification and characterization of protein coding genes and for the analysis of nucleic acid sequences, are described.
12
A statistical method for predicting alpha-helical and beta-sheet regions in proteins from their amino acidic sequences. ACTA ACUST UNITED AC 1984. [DOI: 10.1007/bf02457469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
13
DNA sequence analysis of a mouse pro alpha 1 (I) procollagen gene: evidence for a mouse B1 element within the gene. Mol Cell Biol 1983. [PMID: 6298597 DOI: 10.1128/mcb.2.11.1362] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
Abstract
In a 3.8-kilobase mouse DNA sequence encoding amino acid sequences for the pro alpha 1(I) chain of type I procollagen, 14 coding sequences were identified which specify a sequence 95% homologous to amino acid residues 568 to 963 of the bovine alpha 1(I) chain. All of these coding sequences were flanked by appropriate splice junctions following the GT/AG rule. These observations suggest, but do not prove, that this pro alpha 1(I) gene is transcriptionally active. Of the 14 coding sequences, 7 were 54 base pairs in length, whereas the remainder were higher multiples of 54 base pairs. Nonrandom utilization of codons pertained throughout all of the coding sequences showing a preference (56%) for U in the wobble position. Two of the intervening sequences encoded imperfect vestiges of coding sequences which exhibited a codon preference different from that of the pro alpha 1(I) gene proper and were not flanked by splice junctions. One intervening sequence encoded a member of the mouse B1 family of middle repetitive sequences. It was flanked by 8-base-pair direct repeats and had a truncated A-rich region, suggesting that it may be a mobile element. Within this element were sequences which could function as a RNA polymerase III split promoter.
14
Cornelissen BJ, Brederode FT, Moormann RJ, Bol JF. Complete nucleotide sequence of alfalfa mosaic virus RNA 1. Nucleic Acids Res 1983; 11:1253-65. [PMID: 6298738 PMCID: PMC325794 DOI: 10.1093/nar/11.5.1253] [Citation(s) in RCA: 52] [Impact Index Per Article: 1.2] [Indexed: 01/19/2023]
Abstract
Double-stranded cDNA of alfalfa mosaic virus (AlMV) RNA 1 has been cloned and sequenced. From clones with overlapping inserts, and other sequence data, the complete primary sequence of the 3644 nucleotides of RNA 1 was deduced: a long open reading frame for a protein of Mr 125,685 is flanked by a 5'-terminal sequence of 100 nucleotides and a 3' noncoding region of 163 nucleotides, including the sequence of 145 nucleotides the three genomic RNAs of AlMV have in common. The two UGA-termination codons halfway RNA 1, that were postulated by Van Tol et al. (FEBS Lett. 118, 67-71, 1980) to account for partial translation of RNA 1 in vitro into Mr 58,000 and Mr 62,000 proteins, were not found in the reading frame of the Mr 125,685 protein.
15
Monson JM, Friedman J, McCarthy BJ. DNA sequence analysis of a mouse pro alpha 1 (I) procollagen gene: evidence for a mouse B1 element within the gene. Mol Cell Biol 1982; 2:1362-71. [PMID: 6298597 PMCID: PMC369941 DOI: 10.1128/mcb.2.11.1362-1371.1982] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
In a 3.8-kilobase mouse DNA sequence encoding amino acid sequences for the pro alpha 1(I) chain of type I procollagen, 14 coding sequences were identified which specify a sequence 95% homologous to amino acid residues 568 to 963 of the bovine alpha 1(I) chain. All of these coding sequences were flanked by appropriate splice junctions following the GT/AG rule. These observations suggest, but do not prove, that this pro alpha 1(I) gene is transcriptionally active. Of the 14 coding sequences, 7 were 54 base pairs in length, whereas the remainder were higher multiples of 54 base pairs. Nonrandom utilization of codons pertained throughout all of the coding sequences showing a preference (56%) for U in the wobble position. Two of the intervening sequences encoded imperfect vestiges of coding sequences which exhibited a codon preference different from that of the pro alpha 1(I) gene proper and were not flanked by splice junctions. One intervening sequence encoded a member of the mouse B1 family of middle repetitive sequences. It was flanked by 8-base-pair direct repeats and had a truncated A-rich region, suggesting that it may be a mobile element. Within this element were sequences which could function as a RNA polymerase III split promoter.
16
van Vloten-Doting L, Dubelaar M, Bol JF. Open reading frame in the minus strand of two plus type RNA viruses. Plant Mol Biol 1982; 1:155-158. [PMID: 24317896 DOI: 10.1007/bf00024978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/1981] [Revised: 01/29/1982] [Indexed: 06/02/2023]
Abstract
Inspection of the nucleotide sequences of the RNAs complementary to the coat protein mRNAs from two plant viruses with a tripartite genome, alfalfa mosaic virus and brome mosaic virus, showed the presence of open reading frames for 138 and 118 amino acids, respectively. A third virus (cowpea chlorotic mottle virus) from the same family (1) does not show this phenomenon. This suggests that if a protein is coded for by the open reading frames it may be not essential for virus multiplication. Alternatively the open reading frames have no coding function but result from structural requirements of the RNAs.
Affiliation(s)
- L van Vloten-Doting
- Department of Biochemistry, University of Leiden, P. O. Box 9505, 2300 RA, Leiden, The Netherlands
18
Amaldi F, Beccari E, Bozzoni I, Luo ZX, Pierandrei-Amaldi P. Nucleotide sequences of cloned cDNA fragments specific for six Xenopus laevis ribosomal proteins. Gene 1982; 17:311-6. [PMID: 7049839 DOI: 10.1016/0378-1119(82)90147-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 1.2] [Indexed: 01/23/2023]
Abstract
We have previously constructed and selected six recombinant plasmids containing cDNA sequences specific for different ribosomal proteins of Xenopus laevis (Bozzoni et al., 1981). DNA cloned in these plasmids has been isolated and sequenced. Amino acid sequences of the corresponding portions of the proteins have been derived from DNA sequences; they are arginine- and lysine-rich as expected for ribosomal proteins. One of the cDNA sequences has an open reading frame also on the strand complementary to the one coding for the ribosomal protein; this fragment has inverted repeats twenty nucleotides long at the two ends. The codon usage for the six sequences appears to be non-random with some differences among the ribosomal proteins analysed.
19
Kröger M, Kröger-Block A. A flexible new computer program for handling DNA sequence data. Nucleic Acids Res 1982; 10:229-36. [PMID: 6278406 PMCID: PMC326129 DOI: 10.1093/nar/10.1.229] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Abstract
A compact new computer program for handling nucleic acid sequence data is presented. It consists of a number of different subsets, which may be used according to a given code system. The program is designed for the determination of restriction enzyme and other recognition sites in correlation with translation patterns, and allows tabulation of codon frequencies and protein molecular weights within specified gene boundaries. The program is especially designed for detection of overlapping genes. The language is FORTRAN and thus the program may be used on small computers; it may also be used without any prior computer experience. Copies are available on request.
20
Bacteriophage T4 infection mechanisms. ACTA ACUST UNITED AC 1982. [DOI: 10.1016/b978-0-444-80400-6.50013-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]