1
Volkening JD, Spatz SJ, Ponnuraj N, Akbar H, Arrington JV, Vega-Rodriguez W, Jarosinski KW. Viral proteogenomic and expression profiling during productive replication of a skin-tropic herpesvirus in the natural host. PLoS Pathog 2023; 19:e1011204. [PMID: 37289833 DOI: 10.1371/journal.ppat.1011204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 05/29/2023] [Indexed: 06/10/2023] Open
Abstract
Efficient transmission of herpesviruses is essential for dissemination in host populations; however, little is known about the viral genes that mediate transmission, mostly due to a lack of natural virus-host model systems. Marek's disease is a devastating herpesviral disease of chickens caused by Marek's disease virus (MDV) and an excellent natural model to study skin-tropic herpesviruses and transmission. Like varicella zoster virus that causes chicken pox in humans, the only site where infectious cell-free MD virions are efficiently produced is in epithelial skin cells, a requirement for host-to-host transmission. Here, we enriched for heavily infected feather follicle epithelial skin cells of live chickens to measure both viral transcription and protein expression using combined short- and long-read RNA sequencing and LC/MS-MS bottom-up proteomics. Enrichment produced a previously unseen breadth and depth of viral peptide sequencing. We confirmed protein translation for 84 viral genes at high confidence (1% FDR) and correlated relative protein abundance with RNA expression levels. Using a proteogenomic approach, we confirmed translation of most well-characterized spliced viral transcripts and identified a novel, abundant isoform of the 14 kDa transcript family via IsoSeq transcripts, short-read intron-spanning sequencing reads, and a high-quality junction-spanning peptide identification. We identified peptides representing alternative start codon usage in several genes and putative novel microORFs at the 5' ends of two core herpesviral genes, pUL47 and ICP4, along with strong evidence of independent transcription and translation of the capsid scaffold protein pUL26.5. Using a natural animal host model system to examine viral gene expression provides a robust, efficient, and meaningful way of validating results gathered from cell culture systems.
Affiliation(s)
- Stephen J Spatz
  - US National Poultry Research Laboratory, ARS, USDA, Athens, Georgia, United States of America
- Nagendraprabhu Ponnuraj
  - Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Haji Akbar
  - Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Justine V Arrington
  - Protein Sciences Facility, Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Widaliz Vega-Rodriguez
  - Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Keith W Jarosinski
  - Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America

2
Genome-Wide Profiling of Polyadenylation Events in Maize Using High-Throughput Transcriptomic Sequences. G3-GENES GENOMES GENETICS 2019; 9:2749-2760. [PMID: 31239292 PMCID: PMC6686930 DOI: 10.1534/g3.119.400196] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/22/2022]
Abstract
Polyadenylation is an essential post-transcriptional modification of eukaryotic transcripts that plays a critical role in transcript stability, localization, transport, and translational efficiency. About 70% of genes in plants contain alternative polyadenylation (APA) sites. Despite the availability of a vast amount of sequencing data, a comprehensive map of polyadenylation events in maize has not been available to date. Here, 9.48 billion RNA-Seq reads were analyzed to characterize 95,345 Poly(A) Clusters (PAC) in 23,705 (51%) maize genes. Of these, 76% were APA genes. However, most APA genes (55%) expressed a dominant PAC rather than favoring multiple PACs equally. The lincRNA genes with PACs were significantly longer than the genes without any PAC, and about 48% of the genes had APA sites. Heterogeneity was observed in 52% of the PACs, supporting the imprecise nature of the polyadenylation process. Genomic distribution revealed that the majority of the PACs (78%) were located in the genic regions. Unlike in previous studies, a large number of PACs were observed in the intergenic (n = 21,264), 5′-UTR (735), CDS (2,542), and intronic regions (12,841). The CDS and introns with PACs were longer than those without PACs, whereas intergenic PACs were more often associated with transcripts that lacked annotated 3′-UTRs. Nucleotide composition around PACs demonstrated AT-richness, and the common upstream motif was AAUAAA, which is consistent with other plants. According to this study, only 2,830 genes still maintained the use of the AAUAAA motif. These large-scale data provide useful insights into the regulation of gene expression and could be used as evidence to validate the annotation of transcript ends.

3
Bogard N, Linder J, Rosenberg AB, Seelig G. A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation. Cell 2019; 178:91-106.e23. [PMID: 31178116 PMCID: PMC6599575 DOI: 10.1016/j.cell.2019.04.046] [Citation(s) in RCA: 102] [Impact Index Per Article: 20.4] [Received: 04/12/2018] [Revised: 03/18/2019] [Accepted: 04/29/2019] [Indexed: 12/22/2022]
Abstract
Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.
Affiliation(s)
- Nicholas Bogard
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Johannes Linder
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA
- Alexander B Rosenberg
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Georg Seelig
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA; Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA.

4
Wang R, Zheng D, Yehia G, Tian B. A compendium of conserved cleavage and polyadenylation events in mammalian genes. Genome Res 2018; 28:1427-1441. [PMID: 30143597 PMCID: PMC6169888 DOI: 10.1101/gr.237826.118] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 04/22/2018] [Accepted: 08/08/2018] [Indexed: 12/22/2022]
Abstract
Cleavage and polyadenylation is essential for 3' end processing of almost all eukaryotic mRNAs. Recent studies have shown widespread alternative cleavage and polyadenylation (APA) events leading to mRNA isoforms with different 3' UTRs and/or coding sequences. Here, we present a compendium of conserved cleavage and polyadenylation sites (PASs) in mammalian genes, based on approximately 1.2 billion 3' end sequencing reads from more than 360 human, mouse, and rat samples. We show that ∼80% of mammalian mRNA genes contain at least one conserved PAS, and ∼50% have conserved APA events. PAS conservation generally reduces promiscuous 3' end processing, stabilizing gene expression levels across species. Conservation of APA correlates with gene age, gene expression features, and gene functions. Genes with certain functions, such as cell morphology, cell proliferation, and mRNA metabolism, are particularly enriched with conserved APA events. Whereas tissue-specific genes typically have a low APA rate, brain-specific genes tend to evolve APA. In addition, we show enrichment of mRNA destabilizing motifs in alternative 3' UTR sequences, leading to substantial differences in mRNA stability between 3' UTR isoforms. Using conserved PASs, we reveal sequence motifs surrounding APA sites and a preference of adenosine at the cleavage site. Furthermore, we show that mutations of U-rich motifs around the PAS often accompany APA profile differences between species. Analysis of lncRNA PASs indicates a mechanism of PAS fixation through evolution of A-rich motifs. Taken together, our results present a comprehensive view of PAS evolution in mammals, and a phylogenic perspective on APA functions.
Affiliation(s)
- Ruijia Wang
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, New Jersey 07103, USA
  - Rutgers Cancer Institute of New Jersey, Newark, New Jersey 07103, USA
- Dinghai Zheng
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, New Jersey 07103, USA
  - Rutgers Cancer Institute of New Jersey, Newark, New Jersey 07103, USA
- Ghassan Yehia
  - Genome Editing Core Facility, Rutgers University, New Brunswick, New Jersey 08901, USA
- Bin Tian
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, New Jersey 07103, USA
  - Rutgers Cancer Institute of New Jersey, Newark, New Jersey 07103, USA

6
Neve J, Patel R, Wang Z, Louey A, Furger AM. Cleavage and polyadenylation: Ending the message expands gene regulation. RNA Biol 2017; 14:865-890. [PMID: 28453393 PMCID: PMC5546720 DOI: 10.1080/15476286.2017.1306171] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 01/09/2017] [Revised: 03/02/2017] [Accepted: 03/09/2017] [Indexed: 12/13/2022] Open
Abstract
Cleavage and polyadenylation (pA) is a fundamental step that is required for the maturation of primary protein encoding transcripts into functional mRNAs that can be exported from the nucleus and translated in the cytoplasm. 3'end processing is dependent on the assembly of a multiprotein processing complex on the pA signals that reside in the pre-mRNAs. Most eukaryotic genes have multiple pA signals, resulting in alternative cleavage and polyadenylation (APA), a widespread phenomenon that is important to establish cell state and cell type specific transcriptomes. Here, we review how pA sites are recognized and comprehensively summarize how APA is regulated and creates mRNA isoform profiles that are characteristic for cell types, tissues, cellular states and disease.
Affiliation(s)
- Jonathan Neve
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Radhika Patel
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Zhiqiao Wang
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Alastair Louey
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom

7
Beisang D, Reilly C, Bohjanen PR. Alternative polyadenylation regulates CELF1/CUGBP1 target transcripts following T cell activation. Gene 2014; 550:93-100. [PMID: 25123787 PMCID: PMC4162518 DOI: 10.1016/j.gene.2014.08.021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 05/20/2014] [Revised: 07/22/2014] [Accepted: 08/10/2014] [Indexed: 01/19/2023]
Abstract
Alternative polyadenylation (APA) is an evolutionarily conserved mechanism for regulating gene expression. Transcript 3' end shortening through changes in polyadenylation site usage occurs following T cell activation, but the consequences of APA on gene expression are poorly understood. We previously showed that GU-rich elements (GREs) found in the 3' untranslated regions of select transcripts mediate rapid mRNA decay by recruiting the protein CELF1/CUGBP1. Using a global RNA sequencing approach, we found that a network of CELF1 target transcripts involved in cell division underwent preferential 3' end shortening via APA following T cell activation, resulting in decreased inclusion of CELF1 binding sites and increased transcript expression. We present a model whereby CELF1 regulates APA site selection following T cell activation through reversible binding to nearby GRE sequences. These findings provide insight into the role of APA in controlling cellular proliferation during biological processes such as development, oncogenesis and T cell activation.
Affiliation(s)
- Daniel Beisang
  - Center for Infectious Diseases and Microbiology Translational Research, University of Minnesota, Minneapolis, MN, USA; Department of Microbiology, University of Minnesota, Minneapolis, MN, USA.
- Cavan Reilly
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA.
- Paul R Bohjanen
  - Center for Infectious Diseases and Microbiology Translational Research, University of Minnesota, Minneapolis, MN, USA; Department of Microbiology, University of Minnesota, Minneapolis, MN, USA; Department of Medicine, University of Minnesota, Minneapolis, MN, USA.

8
Li XQ, Du D. Motif types, motif locations and base composition patterns around the RNA polyadenylation site in microorganisms, plants and animals. BMC Evol Biol 2014; 14:162. [PMID: 25052519 PMCID: PMC4360255 DOI: 10.1186/s12862-014-0162-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/09/2014] [Accepted: 07/14/2014] [Indexed: 12/22/2022] Open
Abstract
Background: The polyadenylation of RNA is critical for gene functioning, but the conserved sequence motifs (often called signal or signature motifs), motif locations and abundances, and base composition patterns around mRNA polyadenylation [poly(A)] sites are still uncharacterized in most species. The evolutionary tendency for poly(A) site selection is still largely unknown.
Results: We analyzed the poly(A) site regions of 31 species or phyla. Different groups of species showed different poly(A) signal motifs: UUACUU at the poly(A) site in the parasite Trypanosoma cruzi; UGUAAC (approximately 13 bases upstream of the site) in the alga Chlamydomonas reinhardtii; UGUUUG (or UGUUUGUU) at mainly the fourth base downstream of the poly(A) site in the parasite Blastocystis hominis; and AAUAAA at approximately 16 bases and approximately 19 bases upstream of the poly(A) site in animals and plants, respectively. Polyadenylation signal motifs are usually several hundred times more abundant around poly(A) sites than in whole genomes. These predominant motifs usually had very specific locations, whether upstream of, at, or downstream of poly(A) sites, depending on the species or phylum. The poly(A) site was usually an adenosine (A) in all analyzed species except for B. hominis, and there was weak A predominance in C. reinhardtii. Fungi, animals, plants, and the protist Phytophthora infestans shared a general base abundance pattern (or base composition pattern) of “U-rich—A-rich—U-rich—Poly(A) site—U-rich regions”, or U-A-U-A-U for short, with some variation for each kingdom or subkingdom.
Conclusion: This study identified the poly(A) signal motifs, motif locations, and base composition patterns around mRNA poly(A) sites in protists, fungi, plants, and animals and provided insight into poly(A) site evolution.
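As a rough illustration of what locating a signal motif relative to a poly(A) site can look like, the following Python sketch scans a window upstream of a given site for AAUAAA and reports how far each hit lies from the site. It is not the authors' pipeline: the function name `upstream_motif_distances`, the 40-base window, the toy sequence, and the distance convention (site index minus the index of the motif's first base) are all assumptions made for this example.

```python
def upstream_motif_distances(rna, site, motif="AAUAAA", window=40):
    """Return distances from each upstream motif start to the poly(A) site.

    rna    -- sequence as a string; DNA input (T) is converted to RNA (U)
    site   -- 0-based index of the poly(A) site (hypothetical convention)
    window -- number of bases upstream of the site to scan (assumed value)
    Distance is defined here as site - motif_start, which is an assumption.
    """
    rna = rna.upper().replace("T", "U")
    start = max(0, site - window)
    region = rna[start:site]  # only bases strictly upstream of the site
    distances = []
    pos = region.find(motif)
    while pos != -1:
        distances.append(site - (start + pos))
        pos = region.find(motif, pos + 1)  # allow overlapping hits
    return distances


# Toy transcript: the motif starts at index 4 and the poly(A) site is index 20.
toy = "GCGC" + "AAUAAA" + "CUGCUGCUGC" + "A" + "UUUU"
print(upstream_motif_distances(toy, site=20))
```

Collecting these distances over many annotated sites and looking at the most frequent value would give a position estimate comparable in spirit to the "approximately 16 bases upstream" figure, although the paper's exact counting convention is not stated in the abstract.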
Affiliation(s)
- Xiu-Qing Li
  - Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, 850 Lincoln Road, Fredericton, New Brunswick, E3B 4Z7, Canada.
- Donglei Du
  - Quantitative Methods Research Group, Faculty of Business Administration, University of New Brunswick, 7 Macaulay Lane, Fredericton, NB, E3B 5A3, Canada.

9
Li XQ. Comparative analysis of the base compositions of the pre-mRNA 3' cleaved-off region and the mRNA 3' untranslated region relative to the genomic base composition in animals and plants. PLoS One 2014; 9:e99928. [PMID: 24941005 PMCID: PMC4062462 DOI: 10.1371/journal.pone.0099928] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/20/2013] [Accepted: 05/20/2014] [Indexed: 12/26/2022] Open
Abstract
The precursor messenger RNA (pre-mRNA) three-prime cleaved-off region (3′COR) and the mRNA three-prime untranslated region (3′UTR) play critical roles in regulating gene expression. The differences in base composition between these regions and the corresponding genomes are still largely uncharacterized in animals and plants. In this study, the base compositions of non-redundant 3′CORs and 3′UTRs were compared with the corresponding whole genomes of eleven animals, four dicotyledonous plants, and three monocotyledonous (cereal) plants. Among the four bases (A, C, G, and U for adenine, cytosine, guanine, and uracil, respectively), U (which corresponds to T, for thymine, in DNA) was the most frequent, A the second most frequent, G the third most frequent, and C the least frequent in most of the species in both the 3′COR and 3′UTR regions. In comparison with the whole genomes, in both regions the U content was usually the most overrepresented (particularly in the monocotyledonous plants), and the C content was the most underrepresented. The order obtained for the species groups, when ranked from high to low according to the U contents in the 3′COR and 3′UTR was as follows: dicotyledonous plants, monocotyledonous plants, non-mammal animals, and mammals. In contrast, the genomic T content was highest in dicotyledonous plants, lowest in monocotyledonous plants, and intermediate in animals. These results suggest the following: 1) there is a mechanism operating in both animals and plants which is biased toward U and against C in the 3′COR and 3′UTR; 2) the 3′UTR and 3′COR, as functional units, minimized the difference between dicotyledonous and monocotyledonous plants, while the dicotyledonous and monocotyledonous genomes evolved into two extreme groups in terms of base composition.
Affiliation(s)
- Xiu-Qing Li
  - Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, Canada

10
Li XQ, Du D. Variation, evolution, and correlation analysis of C+G content and genome or chromosome size in different kingdoms and phyla. PLoS One 2014; 9:e88339. [PMID: 24551092 PMCID: PMC3923770 DOI: 10.1371/journal.pone.0088339] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 09/21/2013] [Accepted: 01/06/2014] [Indexed: 12/05/2022] Open
Abstract
C+G content (GC content or G+C content) is known to be correlated with genome/chromosome size in bacteria but the relationship for other kingdoms remains unclear. This study analyzed genome size, chromosome size, and base composition in most of the available sequenced genomes in various kingdoms. Genome size tends to increase during evolution in plants and animals, and the same is likely true for bacteria. The genomic C+G contents were found to vary greatly in microorganisms but were quite similar within each animal or plant subkingdom. In animals and plants, the C+G contents are ranked as follows: monocot plants>mammals>non-mammalian animals>dicot plants. The variation in C+G content between chromosomes within species is greater in animals than in plants. The correlation between average chromosome C+G content and chromosome length was found to be positive in Proteobacteria, Actinobacteria (but not in other analyzed bacterial phyla), Ascomycota fungi, and likely also in some plants; negative in some animals, insignificant in two protist phyla, and likely very weak in Archaea. Clearly, correlations between C+G content and chromosome size can be positive, negative, or not significant depending on the kingdoms/groups or species. Different phyla or species exhibit different patterns of correlation between chromosome-size and C+G content. Most chromosomes within a species have a similar pattern of variation in C+G content but outliers are common. The data presented in this study suggest that the C+G content is under genetic control by both trans- and cis- factors and that the correlation between C+G content and chromosome length can be positive, negative, or not significant in different phyla.
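The kind of analysis summarized above, relating C+G content to chromosome length, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the study's method: the helper names `gc_content` and `pearson`, the choice of a plain Pearson coefficient, and the three toy "chromosomes" are all hypothetical.

```python
from math import sqrt


def gc_content(seq):
    """Fraction of C+G among unambiguous bases (A, C, G, T); 0.0 if none."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (invented for illustration): longer sequences are more GC-rich.
chromosomes = {"chr1": "GCGCGCAT", "chr2": "GCATAT", "chr3": "ATAT"}
lengths = [len(s) for s in chromosomes.values()]
gc_values = [gc_content(s) for s in chromosomes.values()]
r = pearson(lengths, gc_values)
print(f"r = {r:.3f}")
```

On real genomes the sign of `r` would be expected to differ between groups, in line with the abstract's point that the correlation can be positive, negative, or not significant depending on the phylum.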
Affiliation(s)
- Xiu-Qing Li
  - Molecular Genetics Laboratory, Potato Research Centre, Agriculture and Agri-Food Canada, Fredericton, New Brunswick, Canada
- Donglei Du
  - Quantitative Methods Research Group, Faculty of Business Administration, University of New Brunswick, Fredericton, New Brunswick, Canada