1
Fan C, Xing X, Murphy SJH, Poursine-Laurent J, Schmidt H, Parikh BA, Yoon J, Choudhary MNK, Saligrama N, Piersma SJ, Yokoyama WM, Wang T. Cis-regulatory evolution of the recently expanded Ly49 gene family. Nat Commun 2024; 15:4839. [PMID: 38844462 PMCID: PMC11156856 DOI: 10.1038/s41467-024-48990-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 05/14/2024] [Indexed: 06/09/2024]
Abstract
Comparative genomics has revealed the rapid expansion of multiple gene families involved in immunity. Members within each gene family often evolved distinct roles in immunity. However, less is known about the evolution of their epigenome and cis-regulation. Here we systematically profile the epigenome of the recently expanded murine Ly49 gene family that mainly encode either inhibitory or activating surface receptors on natural killer cells. We identify a set of cis-regulatory elements (CREs) for activating Ly49 genes. In addition, we show that in mice, inhibitory and activating Ly49 genes are regulated by two separate sets of proximal CREs, likely resulting from lineage-specific losses of CRE activity. Furthermore, we find that some Ly49 genes are cross-regulated by the CREs of other Ly49 genes, suggesting that the Ly49 family has begun to evolve a concerted cis-regulatory mechanism. Collectively, we demonstrate the different modes of cis-regulatory evolution for a rapidly expanding gene family.
Affiliation(s)
- Changxu Fan
- Department of Genetics, Washington University School of Medicine, St. Louis, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
- Xiaoyun Xing
- Department of Genetics, Washington University School of Medicine, St. Louis, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
- Samuel J H Murphy
- Department of Neurology, Washington University School of Medicine, St. Louis, 63110, USA
- Medical Scientist Training Program, Washington University School of Medicine, St. Louis, 63110, USA
- Jennifer Poursine-Laurent
- Division of Rheumatology, Department of Medicine, Washington University School of Medicine, St. Louis, 63110, USA
- Heather Schmidt
- Department of Genetics, Washington University School of Medicine, St. Louis, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
- Bijal A Parikh
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, 63110, USA
- Jeesang Yoon
- Division of Rheumatology, Department of Medicine, Washington University School of Medicine, St. Louis, 63110, USA
- Mayank N K Choudhary
- Department of Genetics, Washington University School of Medicine, St. Louis, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
- Naresha Saligrama
- Department of Neurology, Washington University School of Medicine, St. Louis, 63110, USA
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, 63110, USA
- Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, St. Louis, 63110, USA
- Center for Brain Immunology and Glia (BIG), Washington University School of Medicine, St. Louis, 63110, USA
- Sytse J Piersma
- Division of Rheumatology, Department of Medicine, Washington University School of Medicine, St. Louis, 63110, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, 63110, USA
- Wayne M Yokoyama
- Division of Rheumatology, Department of Medicine, Washington University School of Medicine, St. Louis, 63110, USA
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, 63110, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, 63110, USA
2
Maeng JH, Jang HJ, Du AY, Tzeng SC, Wang T. Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome. Genome Res 2023; 33:gr.277061.122. [PMID: 38065624 PMCID: PMC10760525 DOI: 10.1101/gr.277061.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Accepted: 11/13/2023] [Indexed: 01/04/2024]
Abstract
Recent studies have shown that the noncoding genome can produce unannotated proteins as antigens that induce immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts (TE transcripts) have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (long-read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we show that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5' transcript structure. Augmenting a reference proteome database with newly characterized transcripts enabled us to detect noncanonical antigens from HLA-pulldown LC-MS/MS data. Lastly, we show that epigenetic treatment increased the number of noncanonical antigens, particularly those encoded by TE transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.
Affiliation(s)
- Ju Heon Maeng
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- H Josh Jang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Alan Y Du
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Shin-Cheng Tzeng
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
3
Milito A, Aschern M, McQuillan JL, Yang JS. Challenges and advances towards the rational design of microalgal synthetic promoters in Chlamydomonas reinhardtii. Journal of Experimental Botany 2023; 74:3833-3850. [PMID: 37025006 DOI: 10.1093/jxb/erad100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 03/24/2023] [Indexed: 06/19/2023]
Abstract
Microalgae hold enormous potential to provide a safe and sustainable source of high-value compounds, acting as carbon-fixing biofactories that could help to mitigate rapidly progressing climate change. Bioengineering microalgal strains will be key to optimizing and modifying their metabolic outputs, and to render them competitive with established industrial biotechnology hosts, such as bacteria or yeast. To achieve this, precise and tuneable control over transgene expression will be essential, which would require the development and rational design of synthetic promoters as a key strategy. Among green microalgae, Chlamydomonas reinhardtii represents the reference species for bioengineering and synthetic biology; however, the repertoire of functional synthetic promoters for this species, and for microalgae generally, is limited in comparison to other commercial chassis, emphasizing the need to expand the current microalgal gene expression toolbox. Here, we discuss state-of-the-art promoter analyses, and highlight areas of research required to advance synthetic promoter development in C. reinhardtii. In particular, we exemplify high-throughput studies performed in other model systems that could be applicable to microalgae, and propose novel approaches to interrogating algal promoters. We lastly outline the major limitations hindering microalgal promoter development, while providing novel suggestions and perspectives for how to overcome them.
Affiliation(s)
- Alfonsina Milito
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
- Moritz Aschern
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
- Josie L McQuillan
- Department of Chemical and Biological Engineering, University of Sheffield, Mappin Street, Sheffield, S1 3JD, UK
- Jae-Seong Yang
- Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra, Barcelona, Spain
4
Murray A, Mendieta JP, Vollmers C, Schmitz RJ. Simple and accurate transcriptional start site identification using Smar2C2 and examination of conserved promoter features. The Plant Journal: For Cell and Molecular Biology 2022; 112:583-596. [PMID: 36030508 PMCID: PMC9827901 DOI: 10.1111/tpj.15957] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/11/2022] [Revised: 08/12/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]
Abstract
The precise and accurate identification and quantification of transcriptional start sites (TSSs) is key to understanding the control of transcription. The core promoter consists of the TSS and proximal non-coding sequences, which are critical in transcriptional regulation. Therefore, the accurate identification of TSSs is important for understanding the molecular regulation of transcription. Existing protocols for TSS identification are challenging and expensive, leaving high-quality data available for a small subset of organisms. This sparsity of data impairs study of TSS usage across tissues or in an evolutionary context. To address these shortcomings, we developed Smart-Seq2 Rolling Circle to Concatemeric Consensus (Smar2C2), which identifies and quantifies TSSs and transcription termination sites. Smar2C2 incorporates unique molecular identifiers that allowed for the identification of as many as 70 million sites, with no known upper limit. We have also generated TSS data sets from as little as 40 pg of total RNA, which was the smallest input tested. In this study, we used Smar2C2 to identify TSSs in Glycine max (soybean), Oryza sativa (rice), Sorghum bicolor (sorghum), Triticum aestivum (wheat) and Zea mays (maize) across multiple tissues. This wide panel of plant TSSs facilitated the identification of evolutionarily conserved features, such as novel patterns in the dinucleotides that compose the initiator element (Inr), that correlated with promoter expression levels across all species examined. We also discovered sequence variations in known promoter motifs that are positioned reliably close to the TSS, such as differences in the TATA box and in the Inr that may prove significant to our understanding and control of transcription initiation. Smar2C2 allows for the easy study of these critical sequences, providing a tool to facilitate discovery.
Affiliation(s)
- Andrew Murray
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
- Chris Vollmers
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
5
Moody J, Kouno T, Chang JC, Ando Y, Carninci P, Shin JW, Hon CC. SCAFE: a software suite for analysis of transcribed cis-regulatory elements in single cells. Bioinformatics 2022; 38:5126-5128. [PMID: 36173306 PMCID: PMC9665856 DOI: 10.1093/bioinformatics/btac644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 08/30/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Cell type-specific activities of cis-regulatory elements (CRE) are central to understanding gene regulation and disease predisposition. Single-cell RNA 5'end sequencing (sc-end5-seq) captures the transcription start sites (TSS) which can be used as a proxy to measure the activity of transcribed CREs (tCREs). However, a substantial fraction of TSS identified from sc-end5-seq data may not be genuine due to various artifacts, hindering the use of sc-end5-seq for de novo discovery of tCREs. RESULTS: We developed SCAFE (Single-Cell Analysis of Five-prime Ends), a software suite that processes sc-end5-seq data to de novo identify TSS clusters based on multiple logistic regression. It annotates tCREs based on the identified TSS clusters and generates a tCRE-by-cell count matrix for downstream analyses. The software suite consists of a set of flexible tools that could either be run independently or as pre-configured workflows. AVAILABILITY AND IMPLEMENTATION: SCAFE is implemented in Perl and R. The source code and documentation are freely available for download under the MIT License from https://github.com/chung-lab/SCAFE. Docker images are available from https://hub.docker.com/r/cchon/scafe. The submitted software version and test data are archived at https://doi.org/10.5281/zenodo.7023163 and https://doi.org/10.5281/zenodo.7024060, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
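To make the "tCRE-by-cell count matrix" mentioned in this abstract concrete, the following is a minimal Python sketch. It is not SCAFE's implementation (SCAFE is written in Perl and R and filters TSS clusters with logistic regression, which is not shown here). The function name, the tuple layouts, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter

def tcre_by_cell_counts(reads, clusters):
    """Count 5' read ends per (cluster, cell barcode) pair.

    reads:    iterable of (chrom, strand, five_prime_pos, cell_barcode)
    clusters: list of (name, chrom, strand, start, end), half-open [start, end)
    Both layouts are hypothetical, chosen only for this sketch.
    """
    counts = Counter()
    for chrom, strand, pos, cell in reads:
        for name, c_chrom, c_strand, start, end in clusters:
            # A read contributes to the first cluster that contains its 5' end.
            if chrom == c_chrom and strand == c_strand and start <= pos < end:
                counts[(name, cell)] += 1
                break
    return counts

# Toy input: two clusters, two cells, one read outside any cluster.
clusters = [
    ("tCRE_1", "chr1", "+", 100, 120),
    ("tCRE_2", "chr1", "-", 500, 530),
]
reads = [
    ("chr1", "+", 105, "AAAC"),
    ("chr1", "+", 110, "AAAC"),
    ("chr1", "-", 510, "AAAC"),
    ("chr1", "+", 101, "TTTG"),
    ("chr1", "+", 300, "TTTG"),  # falls in no cluster, so it is ignored
]
counts = tcre_by_cell_counts(reads, clusters)
```

The result is a sparse matrix keyed by (tCRE, cell); a real pipeline would use an interval index rather than the linear scan used here.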
Affiliation(s)
- Jen-Chien Chang
- RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
- Yoshinari Ando
- RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan
- Piero Carninci
- RIKEN Center for Integrative Medical Sciences, Yokohama City, Kanagawa 230-0045, Japan; Human Technopole, Milan 20157, Italy
- Jay W Shin
- To whom correspondence should be addressed.
6
Lee HJ, Hou Y, Maeng JH, Shah NM, Chen Y, Lawson HA, Yang H, Yue F, Wang T. Epigenomic analysis reveals prevalent contribution of transposable elements to cis-regulatory elements, tissue-specific expression, and alternative promoters in zebrafish. Genome Res 2022; 32:1424-1436. [PMID: 35649578 PMCID: PMC9341505 DOI: 10.1101/gr.276052.121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2021] [Accepted: 05/27/2022] [Indexed: 12/04/2022]
Abstract
Transposable elements (TEs) encode regulatory elements that impact gene expression in multiple species, yet a comprehensive analysis of zebrafish TEs in the context of gene regulation is lacking. Here, we systematically investigate the epigenomic and transcriptomic landscape of TEs across 11 adult zebrafish tissues using multidimensional sequencing data. We find that TEs contribute substantially to a diverse array of regulatory elements in the zebrafish genome and that 37% of TEs are positioned in active regulatory states in adult zebrafish tissues. We identify TE subfamilies enriched in highly specific regulatory elements among different tissues. We use transcript assembly to discover TE-derived transcriptional units expressed across tissues. Finally, we show that novel TE-derived promoters can initiate tissue-specific transcription of alternate gene isoforms. This work provides a comprehensive profile of TE activity across normal zebrafish tissues, shedding light on mechanisms underlying the regulation of gene expression in this widely used model organism.
Affiliation(s)
- Hyung Joo Lee
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Yiran Hou
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Ju Heon Maeng
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Nakul M. Shah
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Yujie Chen
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Heather A. Lawson
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Hongbo Yang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611, USA
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois 60611, USA; Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, Illinois 60611, USA
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
7
Schon MA, Lutzmayer S, Hofmann F, Nodine MD. Bookend: precise transcript reconstruction with end-guided assembly. Genome Biol 2022; 23:143. [PMID: 35768836 PMCID: PMC9245221 DOI: 10.1186/s13059-022-02700-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Accepted: 06/05/2022] [Indexed: 12/29/2022]
Abstract
We developed Bookend, a package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. We demonstrate that correct identification of transcript start and end sites is essential for precise full-length transcript assembly. Utilization of end-labeled reads present in full-length single-cell RNA-seq datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis thaliana, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells, can produce reference-quality end-to-end transcript annotations.
Affiliation(s)
- Michael A Schon
- Cluster of Plant Developmental Biology, Laboratory of Molecular Biology, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands; Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Stefan Lutzmayer
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Falko Hofmann
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Michael D Nodine
- Cluster of Plant Developmental Biology, Laboratory of Molecular Biology, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands; Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
8
Chiu CW, Li YR, Lin CY, Yeh HH, Liu MJ. Translation initiation landscape profiling reveals hidden open-reading frames required for the pathogenesis of tomato yellow leaf curl Thailand virus. The Plant Cell 2022; 34:1804-1821. [PMID: 35080617 PMCID: PMC9048955 DOI: 10.1093/plcell/koac019] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/10/2021] [Accepted: 01/06/2022] [Indexed: 05/12/2023]
Abstract
Plant viruses with densely packed genomes employ noncanonical translational strategies to increase the coding capacity for viral function. However, the diverse translational strategies used make it challenging to define the full set of viral genes. Here, using tomato yellow leaf curl Thailand virus (TYLCTHV, genus Begomovirus) as a model system, we identified genes beyond the annotated gene sets by experimentally profiling in vivo translation initiation sites (TISs). We found that unanticipated AUG TISs were prevalent and determined that their usage involves alternative transcriptional and/or translational start sites and is associated with flanking mRNA sequences. Specifically, two downstream in-frame TISs were identified in the viral gene AV2. These TISs were conserved in the begomovirus lineage and led to the translation of different protein isoforms localized to cytoplasmic puncta and at the cell periphery, respectively. In addition, we found translational evidence of an unexplored gene, BV2. BV2 is conserved among TYLCTHV isolates and localizes to the endoplasmic reticulum and plasmodesmata. Mutations of AV2 isoforms and BV2 significantly attenuated disease symptoms in tomato (Solanum lycopersicum). In conclusion, our study pinpointing in vivo TISs untangles the coding complexity of a plant viral genome and, more importantly, illustrates the biological significance of the hidden open-reading frames encoding viral factors for pathogenicity.
Affiliation(s)
- Ching-Wen Chiu
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Ya-Ru Li
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Cheng-Yuan Lin
- Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Hsin-Hung Yeh
- Agricultural Biotechnology Research Center, Academia Sinica, Taipei 115, Taiwan
9
Abstract
Transcription start site (TSS) usage is a critical factor in the regulation of gene expression. A number of methods for global TSS mapping have been developed, but barriers of expense, technical difficulty, time, and/or cost have limited their broader adoption. To address these issues, we developed Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq). Requiring only three enzymatic steps with intervening bead cleanups, a STRIPE-seq library can be prepared from as little as 50 ng total RNA in ~5 h at a cost of ~$12 (US). In addition to profiling TSS usage, STRIPE-seq provides information on transcript levels that can be used for differential expression analysis. Thanks to its simplicity and low cost, we envision that STRIPE-seq could be employed by any molecular biology laboratory interested in profiling transcription initiation.
Affiliation(s)
- Gabriel E Zentner
- Department of Biology, Indiana University, Bloomington, IN, USA
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN, USA
- eGenesis, Inc., Cambridge, MA, USA
10
Wulf MG, Maguire S, Dai N, Blondel A, Posfai D, Krishnan K, Sun Z, Guan S, Corrêa IR. Chemical capping improves template switching and enhances sequencing of small RNAs. Nucleic Acids Res 2021; 50:e2. [PMID: 34581823 PMCID: PMC8754658 DOI: 10.1093/nar/gkab861] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/29/2021] [Revised: 08/26/2021] [Accepted: 09/14/2021] [Indexed: 12/16/2022]
Abstract
Template-switching reverse transcription is widely used in RNA sequencing for low-input and low-quality samples, including RNA from single cells or formalin-fixed paraffin-embedded (FFPE) tissues. Previously, we identified the native eukaryotic mRNA 5′ cap as a key structural element for enhancing template switching efficiency. Here, we introduce CapTS-seq, a new strategy for sequencing small RNAs that combines chemical capping and template switching. We probed a variety of non-native synthetic cap structures and found that an unmethylated guanosine triphosphate cap led to the lowest bias and highest efficiency for template switching. Through cross-examination of different nucleotides at the cap position, our data provided unequivocal evidence that the 5′ cap acts as a template for the first nucleotide in reverse transcriptase-mediated post-templated addition to the emerging cDNA—a key feature to propel template switching. We deployed CapTS-seq for sequencing synthetic miRNAs, human total brain and liver FFPE RNA, and demonstrated that it consistently improves library quality for miRNAs in comparison with a gold standard template switching-based small RNA-seq kit.
Affiliation(s)
- Madalee G Wulf
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Sean Maguire
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Nan Dai
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Alice Blondel
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Dora Posfai
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Zhiyi Sun
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Shengxi Guan
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
- Ivan R Corrêa
- New England Biolabs, Inc., 240 County Road, Ipswich, MA 01938, USA
11
Abstract
Transcription start site (TSS) selection influences transcript stability and translation as well as protein sequence. Alternative TSS usage is pervasive in organismal development, is a major contributor to transcript isoform diversity in humans, and is frequently observed in human diseases including cancer. In this review, we discuss the breadth of techniques that have been used to globally profile TSSs and the resulting insights into gene regulation, as well as future prospects in this area of inquiry.
Affiliation(s)
- Gabriel E. Zentner
- Department of Biology, Indiana University, Bloomington, IN 47401, USA
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN 46202, USA
12
Policastro RA, McDonald DJ, Brendel VP, Zentner GE. Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR. NAR Genom Bioinform 2021; 3:lqab051. [PMID: 34250478 PMCID: PMC8265037 DOI: 10.1093/nargab/lqab051] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/16/2021] [Revised: 04/29/2021] [Accepted: 05/18/2021] [Indexed: 12/13/2022]
Abstract
Heterogeneity in transcription initiation has important consequences for transcript stability and translation, and shifts in transcription start site (TSS) usage are prevalent in various developmental, metabolic, and disease contexts. Accordingly, numerous methods for global TSS profiling have been developed, including most recently Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a method to profile transcription start sites (TSSs) on a genome-wide scale with significant cost and time savings compared to previous methods. In anticipation of more widespread adoption of STRIPE-seq and related methods for construction of promoter atlases and studies of differential gene expression, we built TSRexploreR, an R package for end-to-end analysis of TSS mapping data. TSRexploreR provides functions for TSS and transcription start region (TSR) detection, normalization, correlation, visualization, and differential TSS/TSR analyses. TSRexploreR is highly interoperable, accepting the data structures of TSS and TSR sets generated by several existing tools for processing and alignment of TSS mapping data, such as CAGEr for Cap Analysis of Gene Expression (CAGE) data. Lastly, TSRexploreR implements a novel approach for the detection of shifts in TSS distribution.
Affiliation(s)
- Daniel J McDonald
- Department of Statistics, Indiana University, Bloomington, IN 47405, USA
- Volker P Brendel
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Gabriel E Zentner
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, IN 46202, USA
13
Policastro RA, Raborn RT, Brendel VP, Zentner GE. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res 2020; 30:910-923. [PMID: 32660958 PMCID: PMC7370879 DOI: 10.1101/gr.261545.120] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/22/2020] [Accepted: 06/18/2020] [Indexed: 01/07/2023]
Abstract
Accurate mapping of transcription start sites (TSSs) is key for understanding transcriptional regulation. However, current protocols for genome-wide TSS profiling are laborious and/or expensive. We present Survey of TRanscription Initiation at Promoter Elements with high-throughput sequencing (STRIPE-seq), a simple, rapid, and cost-effective protocol for sequencing capped RNA 5' ends from as little as 50 ng total RNA. Including depletion of uncapped RNA and reaction cleanups, a STRIPE-seq library can be constructed in about 5 h. We show application of STRIPE-seq to TSS profiling in yeast and human cells and show that it can also be effectively used for quantification of transcript levels and analysis of differential gene expression. In conjunction with our ready-to-use computational workflows, STRIPE-seq is a straightforward, efficient means by which to probe the landscape of transcriptional initiation.
Affiliation(s)
- Volker P Brendel
- Department of Biology
- Department of Computer Science, Indiana University, Bloomington, Indiana 47405, USA
- Gabriel E Zentner
- Department of Biology
- Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, Indiana 46202, USA
|
14
|
Wulf MG, Maguire S, Humbert P, Dai N, Bei Y, Nichols NM, Corrêa IR, Guan S. Non-templated addition and template switching by Moloney murine leukemia virus (MMLV)-based reverse transcriptases co-occur and compete with each other. J Biol Chem 2019; 294:18220-18231. [PMID: 31640989 PMCID: PMC6885630 DOI: 10.1074/jbc.ra119.010676] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/16/2019] [Revised: 10/17/2019] [Indexed: 11/21/2022] Open
Abstract
Single-cell RNA-Seq (scRNA-Seq) has led to an unprecedented understanding of gene expression and regulation in individual cells. Many scRNA-Seq approaches rely upon the template switching property of Moloney murine leukemia virus (MMLV)-type reverse transcriptases. Template switching is believed to happen in a sequential process involving nontemplated addition of three protruding nucleotides (+CCC) to the 3′-end of the nascent cDNA, which can then anneal to the matching rGrGrG 3′-end of the template-switching oligo (TSO), allowing the reverse transcriptase (RT) to switch templates and continue copying the TSO sequence. In this study, we present a detailed analysis of template switching biases with respect to the RNA template, specifically of the role of the sequence and nature of its 5′-end (capped versus noncapped) in these biases. Our findings confirmed that the presence of a 5′-m7G cap enhances template switching efficiency. We also profiled the composition of the nontemplated addition in the absence of TSO and observed that the 5′-end of RNA template influences the terminal transferase activity of the RT. Furthermore, we found that designing new TSOs that pair with the most common nontemplated additions did little to improve template switching efficiency. Our results provide evidence suggesting that, in contrast to the current understanding of the template switching process, nontemplated addition and template switching are concurrent and competing processes.
Affiliation(s)
- Sean Maguire
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
- Paul Humbert
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
- Nan Dai
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
- Yanxia Bei
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
- Ivan R Corrêa
- New England Biolabs, Inc., Ipswich, Massachusetts 01938.
- Shengxi Guan
- New England Biolabs, Inc., Ipswich, Massachusetts 01938.
|
15
|
Thodberg M, Thieffry A, Vitting-Seerup K, Andersson R, Sandelin A. CAGEfightR: analysis of 5'-end data using R/Bioconductor. BMC Bioinformatics 2019; 20:487. [PMID: 31585526 PMCID: PMC6778389 DOI: 10.1186/s12859-019-3029-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/15/2019] [Accepted: 08/15/2019] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND: 5'-end sequencing assays, and Cap Analysis of Gene Expression (CAGE) in particular, have been instrumental in studying transcriptional regulation. 5'-end methods provide genome-wide maps of transcription start sites (TSSs) with base pair resolution. Because active enhancers often feature bidirectional TSSs, such data can also be used to predict enhancer candidates. The current availability of mature and comprehensive computational tools for the analysis of 5'-end data is limited, preventing efficient analysis of new and existing 5'-end data. RESULTS: We present CAGEfightR, a framework for analysis of CAGE and other 5'-end data implemented as an R/Bioconductor-package. CAGEfightR can import data from BigWig files and allows for fast and memory efficient prediction and analysis of TSSs and enhancers. Downstream analyses include quantification, normalization, annotation with transcript and gene models, TSS shape statistics, linking TSSs to enhancers via co-expression, identification of enhancer clusters, and genome-browser style visualization. While built to analyze CAGE data, we demonstrate the utility of CAGEfightR in analyzing nascent RNA 5'-data (PRO-Cap). CAGEfightR is implemented using standard Bioconductor classes, making it easy to learn, use and combine with other Bioconductor packages, for example popular differential expression tools such as limma, DESeq2 and edgeR. CONCLUSIONS: CAGEfightR provides a single, scalable and easy-to-use framework for comprehensive downstream analysis of 5'-end data. CAGEfightR is designed to be interoperable with other Bioconductor packages, thereby unlocking hundreds of mature transcriptomic analysis tools for 5'-end data. CAGEfightR is freely available via Bioconductor: bioconductor.org/packages/CAGEfightR
Affiliation(s)
- Malte Thodberg
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark.
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark.
- Axel Thieffry
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Kristoffer Vitting-Seerup
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Danish Cancer Society, Strandboulevarden 49 DK2100, Copenhagen Ø, Denmark
- Robin Andersson
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark
- Albin Sandelin
- Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark.
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2100, Copenhagen N, Denmark.
|
16
|
Abstract
Application of Transcription Start Site (TSS) profiling technologies, coupled with large-scale next-generation sequencing (NGS), has yielded valuable insights into the location, structure, and activity of promoters across diverse metazoan model systems. In insects, TSS profiling has been used to characterize the promoter architecture of Drosophila melanogaster (Hoskins et al., Genome Res 21(2):182-192, 2011) and subsequently was employed to reveal widespread transposon-driven alternative promoter usage in the fruit fly (Batut et al., Genome Res 23:169-180, 2012). In this chapter we discuss the computational analysis of the experimental data derived from one of the TSS profiling methods, RAMPAGE (RNA Annotation and Mapping of Promoters for Analysis of Gene Expression), which can be used for the precise, quantitative identification of promoters in insect genomes. We demonstrate this using the software tools GoRAMPAGE (Brendel and Raborn, GoRAMPAGE-A workflow for promoter detection by 5'-read mapping. https://github.com/BrendelGroup/GoRAMPAGE , 2016) and TSRchitect (Raborn and Brendel, TSRchitect: promoter identification from large-scale TSS profiling data. R Bioconductor package version 1.8.0 [Online]. Available: http://bioconductor.org/packages/release/bioc/html/TSRchitect.html , 2017), providing detailed instructions with the aim of taking the user from raw reads to processed results.
|
17
|
Song QA, Catlin NS, Brad Barbazuk W, Li S. Computational analysis of alternative splicing in plant genomes. Gene 2019; 685:186-195. [PMID: 30321657 DOI: 10.1016/j.gene.2018.10.026] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/10/2017] [Revised: 09/16/2018] [Accepted: 10/11/2018] [Indexed: 12/11/2022]
Abstract
Computational analyses play crucial roles in characterizing splicing isoforms in plant genomes. In this review, we provide a survey of computational tools used in recently published, genome-scale splicing analyses in plants. We summarize the commonly used software and pipelines for read mapping, isoform reconstruction, isoform quantification, and differential expression analysis. We also discuss methods for analyzing long reads and the strategies to combine long and short reads in identifying splicing isoforms. We review several tools for characterizing local splicing events, splicing graphs, coding potential, and visualizing splicing isoforms. We further discuss the procedures for identifying conserved splicing isoforms across plant species. Finally, we discuss the outlook of integrating other genomic data with splicing analyses to identify regulatory mechanisms of alternative splicing (AS) on a genome-wide scale.
Affiliation(s)
- Qi A Song
- Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, United States of America
- Nathan S Catlin
- Department of Biology, University of Florida, Gainesville, FL 32611, United States of America
- W Brad Barbazuk
- Department of Biology, University of Florida, Gainesville, FL 32611, United States of America
- Genetics Institute, University of Florida, Gainesville, FL 32611, United States of America
- Song Li
- School of Plant and Environmental Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, United States of America.
|
18
|
Wang HLV, Chekanova JA. An Overview of Methodologies in Studying lncRNAs in the High-Throughput Era: When Acronyms ATTACK! Methods Mol Biol 2019; 1933:1-30. [PMID: 30945176 PMCID: PMC6684206 DOI: 10.1007/978-1-4939-9045-0_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023]
Abstract
The discovery of pervasive transcription in eukaryotic genomes provided one of many surprising (and perhaps most surprising) findings of the genomic era and led to the uncovering of a large number of previously unstudied transcriptional events. This pervasive transcription leads to the production of large numbers of noncoding RNAs (ncRNAs) and thus opened the window to study these diverse, abundant transcripts of unclear relevance and unknown function. Since that discovery, recent advances in high-throughput sequencing technologies have identified a large collection of ncRNAs, from microRNAs to long noncoding RNAs (lncRNAs). Subsequent discoveries have shown that many lncRNAs play important roles in various eukaryotic processes; these discoveries have profoundly altered our understanding of the regulation of eukaryotic gene expression. Although the identification of ncRNAs has become a standard experimental approach, the functional characterization of these diverse ncRNAs remains a major challenge. In this chapter, we highlight recent progress in the methods to identify lncRNAs and the techniques to study the molecular function of these lncRNAs and the application of these techniques to the study of plant lncRNAs.
Affiliation(s)
- Hsiao-Lin V Wang
- Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning, Guangxi, China
- Present address: Department of Biology, Emory University, Atlanta, GA, USA
- Julia A Chekanova
- Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning, Guangxi, China.
|
19
|
Schon MA, Kellner MJ, Plotnikova A, Hofmann F, Nodine MD. NanoPARE: parallel analysis of RNA 5' ends from low-input RNA. Genome Res 2018; 28:1931-1942. [PMID: 30355603 PMCID: PMC6280765 DOI: 10.1101/gr.239202.118] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 05/04/2018] [Accepted: 10/15/2018] [Indexed: 11/25/2022]
Abstract
Diverse RNA 5′ ends are generated through both transcriptional and post-transcriptional processes. These important modes of gene regulation often vary across cell types and can contribute to the diversification of transcriptomes and thus cellular differentiation. Therefore, the identification of primary and processed 5′ ends of RNAs is important for their functional characterization. Methods have been developed to profile either RNA 5′ ends from primary transcripts or the products of RNA degradation genome-wide. However, these approaches either require high amounts of starting RNA or are performed in the absence of paired gene-body mRNA-seq data. This limits current efforts in RNA 5′ end annotation to whole tissues and can prevent accurate RNA 5′ end classification due to biases in the data sets. To enable the accurate identification and precise classification of RNA 5′ ends from standard and low-input RNA, we developed a next-generation sequencing-based method called nanoPARE and associated software. By integrating RNA 5′ end information from nanoPARE with gene-body mRNA-seq data from the same RNA sample, our method enables the identification of transcription start sites at single-nucleotide resolution from single-cell levels of total RNA, as well as small RNA-mediated cleavage events from at least 10,000-fold less total RNA compared to conventional approaches. NanoPARE can therefore be used to accurately profile transcription start sites, noncapped RNA 5′ ends, and small RNA targeting events from individual tissue types. As a proof-of-principle, we utilized nanoPARE to improve Arabidopsis thaliana RNA 5′ end annotations and quantify microRNA-mediated cleavage events across five different flower tissues.
Affiliation(s)
- Michael A Schon
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), 1030 Vienna, Austria
- Max J Kellner
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), 1030 Vienna, Austria
- Alexandra Plotnikova
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), 1030 Vienna, Austria
- Falko Hofmann
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), 1030 Vienna, Austria
- Michael D Nodine
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), 1030 Vienna, Austria
|
20
|
Comprehensive comparative analysis of 5'-end RNA-sequencing methods. Nat Methods 2018; 15:505-511. [PMID: 29867192 PMCID: PMC6075671 DOI: 10.1038/s41592-018-0014-2] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 09/18/2017] [Accepted: 04/10/2018] [Indexed: 12/20/2022]
Abstract
Specialized RNA-seq methods are required to identify the 5' ends of transcripts, which are critical for studies of gene regulation, but these methods have not been systematically benchmarked. We directly compared six such methods, including the performance of five methods on a single human cellular RNA sample and a new spike-in RNA assay that helps circumvent challenges resulting from uncertainties in annotation and RNA processing. We found that the 'cap analysis of gene expression' (CAGE) method performed best for mRNA and that most of its unannotated peaks were supported by evidence from other genomic methods. We applied CAGE to eight brain-related samples and determined sample-specific transcription start site (TSS) usage, as well as a transcriptome-wide shift in TSS usage between fetal and adult brain.
|
21
|
Poulain S, Kato S, Arnaud O, Morlighem JÉ, Suzuki M, Plessy C, Harbers M. NanoCAGE: A Method for the Analysis of Coding and Noncoding 5'-Capped Transcriptomes. Methods Mol Biol 2018; 1543:57-109. [PMID: 28349422 DOI: 10.1007/978-1-4939-6716-2_4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/20/2023]
Abstract
Transcripts in all eukaryotes are characterized by the 5'-end specific cap structure in mRNAs. Cap Analysis Gene Expression or CAGE makes use of these caps to specifically obtain cDNA fragments from the 5'-end of RNA and sequences those at high throughput for transcript identification and genome-wide mapping of transcription start sites for coding and noncoding genes. Here, we provide an improved version of our nanoCAGE protocol that has been developed for preparing CAGE libraries from as little as 50 ng of total RNA within three standard working days. Key steps in library preparation have been improved over our previously published protocol to obtain libraries having a good 5'-end selection and a more equal size distribution for higher sequencing efficiency on Illumina MiSeq and HiSeq sequencers. We recommend nanoCAGE as the method of choice for transcriptome profiling projects even from limited amounts of RNA, and as the best approach for genome-wide mapping of transcription start sites within promoter regions.
Affiliation(s)
- Stéphane Poulain
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Sachi Kato
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Ophélie Arnaud
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Jean-Étienne Morlighem
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Laboratory of Biochemistry and Biotechnology, Institute for Marine Sciences, Federal University of Ceara, Av. da Abolição, 3207-Meireles, Fortaleza, CE, 60165-081, Brazil
- Makoto Suzuki
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- DNAFORM, Inc., Leading Venture Plaza 2, 75-1 Ono-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0046, Japan
- Charles Plessy
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.
- Matthias Harbers
- Division of Genomic Technologies, RIKEN Center for Life Science Technologies, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan.
|
22
|
You Q, Yan H, Liu Y, Yi X, Zhang K, Xu W, Su Z. A systemic identification approach for primary transcription start site of Arabidopsis miRNAs from multidimensional omics data. Funct Integr Genomics 2016; 17:353-363. [PMID: 28032247 DOI: 10.1007/s10142-016-0541-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/23/2016] [Revised: 12/13/2016] [Accepted: 12/19/2016] [Indexed: 01/08/2023]
Abstract
The 22-nucleotide non-coding microRNAs (miRNAs) are mostly transcribed by RNA polymerase II and are similar to protein-coding genes. Unlike the clear process from stem-loop precursors to mature miRNAs, the primary transcriptional regulation of miRNA, especially in plants, still needs to be further clarified, including the original transcription start site, functional cis-elements and primary transcript structures. Due to several well-characterized transcription signals in the promoter region, we proposed a systemic approach integrating multidimensional "omics" (including genomics, transcriptomics, and epigenomics) data to improve the genome-wide identification of primary miRNA transcripts. Here, we used the model plant Arabidopsis thaliana to improve the ability to identify candidate promoter locations in intergenic miRNAs and to determine rules for identifying primary transcription start sites of miRNAs by integrating high-throughput omics data, such as the DNase I hypersensitive sites, chromatin immunoprecipitation-sequencing of polymerase II and H3K4me3, as well as high throughput transcriptomic data. As a result, 93% of refined primary transcripts could be confirmed by the primer pairs from a previous study. Cis-element and secondary structure analyses also supported the feasibility of our results. This work will contribute to the primary transcriptional regulatory analysis of miRNAs, and the conserved regulatory pattern may be a suitable miRNA characteristic in other plant species.
Affiliation(s)
- Qi You
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Hengyu Yan
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Yue Liu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xin Yi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Kang Zhang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China.
|
23
|
Dorrell RG, Klinger CM, Newby RJ, Butterfield ER, Richardson E, Dacks JB, Howe CJ, Nisbet ER, Bowler C. Progressive and Biased Divergent Evolution Underpins the Origin and Diversification of Peridinin Dinoflagellate Plastids. Mol Biol Evol 2016; 34:361-379. [DOI: 10.1093/molbev/msw235] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
|
24
|
Megraw M, Cumbie JS, Ivanchenko MG, Filichkin SA. Small Genetic Circuits and MicroRNAs: Big Players in Polymerase II Transcriptional Control in Plants. Plant Cell 2016; 28:286-303. [PMID: 26869700 PMCID: PMC4790873 DOI: 10.1105/tpc.15.00852] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 10/06/2015] [Accepted: 02/10/2016] [Indexed: 05/11/2023]
Abstract
RNA Polymerase II (Pol II) regulatory cascades involving transcription factors (TFs) and their targets orchestrate the genetic circuitry of every eukaryotic organism. In order to understand how these cascades function, they can be dissected into small genetic networks, each containing just a few Pol II transcribed genes, that generate specific signal-processing outcomes. Small RNA regulatory circuits involve direct regulation of a small RNA by a TF and/or direct regulation of a TF by a small RNA and have been shown to play unique roles in many organisms. Here, we will focus on small RNA regulatory circuits containing Pol II transcribed microRNAs (miRNAs). While the role of miRNA-containing regulatory circuits as modular building blocks for the function of complex networks has long been on the forefront of studies in the animal kingdom, plant studies are poised to take a lead role in this area because of their advantages in probing transcriptional and posttranscriptional control of Pol II genes. The relative simplicity of tissue- and cell-type organization, miRNA targeting, and genomic structure make the Arabidopsis thaliana plant model uniquely amenable for small RNA regulatory circuit studies in a multicellular organism. In this Review, we cover analysis, tools, and validation methods for probing the component interactions in miRNA-containing regulatory circuits. We then review the important roles that plant miRNAs are playing in these circuits and summarize methods for the identification of small genetic circuits that strongly influence plant function. We conclude by noting areas of opportunity where new plant studies are imminently needed.
Affiliation(s)
- Molly Megraw
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331
- Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon 97331
- Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon 97331
- Jason S Cumbie
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331
- Maria G Ivanchenko
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331
- Sergei A Filichkin
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331
- Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon 97331
|
25
|
A One Precursor One siRNA Model for Pol IV-Dependent siRNA Biogenesis. Cell 2016; 163:445-55. [PMID: 26451488 DOI: 10.1016/j.cell.2015.09.032] [Citation(s) in RCA: 180] [Impact Index Per Article: 22.5] [Received: 06/05/2015] [Revised: 08/14/2015] [Accepted: 09/11/2015] [Indexed: 01/07/2023]
Abstract
RNA-directed DNA methylation in Arabidopsis thaliana is driven by the plant-specific RNA Polymerase IV (Pol IV). It has been assumed that a Pol IV transcript can give rise to multiple 24-nt small interfering RNAs (siRNAs) that target DNA methylation. Here, we demonstrate that Pol IV-dependent RNAs (P4RNAs) from wild-type Arabidopsis are surprisingly short in length (30 to 40 nt) and mirror 24-nt siRNAs in distribution, abundance, strand bias, and 5'-adenine preference. P4RNAs exhibit transcription start sites similar to Pol II products and are featured with 5'-monophosphates and 3'-misincorporated nucleotides. The 3'-misincorporation preferentially occurs at methylated cytosines on the template DNA strand, suggesting a co-transcriptional feedback to siRNA biogenesis by DNA methylation to reinforce silencing locally. These results highlight an unusual mechanism of Pol IV transcription and suggest a "one precursor, one siRNA" model for the biogenesis of 24-nt siRNAs in Arabidopsis.
|
26
|
Blevins T, Podicheti R, Mishra V, Marasco M, Wang J, Rusch D, Tang H, Pikaard CS. Identification of Pol IV and RDR2-dependent precursors of 24 nt siRNAs guiding de novo DNA methylation in Arabidopsis. eLife 2015; 4:e09591. [PMID: 26430765 PMCID: PMC4716838 DOI: 10.7554/elife.09591] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Received: 06/21/2015] [Accepted: 10/01/2015] [Indexed: 12/21/2022] Open
Abstract
In Arabidopsis thaliana, abundant 24 nucleotide small interfering RNAs (24 nt siRNA) guide the cytosine methylation and silencing of transposons and a subset of genes. 24 nt siRNA biogenesis requires nuclear RNA polymerase IV (Pol IV), RNA-dependent RNA polymerase 2 (RDR2) and DICER-like 3 (DCL3). However, siRNA precursors are mostly undefined. We identified Pol IV and RDR2-dependent RNAs (P4R2 RNAs) that accumulate in dcl3 mutants and are diced into 24 nt RNAs by DCL3 in vitro. P4R2 RNAs are mostly 26-45 nt and initiate with a purine adjacent to a pyrimidine, characteristics shared by Pol IV transcripts generated in vitro. RDR2 terminal transferase activity, also demonstrated in vitro, may account for occasional non-templated nucleotides at P4R2 RNA 3' termini. The 24 nt siRNAs primarily correspond to the 5' or 3' ends of P4R2 RNAs, suggesting a model whereby siRNAs are generated from either end of P4R2 duplexes by single dicing events.
Affiliation(s)
- Todd Blevins
- Howard Hughes Medical Institute, Indiana University, Bloomington, United States
- Department of Biology, Indiana University, Bloomington, United States
- Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, United States
- Ram Podicheti
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, United States
- School of Informatics and Computing, Indiana University, Bloomington, United States
- Vibhor Mishra
- Department of Biology, Indiana University, Bloomington, United States
- Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, United States
- Michelle Marasco
- Department of Biology, Indiana University, Bloomington, United States
- Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, United States
- Jing Wang
- Department of Biology, Indiana University, Bloomington, United States
- Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, United States
- Doug Rusch
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, United States
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, United States
- Craig S Pikaard
- Howard Hughes Medical Institute, Indiana University, Bloomington, United States
- Department of Biology, Indiana University, Bloomington, United States
- Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, United States
|
27
|
Cumbie JS, Ivanchenko MG, Megraw M. NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites. BMC Genomics 2015; 16:597. [PMID: 26268438 PMCID: PMC4534009 DOI: 10.1186/s12864-015-1670-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/22/2015] [Accepted: 05/29/2015] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: Identifying the transcription start sites (TSS) of genes is essential for characterizing promoter regions. Several protocols have been developed to capture the 5' end of transcripts via Cap Analysis of Gene Expression (CAGE) or linker-ligation strategies such as Paired-End Analysis of Transcription Start Sites (PEAT), but often require large amounts of tissue. More recently, nanoCAGE was developed for sequencing on the Illumina GAIIx to overcome these difficulties. RESULTS: Here we present the first publicly available adaptation of nanoCAGE for sequencing on recent ultra-high throughput platforms such as Illumina HiSeq-2000, and CapFilter, a computational pipeline that greatly increases confidence in TSS identification. We report excellent gene coverage, reproducibility, and precision in transcription start site discovery for samples from Arabidopsis thaliana roots. CONCLUSION: nanoCAGE-XL together with CapFilter allows for genome wide identification of high confidence transcription start sites in large eukaryotic genomes.
Affiliation(s)
- Jason S Cumbie
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA.
- Maria G Ivanchenko
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA.
- Molly Megraw
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA.
- Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA.
- Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, 97331, USA.
|