1
Xi S, Nguyen T, Murray S, Lorenz P, Mellor J. Size fractionated NET-Seq reveals a conserved architecture of transcription units around yeast genes. Yeast 2024; 41:222-241. [PMID: 38433440 DOI: 10.1002/yea.3931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 02/06/2024] [Accepted: 02/09/2024] [Indexed: 03/05/2024] Open
Abstract
Genomes from yeast to humans are subject to pervasive transcription. A single round of pervasive transcription is sufficient to alter local chromatin conformation, nucleosome dynamics and gene expression, but is hard to distinguish from background signals. Size fractionated native elongating transcript sequencing (sfNET-Seq) was developed to precisely map nascent transcripts independent of expression levels. RNAPII-associated nascent transcripts are fractionated into different size ranges before library construction. When anchored to the transcription start sites (TSS) of annotated genes, the combined pattern of the output metagenes gives the expected reference pattern. Bioinformatic pattern matching to the reference pattern identified 9542 transcription units in Saccharomyces cerevisiae, of which 47% are coding and 53% are noncoding. In total, 3113 (33%) are unannotated noncoding transcription units. Anchoring all transcription units to the TSS or polyadenylation site (PAS) of annotated genes reveals distinctive architectures of linked pairs of divergent transcripts approximately 200 nt apart. The Reb1 transcription factor is enriched 30 nt downstream of the PAS only when an upstream (TSS -60 nt with respect to PAS) noncoding transcription unit co-occurs with a downstream (TSS +150 nt) coding transcription unit and acts to limit levels of upstream antisense transcripts. The potential for extensive transcriptional interference is evident from low abundance unannotated transcription units with variable TSS (median -240 nt) initiating within a 500 nt window upstream of, and transcribing over, the promoters of protein-coding genes. This study confirms a highly interleaved yeast genome with different types of transcription units altering the chromatin landscape in distinctive ways, with the potential to exert extensive regulatory control.
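The abstract describes anchoring signal to annotated TSSs to build metagenes. As a rough illustration of that general idea, and not the authors' sfNET-Seq pipeline, the sketch below averages a per-base signal in a fixed window around each anchor. The function name `metagene`, the input layout, and the use of a single coverage track per chromosome (real NET-Seq data are strand-specific) are all simplifying assumptions.

```python
from typing import Dict, List, Tuple


def metagene(
    coverage: Dict[str, List[float]],
    anchors: List[Tuple[str, int, str]],
    upstream: int,
    downstream: int,
) -> List[float]:
    """Average per-base signal in a window around each anchor (e.g. a TSS).

    coverage: chromosome name -> per-base signal, 0-based positions
              (hypothetical layout; one track per chromosome for brevity).
    anchors:  (chromosome, position, strand), strand is '+' or '-'.
    Windows that run off either chromosome end are skipped.
    """
    width = upstream + downstream + 1
    totals = [0.0] * width
    used = 0
    for chrom, pos, strand in anchors:
        signal = coverage[chrom]
        if strand == "+":
            start, end = pos - upstream, pos + downstream
        else:
            # On the minus strand, "upstream" lies at higher coordinates.
            start, end = pos - downstream, pos + upstream
        if start < 0 or end >= len(signal):
            continue
        window = signal[start:end + 1]
        if strand == "-":
            window = window[::-1]  # orient every window 5' to 3'
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        return totals  # all zeros when no window fits
    return [t / used for t in totals]
```

Each element of the result corresponds to one offset relative to the anchor, from `-upstream` to `+downstream`, so profiles from different size fractions could be compared position by position.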
Affiliation(s)
- Shidong Xi
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Tania Nguyen
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Struan Murray
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Phil Lorenz
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Jane Mellor
  - Department of Biochemistry, University of Oxford, Oxford, UK
2
Calvo-Roitberg E, Carroll CL, Venev SV, Kim G, Mick ST, Dekker J, Fiszbein A, Pai AA. mRNA initiation and termination are spatially coordinated. bioRxiv 2024:2024.01.05.574404. [PMID: 38260419 PMCID: PMC10802295 DOI: 10.1101/2024.01.05.574404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
The expression of a precise mRNA transcriptome is crucial for establishing cell identity and function, with dozens of alternative isoforms produced for a single gene sequence. The regulation of mRNA isoform usage occurs by the coordination of co-transcriptional mRNA processing mechanisms across a gene. Decisions involved in mRNA initiation and termination underlie the largest extent of mRNA isoform diversity, but little is known about any relationships between decisions at both ends of mRNA molecules. Here, we systematically profile the joint usage of mRNA transcription start sites (TSSs) and polyadenylation sites (PASs) across tissues and species. Using both short and long read RNA-seq data, we observe that mRNAs preferentially using upstream TSSs also tend to use upstream PASs, and congruently, the usage of downstream sites is similarly paired. This observation suggests that mRNA 5' end choice may directly influence mRNA 3' ends. Our results suggest a novel "Positional Initiation-Termination Axis" (PITA), in which the usage of alternative terminal sites is coupled based on the order in which they appear in the genome. PITA isoforms are more likely to encode alternative protein domains and use conserved sites. PITA is strongly associated with the length of genomic features, such that PITA is enriched in longer genes with more area devoted to regions that regulate alternative 5' or 3' ends. Strikingly, we found that PITA genes are more likely than non-PITA genes to have multiple, overlapping chromatin structural domains related to pairing of ordinally coupled start and end sites. In turn, PITA coupling is also associated with fast RNA Polymerase II (RNAPII) trafficking across these long gene regions. Our findings indicate that a combination of spatial and kinetic mechanisms couple transcription initiation and mRNA 3' end decisions based on ordinal position to define the expression of mRNA isoforms.
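One way to picture the ordinal coupling described above is to give every long read the index of the TSS and the index of the PAS it uses, counted 5' to 3' along the gene, and then ask how strongly the two indices co-vary. The sketch below uses a plain Pearson correlation for this. It is an illustrative stand-in, not the statistic used in the preprint, and the name `ordinal_coupling` and the input format are assumptions.

```python
from math import sqrt
from typing import List, Tuple


def ordinal_coupling(reads: List[Tuple[int, int]]) -> float:
    """Pearson correlation between TSS and PAS ordinal indices across reads.

    reads: one (tss_index, pas_index) pair per read for a single gene,
           with indices counted 5' to 3' (hypothetical input format).
    Returns 0.0 when there are fewer than two reads or when either
    index never varies, since the correlation is undefined there.
    """
    n = len(reads)
    if n < 2:
        return 0.0
    mean_tss = sum(t for t, _ in reads) / n
    mean_pas = sum(p for _, p in reads) / n
    cov = sum((t - mean_tss) * (p - mean_pas) for t, p in reads)
    var_tss = sum((t - mean_tss) ** 2 for t, _ in reads)
    var_pas = sum((p - mean_pas) ** 2 for _, p in reads)
    if var_tss == 0 or var_pas == 0:
        return 0.0
    return cov / sqrt(var_tss * var_pas)
```

Values near 1 would mean that upstream starts pair with upstream ends and downstream starts with downstream ends; values near 0 would mean that the two choices look independent in the sampled reads.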
Affiliation(s)
- Sergey V. Venev
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA
- GyeungYun Kim
  - Department of Biology, Boston University, Boston, MA
- Job Dekker
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA
  - Howard Hughes Medical Institute, Chevy Chase, MD
- Ana Fiszbein
  - Department of Biology, Boston University, Boston, MA
  - Center for Computing & Data Sciences, Boston University, Boston, MA
- Athma A. Pai
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
3
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv 2023:2023.07.26.550536. [PMID: 37546743 PMCID: PMC10402045 DOI: 10.1101/2023.07.26.550536] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/08/2023]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
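Conditioning reads on empirically derived terminal ends can be pictured as a simple filter: keep a read only if its start lies close to a known start site and its end lies close to a known end site. The sketch below shows that idea under stated assumptions. The function names, the plus-strand coordinate convention, and the default tolerance of 50 nt are all hypothetical and are not taken from the preprint.

```python
from typing import Iterable, List, Tuple


def near_any(position: int, sites: Iterable[int], tolerance: int) -> bool:
    """True if position is within `tolerance` nt of at least one site."""
    return any(abs(position - site) <= tolerance for site in sites)


def condition_reads(
    reads: List[Tuple[int, int]],
    start_sites: List[int],
    end_sites: List[int],
    tolerance: int = 50,  # hypothetical default, not from the source
) -> List[Tuple[int, int]]:
    """Keep reads whose start and end both match a reference terminal site.

    reads: (start, end) coordinates per read, plus-strand convention.
    start_sites / end_sites: empirically derived terminal positions
    (for example from 5' and 3' end-capture data).
    """
    return [
        (start, end)
        for start, end in reads
        if near_any(start, start_sites, tolerance)
        and near_any(end, end_sites, tolerance)
    ]
```

As the abstract notes, a filter of this kind trades sensitivity for fidelity: reads that fail either check are dropped, which reduces the power to quantify genes or to detect isoforms with unannotated ends.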
Affiliation(s)
- Rachel F Daniels
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
- Athma A Pai
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
4
Schon MA, Lutzmayer S, Hofmann F, Nodine MD. Bookend: precise transcript reconstruction with end-guided assembly. Genome Biol 2022; 23:143. [PMID: 35768836 PMCID: PMC9245221 DOI: 10.1186/s13059-022-02700-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Accepted: 06/05/2022] [Indexed: 12/29/2022] Open
Abstract
We developed Bookend, a package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. We demonstrate that correct identification of transcript start and end sites is essential for precise full-length transcript assembly. Utilization of end-labeled reads present in full-length single-cell RNA-seq datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis thaliana, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells, can produce reference-quality end-to-end transcript annotations.
Affiliation(s)
- Michael A Schon
  - Cluster of Plant Developmental Biology, Laboratory of Molecular Biology, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands
  - Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Stefan Lutzmayer
  - Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Falko Hofmann
  - Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
- Michael D Nodine
  - Cluster of Plant Developmental Biology, Laboratory of Molecular Biology, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands
  - Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
5
Li B, Marques S, Wang J, Pelechano V. Using TIF-Seq2 to investigate association between 5′ and 3′ mRNA ends. Methods Enzymol 2021; 655:85-118. [PMID: 34183135 DOI: 10.1016/bs.mie.2021.03.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
The development of high-throughput technologies has revealed pervasive transcription in all genomes that have been investigated so far. This has uncovered a highly interleaved transcriptome organization involving thousands of overlapping coding and non-coding RNA isoforms that challenge our traditional definitions of genes and functional regions of the genome. In this chapter, we discuss the application of an improved Transcript Isoform Sequencing approach (TIF-Seq2) able to concurrently determine the start and end sites of individual RNA molecules. We exemplify its use for the investigation of the human transcriptome and show how it is especially well suited to discriminate between overlapping molecules and accurately define their boundaries.
Affiliation(s)
- Bingnan Li
  - SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
- Sueli Marques
  - SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
- Jingwen Wang
  - SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
- Vicent Pelechano
  - SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
6
Chia M, Li C, Marques S, Pelechano V, Luscombe NM, van Werven FJ. High-resolution analysis of cell-state transitions in yeast suggests widespread transcriptional tuning by alternative starts. Genome Biol 2021; 22:34. [PMID: 33446241 PMCID: PMC7807719 DOI: 10.1186/s13059-020-02245-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/02/2020] [Accepted: 12/15/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: The start and end sites of messenger RNAs (TSSs and TESs) are highly regulated, often in a cell-type-specific manner. Yet the contribution of transcript diversity in regulating gene expression remains largely elusive. We perform an integrative analysis of multiple highly synchronized cell-fate transitions and quantitative genomic techniques in Saccharomyces cerevisiae to identify regulatory functions associated with transcribing alternative isoforms. RESULTS: Cell-fate transitions feature widespread elevated expression of alternative TSS and, to a lesser degree, TES usage. These dynamically regulated alternative TSSs are located mostly upstream of canonical TSSs, but also within gene bodies, possibly encoding protein isoforms. Increased upstream alternative TSS usage is linked to various effects on canonical TSS levels, which range from co-activation to repression. We identified two key features linked to these outcomes: an interplay between alternative and canonical promoter strengths, and distance between alternative and canonical TSSs. These two regulatory properties give a plausible explanation of how locally transcribed alternative TSSs control gene transcription. Additionally, we find that specific chromatin modifiers Set2, Set3, and FACT play an important role in mediating gene repression via alternative TSSs, further supporting that the act of upstream transcription drives the local changes in gene transcription. CONCLUSIONS: The integrative analysis of multiple cell-fate transitions suggests the presence of a regulatory control system of alternative TSSs that is important for dynamic tuning of gene expression. Our work provides a framework for understanding how TSS heterogeneity governs eukaryotic gene expression, particularly during cell-fate changes.
Affiliation(s)
- Minghao Chia
  - The Francis Crick Institute, London, UK
  - Genome Institute of Singapore, 60 Biopolis Street, Genome, #02-01, Singapore, 138672, Singapore
- Cai Li
  - The Francis Crick Institute, London, UK
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Sueli Marques
  - SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
- Vicente Pelechano
  - SciLifeLab, Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solna, Sweden
- Nicholas M Luscombe
  - The Francis Crick Institute, London, UK
  - Okinawa Institute of Science & Technology Graduate University, Okinawa, 904-0495, Japan
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
7
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. Biotechnol Bioproc E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered as a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of the recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, an in-depth analysis of the remaining challenges and limitations is provided, along with proposals for promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Eunyoung Kim
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Ingoo Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Bongsung Bae
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Minsu Park
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea