1
Han SW, Jewell S, Thomas-Tikhonenko A, Barash Y. Contrasting and combining transcriptome complexity captured by short and long RNA sequencing reads. Genome Res 2024; 34:1624-1635. [PMID: 39322279 PMCID: PMC11529863 DOI: 10.1101/gr.278659.123] [Received: 10/24/2023] [Accepted: 09/11/2024] [Indexed: 09/27/2024]
Abstract
Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads are able to capture entire isoforms and overcome repetitive regions, whereas short reads still provide improved coverage and error rates. Yet, open questions remain, such as how to quantitatively compare the technologies, can we combine them, and what is the benefit of such a combined view? We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across data sets, algorithms, and technologies, matched short-read data detects ∼30% more splice junctions, such that ∼10%-30% of the splice junctions included at ≥20% by short reads are missed by long reads. In contrast, long reads detect many more intron-retention events and can detect full isoforms, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software, to enable a unified view of transcriptome variations from both technologies and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm and can be combined with short-read data for improved transcriptome analysis.
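The comparison reported in this abstract (the share of splice junctions included at ≥20% by short reads that long reads miss) can be illustrated with a minimal sketch. This is not the MAJIQ-L implementation or API: the function name `missed_junctions`, the junction tuples, and the toy inclusion values are all hypothetical, and only the 20% threshold comes from the abstract.

```python
# Hypothetical sketch: which junctions that short reads include at or above a
# threshold are absent from the long-read junction set? Not MAJIQ-L code.

def missed_junctions(short_inclusion, long_junctions, min_inclusion=0.2):
    """Return (missed junctions, fraction missed).

    short_inclusion: dict mapping a junction key, here assumed to be
        (chrom, start, end, strand), to its short-read inclusion level in [0, 1].
    long_junctions: iterable of junction keys detected by long reads.
    """
    # Keep only junctions that short reads include at >= min_inclusion.
    confident = {j for j, level in short_inclusion.items() if level >= min_inclusion}
    # Of those, find the ones long reads did not detect.
    missed = confident - set(long_junctions)
    fraction = len(missed) / len(confident) if confident else 0.0
    return missed, fraction


# Toy data, invented for illustration only.
short_inclusion = {
    ("chr1", 100, 200, "+"): 0.90,
    ("chr1", 100, 300, "+"): 0.25,
    ("chr1", 400, 500, "+"): 0.05,  # below threshold, ignored
    ("chr2", 10, 80, "-"): 0.60,
    ("chr2", 10, 120, "-"): 0.40,
}
long_junctions = {
    ("chr1", 100, 200, "+"),
    ("chr2", 10, 80, "-"),
    ("chr2", 10, 120, "-"),
}
missed, fraction = missed_junctions(short_inclusion, long_junctions)
print(sorted(missed), fraction)
```

A set difference keeps the sketch short. A real comparison would also need consistent junction coordinates across aligners, which is assumed here.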
Affiliation(s)
- Seong Woo Han
  - Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- San Jewell
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Andrei Thomas-Tikhonenko
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Yoseph Barash
  - Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
2
Guo LT, Grinko A, Olson S, Leipold AM, Graveley B, Saliba AE, Pyle AM. Characterization and implementation of the MarathonRT template-switching reaction to expand the capabilities of RNA-seq. RNA 2024; 30:1495-1512. [PMID: 39174298 PMCID: PMC11482623 DOI: 10.1261/rna.080032.124] [Received: 03/25/2024] [Accepted: 08/08/2024] [Indexed: 08/24/2024]
Abstract
End-to-end RNA-sequencing methods that capture 5'-sequence content without cumbersome library manipulations are of great interest, particularly for analysis of long RNAs. While template-switching methods have been developed for RNA sequencing by distributive short-read RTs, such as the MMLV RTs used in SMART-Seq methods, they have not been adapted to leverage the power of ultraprocessive RTs, such as those derived from group II introns. To facilitate this transition, we dissected the individual processes that guide the enzymatic specificity and efficiency of the multistep template-switching reaction carried out by RTs, in this case, by MarathonRT. Remarkably, this is the first study of its kind, for any RT. First, we characterized the nucleotide specificity of the nontemplated addition (NTA) reaction that occurs when the RT extends past the RNA 5'-terminus. We then evaluated the binding specificity of specialized template-switching oligonucleotides, optimizing their sequences and chemical properties to guide an efficient template-switching reaction. Having dissected and optimized these individual steps, we then unified them into a procedure for performing RNA sequencing with MarathonRT enzymes, using a well-characterized RNA reference set. The resulting reads span a six-log range in transcript concentration and accurately represent the input RNA identities in both length and composition. We also performed RNA-seq from total human RNA and poly(A)-enriched RNA, with short- and long-read sequencing demonstrating that MarathonRT enhances the discovery of RNA molecules unseen by conventional RT. Altogether, we have generated a new pipeline for rapid, accurate sequencing of complex RNA libraries containing mixtures of long RNA transcripts.
Affiliation(s)
- Li-Tao Guo
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA
- Anastasiya Grinko
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz-Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Sara Olson
  - Genetics and Genome Sciences, University of Connecticut Health, Farmington, Connecticut 06030, USA
- Alexander M Leipold
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz-Centre for Infection Research (HZI), 97080 Würzburg, Germany
  - University of Würzburg, Faculty of Medicine, Institute of Molecular Infection Biology (IMIB), 97070 Würzburg, Germany
- Brenton Graveley
  - Genetics and Genome Sciences, University of Connecticut Health, Farmington, Connecticut 06030, USA
- Antoine-Emmanuel Saliba
  - Helmholtz Institute for RNA-based Infection Research (HIRI), Helmholtz-Centre for Infection Research (HZI), 97080 Würzburg, Germany
  - University of Würzburg, Faculty of Medicine, Institute of Molecular Infection Biology (IMIB), 97070 Würzburg, Germany
- Anna Marie Pyle
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA
  - Department of Chemistry, Yale University, New Haven, Connecticut 06520, USA
  - Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA
3
Cheng Y, Xu SM, Santucci K, Lindner G, Janitz M. Machine learning and related approaches in transcriptomics. Biochem Biophys Res Commun 2024; 724:150225. [PMID: 38852503 DOI: 10.1016/j.bbrc.2024.150225] [Received: 02/25/2024] [Revised: 05/18/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024]
Abstract
Data acquisition for transcriptomic studies used to be the bottleneck in the transcriptomic analytical pipeline. However, recent developments in transcriptome profiling technologies have increased researchers' ability to obtain data, resulting in a shift in focus to data analysis. Incorporating machine learning (ML) into traditional analytical methods makes it possible to handle larger volumes of complex data more efficiently. Many bioinformaticians, especially those unfamiliar with ML in the study of human transcriptomics and complex biological systems, face a significant barrier stemming from their limited awareness of the current landscape of ML utilisation in this field. To address this gap, this review endeavours to introduce those individuals to the general types of ML, followed by a comprehensive range of more specific techniques, demonstrated through examples of their incorporation into analytical pipelines for human transcriptome investigations. Important computational aspects such as data pre-processing, task formulation, results (performance of ML models), and validation methods are encompassed. For better practical relevance, there is a strong focus on studies published within the last five years, almost exclusively examining human transcriptomes, with outcomes compared with standard non-ML tools.
Affiliation(s)
- Yuning Cheng
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Si-Mei Xu
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Kristina Santucci
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Grace Lindner
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
4
Desideri F, Grazzi A, Lisi M, Setti A, Santini T, Colantoni A, Proietti G, Carvelli A, Tartaglia GG, Ballarino M, Bozzoni I. CyCoNP lncRNA establishes cis and trans RNA-RNA interactions to supervise neuron physiology. Nucleic Acids Res 2024; 52:9936-9952. [PMID: 38989616 PMCID: PMC11381359 DOI: 10.1093/nar/gkae590] [Received: 11/06/2023] [Revised: 05/30/2024] [Accepted: 07/03/2024] [Indexed: 07/12/2024]
Abstract
The combination of morphogenetic and transcription factors, together with the synergic aid of noncoding RNAs and their cognate RNA binding proteins, contributes to shaping motor neuron (MN) identity. Here, we extend the noncoding perspective of human MN by detailing the molecular and biological activity of CyCoNP (Cytoplasmic Coordinator of Neural Progenitors), a highly expressed and MN-enriched human lncRNA. Through in silico prediction, in vivo RNA purification and loss of function experiments followed by RNA-sequencing, we found that CyCoNP sustains a specific neuron differentiation program, required for the physiology of both neuroblastoma cells and hiPSC-derived MN, which mainly involves miR-4492 and NCAM1 mRNA. We propose a novel lncRNA-mediated 'dual mode' of action, in which CyCoNP acts in trans as a classical RNA sponge by sequestering miR-4492 from its pro-neuronal targets, including NCAM1 mRNA, and at the same time plays an additional role in cis by interacting with NCAM1 mRNA and regulating the availability and localization of miR-4492 in its proximity. These data provide novel insights into the noncoding RNA-mediated control of human neuron physiology and point out the importance of lncRNA-mediated interactions for the spatial distribution of regulatory molecules.
Affiliation(s)
- Fabio Desideri
  - Center for Life Nano- & Neuro-Science of Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
- Alessandro Grazzi
  - Center for Life Nano- & Neuro-Science of Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
- Michela Lisi
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
- Adriano Setti
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
- Tiziana Santini
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
- Alessio Colantoni
  - Center for Life Nano- & Neuro-Science of Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
- Gabriele Proietti
  - Centre for Human Technologies (CHT), Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Andrea Carvelli
  - Department of Neuroscience, The Scripps Research Institute, La Jolla, CA 92037, USA
- Gian Gaetano Tartaglia
  - Centre for Human Technologies (CHT), Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Monica Ballarino
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
- Irene Bozzoni
  - Center for Life Nano- & Neuro-Science of Istituto Italiano di Tecnologia (IIT), 00161 Rome, Italy
  - Department of Biology and Biotechnologies "Charles Darwin", Sapienza University of Rome, 00185 Rome, Italy
5
Loving RK, Sullivan DK, Reese F, Rebboah E, Sakr J, Rezaie N, Liang HY, Filimban G, Kawauchi S, Oakes C, Trout D, Williams BA, MacGregor G, Wold BJ, Mortazavi A, Pachter L. Long-read sequencing transcriptome quantification with lr-kallisto. bioRxiv 2024:2024.07.19.604364. [PMID: 39071335 PMCID: PMC11275803 DOI: 10.1101/2024.07.19.604364] [Indexed: 07/30/2024]
Abstract
RNA abundance quantification has become routine and affordable thanks to high-throughput "short-read" technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. "Long-read" sequencing platforms now produce data types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read data types, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications, we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.
Affiliation(s)
- Rebekah K. Loving
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
- Delaney K. Sullivan
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Fairlie Reese
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Elisabeth Rebboah
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Jasmine Sakr
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Narges Rezaie
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Heidi Y. Liang
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Ghassan Filimban
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Shimako Kawauchi
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Conrad Oakes
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
- Diane Trout
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
- Brian A. Williams
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
- Grant MacGregor
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Barbara J. Wold
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
- Ali Mortazavi
  - Developmental and Cell Biology, University of California Irvine, Irvine, USA
  - Center for Complex Biological Systems, University of California Irvine, Irvine, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, USA
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
6
Berrocal-Rubio MA, Pawer YDJ, Dinevska M, De Paoli-Iseppi R, Widodo SS, Gleeson J, Rajab N, De Nardo W, Hallab J, Li A, Mantamadiotis T, Clark MB, Wells CA. Discovery of NRG1-VII: the myeloid-derived class of NRG1. BMC Genomics 2024; 25:814. [PMID: 39210279 PMCID: PMC11360300 DOI: 10.1186/s12864-024-10723-2] [Received: 01/30/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024]
Abstract
The growth factor Neuregulin-1 (NRG1) has pleiotropic roles in proliferation and differentiation of the stem cell niche in different tissues. It has been implicated in gut, brain and muscle development and repair. Six isoform classes of NRG1 and over 28 protein isoforms have been previously described. Here we report a new class of NRG1, designated NRG1-VII to denote that these NRG1 isoforms arise from a myeloid-specific transcriptional start site (TSS) previously uncharacterized. Long-read sequencing was used to identify eight high-confidence NRG1-VII transcripts. These transcripts presented major structural differences from one another, through the use of cassette exons and alternative stop codons. Expression of NRG1-VII was confirmed in primary human monocytes and tissue resident macrophages and induced pluripotent stem cell-derived macrophages (iPSC-derived macrophages). Isoform switching via cassette exon usage and alternate polyadenylation was apparent during monocyte maturation and macrophage differentiation. NRG1-VII is the major class expressed by the myeloid lineage, including tissue-resident macrophages. Analysis of public gene expression data indicates that monocytes and macrophages are a primary source of NRG1. The size and structure of class VII isoforms suggests that they may be more diffusible through tissues than other NRG1 classes. However, the specific roles of class VII variants in tissue homeostasis and repair have not yet been determined.
Affiliation(s)
- Miguel A Berrocal-Rubio
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Yair David Joseph Pawer
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Marija Dinevska
  - Department of Surgery, Royal Melbourne Hospital, The University of Melbourne, Melbourne, Australia
- Ricardo De Paoli-Iseppi
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Samuel S Widodo
  - Department of Surgery, Royal Melbourne Hospital, The University of Melbourne, Melbourne, Australia
- Josie Gleeson
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Nadia Rajab
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Will De Nardo
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Jeannette Hallab
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Anran Li
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Theo Mantamadiotis
  - Department of Surgery, Royal Melbourne Hospital, The University of Melbourne, Melbourne, Australia
  - Department of Microbiology and Immunology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Michael B Clark
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
- Christine A Wells
  - Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
7
Kabza M, Ritter A, Byrne A, Sereti K, Le D, Stephenson W, Sterne-Weiler T. Accurate long-read transcript discovery and quantification at single-cell, pseudo-bulk and bulk resolution with Isosceles. Nat Commun 2024; 15:7316. [PMID: 39183289 PMCID: PMC11345431 DOI: 10.1038/s41467-024-51584-3] [Received: 11/23/2023] [Accepted: 08/07/2024] [Indexed: 08/27/2024]
Abstract
Accurate detection and quantification of mRNA isoforms from nanopore long-read sequencing remains challenged by technical noise, particularly in single cells. To address this, we introduce Isosceles, a computational toolkit that outperforms other methods in isoform detection sensitivity and quantification accuracy across single-cell, pseudo-bulk and bulk resolution levels, as demonstrated using synthetic and biologically-derived datasets. Here we show Isosceles improves the fidelity of single-cell transcriptome quantification at the isoform-level, and enables flexible downstream analysis. As a case study, we apply Isosceles, uncovering coordinated splicing within and between neuronal differentiation lineages. Isosceles is suitable to be applied in diverse biological systems, facilitating studies of cellular heterogeneity across biomedical research applications.
Affiliation(s)
- Michal Kabza
  - Roche Informatics, F. Hoffmann-La Roche Ltd, Poznań, Poland
- Alexander Ritter
  - Computational Biology & Translation, Genentech Inc., South San Francisco, CA, USA
- Ashley Byrne
  - Department of Next Generation Sequencing and Microchemistry, Proteomics and Lipidomics, Genentech Inc., South San Francisco, CA, USA
- Kostianna Sereti
  - Department of Discovery Oncology, Genentech Inc., South San Francisco, CA, USA
- Daniel Le
  - Department of Next Generation Sequencing and Microchemistry, Proteomics and Lipidomics, Genentech Inc., South San Francisco, CA, USA
- William Stephenson
  - Department of Next Generation Sequencing and Microchemistry, Proteomics and Lipidomics, Genentech Inc., South San Francisco, CA, USA
- Timothy Sterne-Weiler
  - Computational Biology & Translation, Genentech Inc., South San Francisco, CA, USA
  - Department of Discovery Oncology, Genentech Inc., South San Francisco, CA, USA
8
Santucci K, Cheng Y, Xu SM, Janitz M. Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches. Brief Funct Genomics 2024:elae031. [PMID: 39158328 DOI: 10.1093/bfgp/elae031] [Received: 05/22/2024] [Revised: 07/29/2024] [Accepted: 07/31/2024] [Indexed: 08/20/2024]
Abstract
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
Affiliation(s)
- Kristina Santucci
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Yuning Cheng
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Si-Mei Xu
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Michael Janitz
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
9
Ji HJ, Pertea M. Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner. bioRxiv 2024:2024.04.13.589356. [PMID: 39185147 PMCID: PMC11343119 DOI: 10.1101/2024.04.13.589356] [Indexed: 08/27/2024]
Abstract
Recently developed long-read RNA sequencing technologies promise to provide a more accurate and comprehensive view of transcriptomes compared to short-read sequencers, primarily due to their capability to achieve full-length sequencing of transcripts. However, realizing this potential requires computational tools tailored to process long reads, which exhibit a higher error rate than short reads. Existing methods for assembling and quantifying long-read data often disagree on expressed transcripts and their abundance levels, leading researchers to lack confidence in the transcriptomes produced using this data. One approach to address the uncertainties in transcriptome assembly and quantification is by assigning the long reads to transcripts, enabling a more detailed characterization of transcript support at the read level. Here, we introduce TranSigner, a versatile tool that assigns long reads to any input transcriptome. TranSigner consists of three consecutive modules performing: read alignment to the given transcripts, computation of read-to-transcript compatibility based on alignment scores and positions, and execution of an expectation-maximization algorithm to probabilistically assign reads to transcripts and estimate transcript abundances. Using simulated data and experimental datasets from three well-studied organisms - Homo sapiens, Arabidopsis thaliana, and Mus musculus - we demonstrate that TranSigner achieves accurate read assignments, obtaining higher accuracy in transcript abundance estimation compared to existing tools.
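The third module described in this abstract, an expectation-maximization (EM) step that probabilistically assigns reads to transcripts and estimates abundances, can be sketched generically as follows. This is a textbook EM for read assignment, not TranSigner's code: the function name `em_assign`, the input layout, and the toy compatibility scores are assumptions made for illustration, and every read is assumed to have at least one positive score.

```python
# Generic EM sketch for read-to-transcript assignment. Illustrative only;
# this is not TranSigner's implementation, and all names are hypothetical.

def em_assign(compat, n_iter=200, tol=1e-9):
    """compat: dict read_id -> {transcript_id: positive compatibility score}.

    Returns (abundance, posteriors): relative transcript abundances that sum
    to 1, and per-read assignment probabilities over compatible transcripts.
    """
    transcripts = sorted({t for scores in compat.values() for t in scores})
    # Start from uniform relative abundances.
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    posteriors = {}
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts in
        # proportion to (current abundance x compatibility score).
        totals = {t: 0.0 for t in transcripts}
        for read, scores in compat.items():
            weights = {t: abundance[t] * s for t, s in scores.items()}
            norm = sum(weights.values())
            posteriors[read] = {t: w / norm for t, w in weights.items()}
            for t, p in posteriors[read].items():
                totals[t] += p
        # M-step: new abundance is the expected share of reads per transcript.
        updated = {t: totals[t] / len(compat) for t in transcripts}
        change = max(abs(updated[t] - abundance[t]) for t in transcripts)
        abundance = updated
        if change < tol:
            break
    return abundance, posteriors


# Toy input: two reads match only transcript A, one only B, one is ambiguous.
compat = {
    "r1": {"A": 1.0},
    "r2": {"A": 1.0},
    "r3": {"A": 1.0, "B": 1.0},
    "r4": {"B": 1.0},
}
abundance, posteriors = em_assign(compat)
print(abundance, posteriors["r3"])
```

In the toy input the ambiguous read `r3` is pulled toward transcript A because A already has more uniquely assigned reads, which is the behavior the EM step is meant to capture.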
Affiliation(s)
- Hyun Joo Ji
  - Center for Computational Biology, Johns Hopkins University; Baltimore, MD
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University; Baltimore, MD
  - Department of Computer Science, Johns Hopkins University; Baltimore, MD
  - Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD
10
Abebe JS, Alwie Y, Fuhrmann E, Leins J, Mai J, Verstraten R, Schreiner S, Wilson AC, Depledge DP. Nanopore guided annotation of transcriptome architectures. mSystems 2024; 9:e0050524. [PMID: 38953320 PMCID: PMC11265410 DOI: 10.1128/msystems.00505-24] [Received: 04/09/2024] [Accepted: 06/11/2024] [Indexed: 07/04/2024]
Abstract
Nanopore direct RNA sequencing (DRS) enables the capture and full-length sequencing of native RNAs, without recoding or amplification bias. Resulting data sets may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails, on individual RNA molecules. The success of these analyses is highly dependent on the provision of high-resolution transcriptome annotations in combination with workflows that minimize misalignments and other analysis artifacts. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to small gene-dense genomes of viruses due to the challenge of identifying distinct transcript isoforms where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS data sets that inform resulting read alignments and developed the nanopore guided annotation of transcriptome architectures (NAGATA) software package (https://github.com/DepledgeLab/NAGATA). We demonstrate, using a combination of synthetic and original DRS data sets derived from adenoviruses, herpesviruses, coronaviruses, and human cells, that NAGATA outperforms existing transcriptome annotation software and yields a consistently high level of precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes. Finally, we apply NAGATA to generate the first high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F41 (HAdV-41) for which we identify 77 distinct transcripts encoding at least 23 different proteins.
IMPORTANCE The transcriptome of an organism denotes the full repertoire of encoded RNAs that may be expressed. This is critical to understanding the biology of an organism and for accurate transcriptomic and epitranscriptomic-based analyses. Annotating transcriptomes remains a complex task, particularly in small gene-dense organisms such as viruses which maximize their coding capacity through overlapping RNAs. To resolve this, we have developed a new software nanopore guided annotation of transcriptome architectures (NAGATA) which utilizes nanopore direct RNA sequencing (DRS) datasets to rapidly produce high-resolution transcriptome annotations for diverse viruses and other organisms.
Collapse
Affiliation(s)
- Jonathan S. Abebe
- Department of Microbiology, New York University School of Medicine, New York, New York, USA
| | - Yasmine Alwie
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - Erik Fuhrmann
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - Jonas Leins
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - Julia Mai
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Ruth Verstraten
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
| | - Sabrina Schreiner
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
| | - Angus C. Wilson
- Department of Microbiology, New York University School of Medicine, New York, New York, USA
| | - Daniel P. Depledge
- Department of Microbiology, New York University School of Medicine, New York, New York, USA
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
11
Apostolides M, Choi B, Navickas A, Saberi A, Soto LM, Goodarzi H, Najafabadi HS. Accurate isoform quantification by joint short- and long-read RNA-sequencing. bioRxiv 2024:2024.07.11.603067. [PMID: 39026819 PMCID: PMC11257535 DOI: 10.1101/2024.07.11.603067] [Indexed: 07/20/2024]
Abstract
Accurate quantification of transcript isoforms is crucial for understanding gene regulation, functional diversity, and cellular behavior. Existing RNA sequencing methods have significant limitations: short-read (SR) sequencing provides high depth but struggles with isoform deconvolution, whereas long-read (LR) sequencing offers isoform resolution at the cost of lower depth, higher noise, and technical biases. Addressing this gap, we introduce Multi-Platform Aggregation and Quantification of Transcripts (MPAQT), a generative model that combines the complementary strengths of different sequencing platforms to achieve state-of-the-art isoform-resolved transcript quantification, as demonstrated by extensive simulations and experimental benchmarks. By applying MPAQT to an in vitro model of human embryonic stem cell differentiation into cortical neurons, followed by machine learning-based modeling of transcript abundances, we show that untranslated regions (UTRs) are major determinants of isoform proportion and exon usage; this effect is mediated through isoform-specific sequence features embedded in UTRs, which likely interact with RNA-binding proteins that modulate mRNA stability. These findings highlight MPAQT's potential to enhance our understanding of transcriptomic complexity and underline the role of splicing-independent post-transcriptional mechanisms in shaping the isoform and exon usage landscape of the cell.
Affiliation(s)
- Michael Apostolides
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montreal, QC, Canada
| | - Benedict Choi
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Albertas Navickas
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Present address: Institut Curie, PSL Research University, CNRS UMR3348, INSERM U1278, Orsay, France
| | - Ali Saberi
- Victor P. Dahdaleh Institute of Genomic Medicine, Montreal, QC, Canada
- Department of Electrical and Computer Engineering, McGill University, Montreal, Canada
| | - Larisa M. Soto
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montreal, QC, Canada
| | - Hani Goodarzi
- Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Department of Urology, University of California, San Francisco, San Francisco, CA, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Arc Institute, 3181 Porter Drive, Palo Alto, CA, USA
| | - Hamed S. Najafabadi
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montreal, QC, Canada
- McGill Centre for RNA Sciences, McGill University, Montreal, Canada
12
Tang AD, Felton C, Hrabeta-Robinson E, Volden R, Vollmers C, Brooks AN. Detecting haplotype-specific transcript variation in long reads with FLAIR2. Genome Biol 2024; 25:173. [PMID: 38956576 PMCID: PMC11218413 DOI: 10.1186/s13059-024-03301-y] [Received: 06/13/2023] [Accepted: 06/06/2024] [Indexed: 07/04/2024]
Abstract
BACKGROUND: RNA-seq has brought forth significant discoveries regarding aberrations in RNA processing, implicating these RNA variants in a variety of diseases. Aberrant splicing and single nucleotide variants (SNVs) in RNA have been demonstrated to alter transcript stability, localization, and function. In particular, the upregulation of ADAR, an enzyme that mediates adenosine-to-inosine editing, has been previously linked to an increase in the invasiveness of lung adenocarcinoma cells and associated with splicing regulation. Despite the functional importance of studying splicing and SNVs, the use of short-read RNA-seq has limited the community's ability to interrogate both forms of RNA variation simultaneously.
RESULTS: We employ long-read sequencing technology to obtain full-length transcript sequences, elucidating cis-effects of variants on splicing changes at a single molecule level. We develop a computational workflow that augments FLAIR, a tool that calls isoform models expressed in long-read data, to integrate RNA variant calls with the associated isoforms that bear them. We generate nanopore data with high sequence accuracy from H1975 lung adenocarcinoma cells with and without knockdown of ADAR. We apply our workflow to identify key inosine isoform associations to help clarify the prominence of ADAR in tumorigenesis.
CONCLUSIONS: Ultimately, we find that a long-read approach provides valuable insight toward characterizing the relationship between RNA variants and splicing patterns.
Affiliation(s)
- Alison D Tang
- Department of Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Colette Felton
- Department of Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Eva Hrabeta-Robinson
- Department of Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Roger Volden
- Department of Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California, Santa Cruz, USA
| | - Angela N Brooks
- Department of Biomolecular Engineering, University of California, Santa Cruz, USA.
13
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez Martinez JM, Hunt T, Lagarde J, Liang CE, Li H, Meade MJ, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Çelik MH, Chen Y, Du MRM, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Li JL, Lienhard M, Mikheenko A, Mulligan D, Nip KM, Pertea M, Ritchie ME, Sim AD, Tang AD, Wan YK, Wang C, Wong BY, Yang C, Barnes I, Berry AE, Capella-Gutierrez S, Cousineau A, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Götz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Ren X, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Smith ML, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Maehr R, Shen Y, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Au KF, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nat Methods 2024; 21:1349-1363. [PMID: 38849569 PMCID: PMC11543605 DOI: 10.1038/s41592-024-02298-3] [Received: 08/02/2021] [Accepted: 05/03/2024] [Indexed: 06/09/2024]
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Affiliation(s)
| | - Dingjie Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Fairlie Reese
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Maite De María
- Department of Physiological Sciences, College of Veterinary Medicine, Gainesville, FL, USA
- Cherokee Nation System Solutions, contractor to the US Geological Survey-Wetland and Aquatic Research Center, Gainesville, FL, USA
| | - Matthew S Adams
- Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Gabriela Balderrama-Gutierrez
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
| | - Amit K Behera
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jose M Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Flomics Biotech, SL, Barcelona, Spain
| | - Cindy E Liang
- Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Haoran Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Marcus Jerryd Meade
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - David A Moraga Amador
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
| | - Andrey D Prjibelski
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
| | - Hamed Bostan
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA
| | - Ashley M Brooks
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA
| | - Muhammed Hasan Çelik
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
| | - Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Mei R M Du
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Colette Felton
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
| | - Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
| | - Ralf Herwig
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Hideya Kawaji
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Joseph Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Jian-Liang Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA
| | - Matthias Lienhard
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Alla Mikheenko
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Dennis Mulligan
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Matthew E Ritchie
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Andre D Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Alison D Tang
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Changqing Wang
- Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Brandon Y Wong
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Andrew E Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Alyssa Cousineau
- Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Namrita Dhillon
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Luis Ferrández-Peral
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Natàlia Garcia-Reyero
- Energy, Installations & Environment, Office of the Assistant Secretary of Defense, Washington, DC, USA
| | - Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Nedka G Panayotova
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
| | - Alejandro Paniagua
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Xingjie Ren
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Eric Rouchka
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, KY, USA
| | - Brandon Saint-John
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Enrique Sapena
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Leon Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - Melissa Laird Smith
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, KY, USA
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
| | - Hazuki Takahashi
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
| | - Piero Carninci
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
- Human Technopole, Milano, Italy
| | - Nancy D Denslow
- Department of Physiological Sciences, College of Veterinary Medicine, Gainesville, FL, USA
- Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, FL, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Margaret E Hunter
- US Geological Survey, Wetland and Aquatic Research Center, Gainesville, FL, USA
| | - Rene Maehr
- Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Yin Shen
- Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Hagen U Tilgner
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, NY, USA
| | - Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK.
| | - Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
| | - Gloria M Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA.
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.
- UVA Cancer Center, University of Virginia, Charlottesville, VA, USA.
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA.
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA.
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain.
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA.
| | - Angela N Brooks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.
14
Page ML, Aguzzoli Heberle B, Brandon JA, Wadsworth ME, Gordon LA, Nations KA, Ebbert MTW. Surveying the landscape of RNA isoform diversity and expression across 9 GTEx tissues using long-read sequencing data. bioRxiv 2024:2024.02.13.579945. [PMID: 38405825 PMCID: PMC10888753 DOI: 10.1101/2024.02.13.579945] [Indexed: 02/27/2024]
Abstract
Even though alternative RNA splicing was discovered nearly 50 years ago (1977), we still understand very little about most isoforms arising from a single gene, including in which tissues they are expressed and if their functions differ. Human gene annotations suggest remarkable transcriptional complexity, with approximately 252,798 distinct RNA isoform annotations from 62,710 gene bodies (Ensembl v109; 2023), emphasizing the need to understand their biological effects. For example, 256 gene bodies have ≥50 annotated isoforms and 30 have ≥100, where one protein-coding gene (MAPK10) even has 192 distinct RNA isoform annotations. Whether such isoform diversity results from biological redundancy or spurious alternative splicing (i.e., noise), or whether individual isoforms have specialized functions (even if subtle) remains a mystery for most genes. Recent studies by Aguzzoli-Heberle et al., Leung et al., and Glinos et al. demonstrated long-read RNAseq enables improved RNA isoform quantification for essentially any tissue, cell type, or biological condition (e.g., disease, development, aging, etc.), making it possible to better assess individual isoform expression and function. While each study provided important discoveries related to RNA isoform diversity, deeper exploration is needed. We sought to quantify and characterize real isoform usage across tissues (compared to annotations). We used long-read RNAseq data from 58 GTEx samples across nine tissues (three brain, two heart, muscle, lung, liver, and cultured fibroblasts) generated by Glinos et al. and found considerable isoform diversity within and across tissues. Cerebellar hemisphere was the most transcriptionally complex tissue (22,522 distinct isoforms; 3,726 unique); liver was least diverse (12,435 distinct isoforms; 1,039 unique). We highlight gene clusters exhibiting high tissue-specific isoform diversity per tissue (e.g., TPM1 expresses 19 isoforms in the heart's atrial appendage). We also validated 447 of the 700 new isoforms discovered by Aguzzoli-Heberle et al. and found that 88 were expressed in all nine tissues, while 58 were specific to a single tissue. This study represents a broad survey of the RNA isoform landscape, demonstrating isoform diversity across nine tissues and emphasizing the need to better understand how individual isoforms from a single gene body contribute to human health and disease.
Affiliation(s)
- Madeline L. Page
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Bernardo Aguzzoli Heberle
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - J. Anthony Brandon
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Mark E. Wadsworth
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Lacey A. Gordon
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Kayla A. Nations
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Mark T. W. Ebbert
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Department of Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
15
Jones EF, Howton TC, Flanary VL, Clark AD, Lasseigne BN. Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage. Mol Brain 2024; 17:40. [PMID: 38902764 PMCID: PMC11188239 DOI: 10.1186/s13041-024-01112-7] [Received: 01/18/2024] [Accepted: 06/08/2024] [Indexed: 06/22/2024]
Abstract
Alternative splicing (AS) contributes to the biological heterogeneity between species, sexes, tissues, and cell types. Many diseases are either caused by alterations in AS or by alterations to AS. Therefore, measuring AS accurately and efficiently is critical for assessing molecular phenotypes, including those associated with disease. Long-read sequencing enables more accurate quantification of differentially spliced isoform expression than short-read sequencing approaches, and third-generation platforms facilitate high-throughput experiments. To assess differences in AS across the cerebellum, cortex, hippocampus, and striatum by sex, we generated and analyzed Oxford Nanopore Technologies (ONT) long-read RNA sequencing (lrRNA-Seq) C57BL/6J mouse brain cDNA libraries. From > 85 million reads that passed quality control metrics, we calculated differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU) across brain regions and by sex. We found significant DGE, DTE, and DTU across brain regions and that the cerebellum had the most differences compared to the other three regions. Additionally, we found region-specific differential splicing between sexes, with the most sex differences in DTU in the cortex and no DTU in the hippocampus. We also report on two distinct patterns of sex DTU we observed, sex-divergent and sex-specific, that could potentially help explain sex differences in the prevalence and prognosis of various neurological and psychiatric disorders in future studies. Finally, we built a Shiny web application for researchers to explore the data further. Our study provides a resource for the community; it underscores the importance of AS in biological heterogeneity and the utility of long-read sequencing to better understand AS in the brain.
Affiliation(s)
- Emma F Jones
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States of America
| | - Timothy C Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States of America
| | - Victoria L Flanary
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States of America
| | - Amanda D Clark
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States of America
| | - Brittany N Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, United States of America.
16
Gribling-Burrer AS, Bohn P, Smyth RP. Isoform-specific RNA structure determination using Nano-DMS-MaP. Nat Protoc 2024; 19:1835-1865. [PMID: 38347203 DOI: 10.1038/s41596-024-00959-3] [Received: 08/01/2023] [Accepted: 12/12/2023] [Indexed: 06/12/2024]
Abstract
RNA structure determination is essential to understand how RNA carries out its diverse biological functions. In cells, RNA isoforms are readily expressed with partial variations within their sequences due, for example, to alternative splicing, heterogeneity in the transcription start site, RNA processing or differential termination/polyadenylation. Nanopore dimethyl sulfate mutational profiling (Nano-DMS-MaP) is a method for in situ isoform-specific RNA structure determination. Unlike similar methods that rely on short sequencing reads, Nano-DMS-MaP employs nanopore sequencing to resolve the structures of long and highly similar RNA molecules to reveal their previously hidden structural differences. This Protocol describes the development and applications of Nano-DMS-MaP and outlines the main considerations for designing and implementing a successful experiment, from bench to data analysis. In-cell probing experiments can be carried out by an experienced molecular biologist in 3-4 d. Data analysis requires good knowledge of command line tools and Python scripts and takes a further 3-5 d.
Affiliation(s)
- Anne-Sophie Gribling-Burrer
- Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany.
- Patrick Bohn
- Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany.
- Redmond P Smyth
- Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany.
- Faculty of Medicine, University of Würzburg, Würzburg, Germany.
17
Aguzzoli Heberle B, Brandon JA, Page ML, Nations KA, Dikobe KI, White BJ, Gordon LA, Fox GA, Wadsworth ME, Doyle PH, Williams BA, Fox EJ, Shantaraman A, Ryten M, Goodwin S, Ghiban E, Wappel R, Mavruk-Eskipehlivan S, Miller JB, Seyfried NT, Nelson PT, Fryer JD, Ebbert MTW. Mapping medically relevant RNA isoform diversity in the aged human frontal cortex with deep long-read RNA-seq. Nat Biotechnol 2024:10.1038/s41587-024-02245-9. [PMID: 38778214 DOI: 10.1038/s41587-024-02245-9] [Received: 08/06/2023] [Accepted: 04/15/2024] [Indexed: 05/25/2024]
Abstract
Determining whether the RNA isoforms from medically relevant genes have distinct functions could facilitate direct targeting of RNA isoforms for disease treatment. Here, as a step toward this goal for neurological diseases, we sequenced 12 postmortem, aged human frontal cortices (6 Alzheimer disease cases and 6 controls; 50% female) using one Oxford Nanopore PromethION flow cell per sample. We identified 1,917 medically relevant genes expressing multiple isoforms in the frontal cortex where 1,018 had multiple isoforms with different protein-coding sequences. Of these 1,018 genes, 57 are implicated in brain-related diseases including major depression, schizophrenia, Parkinson's disease and Alzheimer disease. Our study also uncovered 53 new RNA isoforms in medically relevant genes, including several where the new isoform was one of the most highly expressed for that gene. We also reported on five mitochondrially encoded, spliced RNA isoforms. We found 99 differentially expressed RNA isoforms between cases with Alzheimer disease and controls.
Affiliation(s)
- Bernardo Aguzzoli Heberle
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA
- J Anthony Brandon
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Madeline L Page
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Kayla A Nations
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Ketsile I Dikobe
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Brendan J White
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Lacey A Gordon
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Grant A Fox
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA
- Mark E Wadsworth
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Patricia H Doyle
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA
- Brittney A Williams
- Department of Pharmacology and Nutritional Sciences, College of Medicine, University of Kentucky, Lexington, KY, USA
- Edward J Fox
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
- Mina Ryten
- UK Dementia Research Institute at The University of Cambridge, Cambridge, UK
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Elena Ghiban
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Justin B Miller
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Division of Biomedical Informatics, Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY, USA
- Department of Pathology and Laboratory Medicine, University of Kentucky, Lexington, KY, USA
- Microbiology, Immunology and Molecular Genetics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
- Peter T Nelson
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- John D Fryer
- Department of Neuroscience, Mayo Clinic, Scottsdale, AZ, USA
- Mark T W Ebbert
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA.
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY, USA.
- Division of Biomedical Informatics, Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY, USA.
18
Maździarz M, Krawczyk K, Kurzyński M, Paukszto Ł, Szablińska-Piernik J, Szczecińska M, Sulima P, Sawicki J. Epitranscriptome insights into Riccia fluitans L. (Marchantiophyta) aquatic transition using nanopore direct RNA sequencing. BMC Plant Biol 2024; 24:399. [PMID: 38745128 PMCID: PMC11094948 DOI: 10.1186/s12870-024-05114-4] [Received: 03/08/2024] [Accepted: 05/07/2024] [Indexed: 05/16/2024]
Abstract
BACKGROUND: Riccia fluitans, an amphibious liverwort, exhibits a fascinating adaptation mechanism to transition between terrestrial and aquatic environments. Utilizing nanopore direct RNA sequencing, we try to capture the complex epitranscriptomic changes undergone in response to land-water transition. RESULTS: A significant finding is the identification of 45 differentially expressed genes (DEGs), with a split of 33 downregulated in terrestrial forms and 12 upregulated in aquatic forms, indicating a robust transcriptional response to environmental changes. Analysis of N6-methyladenosine (m6A) modifications revealed 173 m6A sites in aquatic and only 27 sites in the terrestrial forms, indicating a significant increase in methylation in the former, which could facilitate rapid adaptation to changing environments. The aquatic form showed a global elongation bias in poly(A) tails, which is associated with increased mRNA stability and efficient translation, enhancing the plant's resilience to water stress. Significant differences in polyadenylation signals were observed between the two forms, with nine transcripts showing notable changes in tail length, suggesting an adaptive mechanism to modulate mRNA stability and translational efficiency in response to environmental conditions. This differential methylation and polyadenylation underline a sophisticated layer of post-transcriptional regulation, enabling Riccia fluitans to fine-tune gene expression in response to its living conditions. CONCLUSIONS: These insights into transcriptome dynamics offer a deeper understanding of plant adaptation strategies at the molecular level, contributing to the broader knowledge of plant biology and evolution.
These findings underscore the sophisticated post-transcriptional regulatory strategies Riccia fluitans employs to navigate the challenges of aquatic versus terrestrial living, highlighting the plant's dynamic adaptation to environmental stresses and its utility as a model for studying adaptation mechanisms in amphibious plants.
Affiliation(s)
- Mateusz Maździarz
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland
- Katarzyna Krawczyk
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland
- Mateusz Kurzyński
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland
- Łukasz Paukszto
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland
- Joanna Szablińska-Piernik
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland
- Monika Szczecińska
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland
- Paweł Sulima
- Department of Genetics, Plant Breeding and Bioresource Engineering, University of Warmia and Mazury in Olsztyn, Plac Łódzki 3, Olsztyn, 10-724, Poland
- Jakub Sawicki
- Department of Botany and Evolutionary Ecology, University of Warmia and Mazury in Olsztyn, Plac Łódzki 1, Olsztyn, 10-719, Poland.
19
Su Y, Yu Z, Jin S, Ai Z, Yuan R, Chen X, Xue Z, Guo Y, Chen D, Liang H, Liu Z, Liu W. Comprehensive assessment of mRNA isoform detection methods for long-read sequencing data. Nat Commun 2024; 15:3972. [PMID: 38730241 PMCID: PMC11087464 DOI: 10.1038/s41467-024-48117-3] [Received: 10/18/2022] [Accepted: 04/19/2024] [Indexed: 05/12/2024]
Abstract
The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systematically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.
Affiliation(s)
- Yaqi Su
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Zhejian Yu
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Siqian Jin
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Zhipeng Ai
- Division of Human Reproduction and Developmental Genetics, Women's Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310006, Zhejiang, China
- Ruihong Yuan
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Xinyi Chen
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Ziwei Xue
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Yixin Guo
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Di Chen
- Center for Reproductive Medicine of the Second Affiliated Hospital Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China
- Centre for Regeneration and Cell Therapy of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Hongqing Liang
- Division of Human Reproduction and Developmental Genetics, Women's Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310006, Zhejiang, China
- Zuozhu Liu
- Zhejiang University-Angel Align Inc. R&D Center for Intelligent Healthcare, Zhejiang University-University of Illinois at Urbana-Champaign Institute (ZJU-UIUC Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China
- Wanlu Liu
- Department of Orthopedic Surgery of the Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310009, Zhejiang, China.
- Centre of Biomedical Systems and Informatics of Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), International Campus, Zhejiang University, Haining, 314400, Zhejiang, China.
- Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314100, China.
- Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
20
Abebe JS, Alwie Y, Fuhrmann E, Leins J, Mai J, Verstraten R, Schreiner S, Wilson AC, Depledge DP. Nanopore Guided Annotation of Transcriptome Architectures. bioRxiv 2024:2024.04.02.587744. [PMID: 38617228 PMCID: PMC11014626 DOI: 10.1101/2024.04.02.587744] [Indexed: 04/16/2024]
Abstract
High-resolution annotations of transcriptomes from all domains of life are essential for many sequencing-based RNA analyses, including Nanopore direct RNA sequencing (DRS), which would otherwise be hindered by misalignments and other analysis artefacts. DRS allows the capture and full-length sequencing of native RNAs, without recoding or amplification bias, and the resulting data may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails on individual RNA molecules. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to small, gene-dense organisms such as viruses due to the challenge of identifying distinct transcript isoforms where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS datasets and developed a novel approach to transcriptome annotation. We demonstrate, using a combination of synthetic and original datasets, that our novel approach yields a high level of precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes from DRS datasets. We further apply this approach to generate a new high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F 41, for which we identify 77 distinct transcripts encoding at least 23 different proteins.
Affiliation(s)
- Jonathan S. Abebe
- Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Yasmine Alwie
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Erik Fuhrmann
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Jonas Leins
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Julia Mai
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Ruth Verstraten
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
- Sabrina Schreiner
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Angus C. Wilson
- Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Daniel P. Depledge
- Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
21
Yuan CU, Quah FX, Hemberg M. Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing. Mol Aspects Med 2024; 96:101255. [PMID: 38368637 DOI: 10.1016/j.mam.2024.101255] [Received: 11/17/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/20/2024]
Abstract
Single-cell technologies have transformed biomedical research over the last decade, opening up new possibilities for understanding cellular heterogeneity, both at the genomic and transcriptomic level. In addition, more recent developments of spatial transcriptomics technologies have made it possible to profile cells in their tissue context. In parallel, there have been substantial advances in sequencing technologies, and the third generation of methods is able to produce reads that are tens of kilobases long, with error rates matching those of second-generation short reads. Long-read technologies make it possible to better map large genome rearrangements and quantify isoform-specific abundances. This further improves our ability to characterize functionally relevant heterogeneity. Here, we show how researchers have begun to combine single-cell, spatial transcriptomics, and long-read technologies, and how this is resulting in powerful new approaches to profiling both the genome and the transcriptome. We discuss the achievements so far, and we highlight remaining challenges and opportunities.
Affiliation(s)
- Chengwei Ulrika Yuan
- Department of Biochemistry, University of Cambridge, Cambridge, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Fu Xiang Quah
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Martin Hemberg
- Gene Lay Institute, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
22
Murali M, Saquing J, Lu S, Gao Z, Jordan B, Wakefield ZP, Fiszbein A, Cooper DR, Castaldi PJ, Korkin D, Sheynkman G. Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity. bioRxiv 2024:2024.03.15.585320. [PMID: 38559226 PMCID: PMC10980011 DOI: 10.1101/2024.03.15.585320] [Indexed: 04/04/2024]
Abstract
Long-read RNA sequencing has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 32,799 pairs of GENCODE-annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5' UTR alternative splicing. Biosurfer's detailed tracking of nucleotide-to-residue relationships helped reveal an uncommonly tracked source of single amino acid residue changes arising from codon splits at junctions. For 17% of internal sequence changes, such split codon patterns lead to single residue differences, termed "ragged codons". Of variable C-termini, 72% involve splice- or intron retention-induced reading frameshifts. We found an unusual pattern of reading frame changes, in which the first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a "snapback" frameshift. We analyzed the long-read RNA-seq-predicted proteome of a human cell line and found trends similar to those in our GENCODE analysis, with the exception of a higher proportion of isoforms predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq datasets should accelerate insights into the functional role of protein isoforms, providing a mechanistic explanation of the origins of the proteomic diversity driven by alternative splicing. Biosurfer is available as a Python package at https://github.com/sheynkman-lab/biosurfer.
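The split-codon effect mentioned in this abstract can be illustrated with a toy example. The sketch below is not Biosurfer's algorithm or API; the exon sequences and the helper `translate` are hypothetical and chosen only so that an exon boundary falls inside a codon. Skipping a 6-nt exon removes two residues, but because the flanking codon is split across the junction, the residue spanning the junction also changes.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical exons: exon_a ends two nucleotides into a codon ("AG"),
# so the third base of that codon comes from whichever exon follows.
exon_a = "ATGGCTAG"   # ATG GCT AG|
exon_c = "CGGTCT"     # cassette exon, 6 nt (frame-preserving)
exon_b = "ACTGTGA"    # |A CTG TGA

included = translate(exon_a + exon_c + exon_b)  # junction codon AGC
skipped = translate(exon_a + exon_b)            # junction codon AGA

print(included, skipped)
```

In the included isoform the junction codon reads AGC and in the skipped isoform AGA, so the two proteins differ by more than the two residues encoded wholly within the cassette exon.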
Affiliation(s)
- Mayank Murali
- Broad Institute of MIT and Harvard University, Cambridge, MA, USA
- Jamie Saquing
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Senbao Lu
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Ziyang Gao
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Ben Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Zachary Peters Wakefield
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biology, Boston University, Boston, MA, USA
- Ana Fiszbein
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biology, Boston University, Boston, MA, USA
- David R. Cooper
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Peter J. Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Dmitry Korkin
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
23
Jousheghani ZZ, Patro R. Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification. bioRxiv 2024:2024.02.28.582591. [PMID: 38464200 PMCID: PMC10925290 DOI: 10.1101/2024.02.28.582591] [Indexed: 03/12/2024]
Abstract
Motivation: Long read sequencing technology is becoming an increasingly indispensable tool in genomic and transcriptomic analysis. In transcriptomics in particular, long reads offer the possibility of sequencing full-length isoforms, which can vastly simplify the identification of novel transcripts and transcript quantification. However, despite this promise, the focus of much long read method development to date has been on transcript identification, with comparatively little attention paid to quantification. Yet, due to differences in the underlying protocols and technologies, lower throughput (i.e. fewer reads sequenced per sample compared to short read technologies), as well as technical artifacts, long read quantification remains a challenge, motivating the continued development and assessment of quantification methods tailored to this increasingly prevalent type of data. Results: We introduce a new method and software tool for long read transcript quantification called oarfish. Our model incorporates a novel and innovative coverage score, which affects the conditional probability of fragment assignment in the underlying probabilistic model. We demonstrate that by accounting for this coverage information, oarfish is able to produce more accurate quantification estimates than existing long read quantification methods, particularly when one considers the primary isoforms present in a particular cell line or tissue type. Availability and Implementation: Oarfish is implemented in the Rust programming language, and is made available as free and open-source software under the BSD 3-clause license. The source code is available at https://www.github.com/COMBINE-lab/oarfish.
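The idea that a per-alignment score can shift the conditional probability of fragment assignment can be illustrated with a toy expectation-maximization loop. This is a minimal sketch under stated assumptions, not oarfish's actual model or its coverage score: the function `em_quantify`, the read representation, and the weight values are hypothetical. Each read maps to one or more transcripts, and each mapping carries a weight that stands in for the conditional probability of the read given that transcript.

```python
def em_quantify(reads, n_transcripts, n_iter=50):
    """Toy EM for transcript abundances.

    reads: list of dicts {transcript_index: weight}, where weight is an
           assumed stand-in for P(read | transcript), e.g. a coverage-based score.
    Returns a list of relative abundances that sums to 1.
    """
    abundance = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        for read in reads:
            # E-step: split the read across compatible transcripts in
            # proportion to abundance times the per-alignment weight.
            denom = sum(abundance[t] * w for t, w in read.items())
            if denom == 0.0:
                continue
            for t, w in read.items():
                counts[t] += abundance[t] * w / denom
        # M-step: abundances are the normalized expected read counts.
        total = sum(counts)
        abundance = [c / total for c in counts]
    return abundance


# Ten reads, each ambiguous between transcript 0 and transcript 1.
uniform = em_quantify([{0: 0.5, 1: 0.5}] * 10, n_transcripts=2)
weighted = em_quantify([{0: 0.9, 1: 0.1}] * 10, n_transcripts=2)

print(uniform, weighted)
```

With equal weights the ambiguous reads leave the two transcripts tied, whereas weights that favor transcript 0 pull the estimate toward it. This only shows the general mechanism by which alignment-level evidence enters such a model.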
Affiliation(s)
- Zahra Zare Jousheghani
- Department of Electrical and Computer Engineering, University of Maryland, College Park, 20742, Maryland, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, 20742, Maryland, USA
24
Seddighi S, Qi YA, Brown AL, Wilkins OG, Bereda C, Belair C, Zhang YJ, Prudencio M, Keuss MJ, Khandeshi A, Pickles S, Kargbo-Hill SE, Hawrot J, Ramos DM, Yuan H, Roberts J, Sacramento EK, Shah SI, Nalls MA, Colón-Mercado JM, Reyes JF, Ryan VH, Nelson MP, Cook CN, Li Z, Screven L, Kwan JY, Mehta PR, Zanovello M, Hallegger M, Shantaraman A, Ping L, Koike Y, Oskarsson B, Staff NP, Duong DM, Ahmed A, Secrier M, Ule J, Jacobson S, Reich DS, Rohrer JD, Malaspina A, Dickson DW, Glass JD, Ori A, Seyfried NT, Maragkakis M, Petrucelli L, Fratta P, Ward ME. Mis-spliced transcripts generate de novo proteins in TDP-43-related ALS/FTD. Sci Transl Med 2024; 16:eadg7162. [PMID: 38277467 PMCID: PMC11325748 DOI: 10.1126/scitranslmed.adg7162] [Received: 01/17/2023] [Accepted: 01/23/2024] [Indexed: 01/28/2024]
Abstract
Functional loss of TDP-43, an RNA binding protein genetically and pathologically linked to amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), leads to the inclusion of cryptic exons in hundreds of transcripts during disease. Cryptic exons can promote the degradation of affected transcripts, deleteriously altering cellular function through loss-of-function mechanisms. Here, we show that mRNA transcripts harboring cryptic exons generated de novo proteins in TDP-43-depleted human iPSC-derived neurons in vitro, and de novo peptides were found in cerebrospinal fluid (CSF) samples from patients with ALS or FTD. Using coordinated transcriptomic and proteomic studies of TDP-43-depleted human iPSC-derived neurons, we identified 65 peptides that mapped to 12 cryptic exons. Cryptic exons identified in TDP-43-depleted human iPSC-derived neurons were predictive of cryptic exons expressed in postmortem brain tissue from patients with TDP-43 proteinopathy. These cryptic exons produced transcript variants that generated de novo proteins. We found that the inclusion of cryptic peptide sequences in proteins altered their interactions with other proteins, thereby likely altering their function. Last, we showed that 18 de novo peptides across 13 genes were present in CSF samples from patients with ALS/FTD spectrum disorders. The demonstration of cryptic exon translation suggests new mechanisms for ALS/FTD pathophysiology downstream of TDP-43 dysfunction and may provide a potential strategy to assay TDP-43 function in patient CSF.
Affiliation(s)
- Sahba Seddighi
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Yue A Qi
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Anna-Leigh Brown
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Oscar G Wilkins
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Francis Crick Institute, London, UK
- Colleen Bereda
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Cedric Belair
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD, USA
- Yong-Jie Zhang
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
- Mercedes Prudencio
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Matthew J Keuss
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Aditya Khandeshi
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD, USA
| | - Sarah Pickles
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Sarah E Kargbo-Hill
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - James Hawrot
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Daniel M Ramos
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Hebao Yuan
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Jessica Roberts
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Erika Kelmer Sacramento
- Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Beutenbergstrasse 11, 07745 Jena, Germany
| | - Syed I Shah
- Data Tecnica International, Washington, DC, USA
| | - Mike A Nalls
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International, Washington, DC, USA
| | - Jennifer M Colón-Mercado
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Joel F Reyes
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Veronica H Ryan
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Matthew P Nelson
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Casey N Cook
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Ziyi Li
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Data Tecnica International, Washington, DC, USA
| | - Laurel Screven
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Justin Y Kwan
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Puja R Mehta
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Matteo Zanovello
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Martina Hallegger
- Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
| | - Lingyan Ping
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
| | - Yuka Koike
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Björn Oskarsson
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Nathan P Staff
- Department of Neurology, Mayo Clinic, Rochester, MN, USA
| | - Duc M Duong
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
| | - Aisha Ahmed
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Maria Secrier
- Department of Genetics, Evolution and Environment, UCL Genetics Institute, UCL, London, UK
| | - Jernej Ule
- Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
| | - Steven Jacobson
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Daniel S Reich
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Jonathan D Rohrer
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Andrea Malaspina
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Dennis W Dickson
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Jonathan D Glass
- Department of Neurology, Center for Neurodegenerative Diseases, Emory University, Atlanta, GA, USA
| | - Alessandro Ori
- Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Beutenbergstrasse 11, 07745 Jena, Germany
| | - Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
- Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA
| | - Manolis Maragkakis
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD, USA
| | - Leonard Petrucelli
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA
- Neuroscience Graduate Program, Mayo Clinic Graduate School of Biomedical Sciences, Jacksonville, FL, USA
| | - Pietro Fratta
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Francis Crick Institute, London, UK
| | - Michael E Ward
- National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
|
25
|
Durydivka O, Gazdarica M, Vecerkova K, Radenkovic S, Blahos J. Multiple Sgip1 splice variants inhibit cannabinoid receptor 1 internalization. Gene 2024; 892:147851. [PMID: 37783296 DOI: 10.1016/j.gene.2023.147851] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/27/2023] [Revised: 09/23/2023] [Accepted: 09/27/2023] [Indexed: 10/04/2023]
Abstract
Alternative splicing can often result in the expression of distinct protein isoforms from a single gene, with specific composition and properties. SH3-containing GRB2-like protein 3-interacting protein 1 (Sgip1) is a brain-enriched protein that regulates clathrin-mediated endocytosis and interferes with the internalization of cannabinoid receptor 1. Several research groups have studied the physiological importance of Sgip1, and four Sgip1 protein isoforms have been described to date, while the NCBI Gene database predicts the expression of 20 splice variants from the Sgip1 gene in mice. In this work, we cloned 15 Sgip1 splice variants from the mouse brain, including 11 novel splice variants. The cloned splice variants differed in exon composition within two Sgip1 regions: the membrane phospholipid-binding domain and the proline-rich region. All the Sgip1 splice isoforms had similar stability and comparable ability to inhibit the internalization of cannabinoid receptor 1. None of the isoforms influenced the internalization of the µ-opioid receptor. We confirm the expression of Sgip1 splice variants described in previous studies or predicted in silico. Our data provide a basis for further studies exploring the significance of Sgip1 splicing, and we suggest a new classification of Sgip1 splice variants to unify their nomenclature.
Affiliation(s)
- Oleh Durydivka
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague, Czech Republic
| | - Matej Gazdarica
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague, Czech Republic
| | - Katerina Vecerkova
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague, Czech Republic; Department of Informatics and Chemistry, University of Chemistry and Technology, Technicka 5, 166 28 Prague, Czech Republic
| | - Silvia Radenkovic
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague, Czech Republic
| | - Jaroslav Blahos
- Institute of Molecular Genetics of the Czech Academy of Sciences, Videnska 1083, 142 20 Prague, Czech Republic.
|
26
|
Jones EF, Howton TC, Flanary VL, Clark AD, Lasseigne BN. Long-read RNA sequencing identifies region- and sex-specific C57BL/6J mouse brain mRNA isoform expression and usage. bioRxiv 2024:2024.01.11.575219. [PMID: 38260631 PMCID: PMC10802568 DOI: 10.1101/2024.01.11.575219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Alternative splicing (AS) contributes to the biological heterogeneity between species, sexes, tissues, and cell types. Many diseases are either caused by alterations in AS or by alterations to AS. Therefore, measuring AS accurately and efficiently is critical for assessing molecular phenotypes, including those associated with disease. Long-read sequencing enables more accurate quantification of differentially spliced isoform expression than short-read sequencing approaches, and third-generation platforms facilitate high-throughput experiments. To assess differences in AS across the cerebellum, cortex, hippocampus, and striatum by sex, we generated and analyzed Oxford Nanopore Technologies (ONT) long-read RNA sequencing (lrRNA-Seq) C57BL/6J mouse brain cDNA libraries. From >85 million reads that passed quality control metrics, we calculated differential gene expression (DGE), differential transcript expression (DTE), and differential transcript usage (DTU) across brain regions and by sex. We found significant DGE, DTE, and DTU across brain regions and that the cerebellum had the most differences compared to the other three regions. Additionally, we found region-specific differential splicing between sexes, with the most sex differences in DTU in the cortex and no DTU in the hippocampus. We also report on two distinct patterns of sex DTU we observed, sex-divergent and sex-specific, that could potentially help explain sex differences in the prevalence and prognosis of various neurological and psychiatric disorders in future studies. Finally, we built a Shiny web application for researchers to explore the data further. Our study provides a resource for the community; it underscores the importance of AS in biological heterogeneity and the utility of long-read sequencing to better understand AS in the brain.
Affiliation(s)
- Emma F. Jones
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Timothy C. Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Victoria L. Flanary
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Amanda D. Clark
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Brittany N. Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
|
27
|
Corre M, Boehm V, Besic V, Kurowska A, Viry A, Mohammad A, Sénamaud-Beaufort C, Thomas-Chollier M, Lebreton A. Alternative splicing induced by bacterial pore-forming toxins sharpens CIRBP-mediated cell response to Listeria infection. Nucleic Acids Res 2023; 51:12459-12475. [PMID: 37941135 PMCID: PMC10711537 DOI: 10.1093/nar/gkad1033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 10/09/2023] [Accepted: 10/20/2023] [Indexed: 11/10/2023] Open
Abstract
Cell autonomous responses to intracellular bacteria largely depend on reorganization of gene expression. To gain isoform-level resolution of these modes of regulation, we combined long- and short-read transcriptomic analyses of the response of intestinal epithelial cells to infection by the foodborne pathogen Listeria monocytogenes. Among the most striking isoform-based types of regulation, expression of the cellular stress response regulator CIRBP (cold-inducible RNA-binding protein) and of several SRSFs (serine/arginine-rich splicing factors) switched from canonical transcripts to nonsense-mediated decay-sensitive isoforms by inclusion of 'poison exons'. We showed that damage to host cell membranes caused by bacterial pore-forming toxins (listeriolysin O, perfringolysin, streptolysin or aerolysin) led to the dephosphorylation of SRSFs via the inhibition of the kinase activity of CLK1, thereby driving CIRBP alternative splicing. CIRBP isoform usage was found to have consequences on infection, since selective repression of canonical CIRBP reduced intracellular bacterial load while that of the poison exon-containing isoform exacerbated it. Consistently, CIRBP-bound mRNAs were shifted towards stress-relevant transcripts in infected cells, with increased mRNA levels or reduced translation efficiency for some targets. Our results thus generalize the alternative splicing of CIRBP and SRSFs as a common response to biotic or abiotic stresses by extending its relevance to the context of bacterial infection.
Affiliation(s)
- Morgane Corre
- Group Bacterial infection, response & dynamics, Institut de biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Volker Boehm
- Institute for Genetics, University of Cologne, 50674 Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, 50931 Cologne, Germany
| | - Vinko Besic
- Group Bacterial infection, response & dynamics, Institut de biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Anna Kurowska
- Group Bacterial infection, response & dynamics, Institut de biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Anouk Viry
- Group Bacterial infection, response & dynamics, Institut de biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Ammara Mohammad
- GenomiqueENS, Institut de Biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Catherine Sénamaud-Beaufort
- GenomiqueENS, Institut de Biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Morgane Thomas-Chollier
- Group Bacterial infection, response & dynamics, Institut de biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- GenomiqueENS, Institut de Biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Alice Lebreton
- Group Bacterial infection, response & dynamics, Institut de biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- INRAE, Micalis Institute, 78350 Jouy-en-Josas, France
|
28
|
Heberle BA, Brandon JA, Page ML, Nations KA, Dikobe KI, White BJ, Gordon LA, Fox GA, Wadsworth ME, Doyle PH, Williams BA, Fox EJ, Shantaraman A, Ryten M, Goodwin S, Ghiban E, Wappel R, Mavruk-Eskipehlivan S, Miller JB, Seyfried NT, Nelson PT, Fryer JD, Ebbert MTW. Using deep long-read RNAseq in Alzheimer's disease brain to assess medical relevance of RNA isoform diversity. bioRxiv 2023:2023.08.06.552162. [PMID: 37609156 PMCID: PMC10441303 DOI: 10.1101/2023.08.06.552162] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 08/24/2023]
Abstract
Due to alternative splicing, human protein-coding genes average over eight RNA isoforms, resulting in nearly four distinct protein coding sequences per gene. Long-read RNAseq (IsoSeq) enables more accurate quantification of isoforms, shedding light on their specific roles. To assess the medical relevance of measuring RNA isoform expression, we sequenced 12 aged human frontal cortices (6 Alzheimer's disease cases and 6 controls; 50% female) using one Oxford Nanopore PromethION flow cell per sample. Our study uncovered 53 new high-confidence RNA isoforms in medically relevant genes, including several where the new isoform was one of the most highly expressed for that gene. Specific examples include WDR4 (61%; microcephaly), MYL3 (44%; hypertrophic cardiomyopathy), and MTHFS (25%; major depression, schizophrenia, bipolar disorder). Other notable genes with new high-confidence isoforms include CPLX2 (10%; schizophrenia, epilepsy) and MAOB (9%; targeted for Parkinson's disease treatment). We identified 1,917 medically relevant genes expressing multiple isoforms in human frontal cortex, where 1,018 had multiple isoforms with different protein coding sequences, demonstrating the need to better understand how individual isoforms from a single gene body are involved in human health and disease, if at all. Exactly 98 of the 1,917 genes are implicated in brain-related diseases, including Alzheimer's disease genes such as APP (Aβ precursor protein; five), MAPT (tau protein; four), and BIN1 (eight). As proof of concept, we also found 99 differentially expressed RNA isoforms between Alzheimer's cases and controls, despite the genes themselves not exhibiting differential expression. Our findings highlight the significant knowledge gaps in RNA isoform diversity and their medical relevance. Deep long-read RNA sequencing will be necessary going forward to fully comprehend the medical relevance of individual isoforms for a "single" gene.
Affiliation(s)
- Bernardo Aguzzoli Heberle
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Madeline L. Page
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - Kayla A. Nations
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - Ketsile I. Dikobe
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - Brendan J. White
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - Lacey A. Gordon
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - Grant A. Fox
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Mark E. Wadsworth
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - Patricia H. Doyle
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
| | - Brittney A. Williams
- Department of Pharmacology and Nutritional Sciences, College of Medicine, University of Kentucky, Lexington, KY
| | - Edward J. Fox
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
| | - Mina Ryten
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States
| | - Elena Ghiban
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States
| | - Robert Wappel
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States
| | - Justin B. Miller
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
- Department of Pathology and Laboratory Medicine, University of Kentucky, Lexington, KY, USA
- Microbiology, Immunology and Molecular Genetics, College of Medicine, University of Kentucky, Lexington, KY, USA
| | - Nicholas T. Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA, USA
| | - Peter T. Nelson
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
| | - John D. Fryer
- Department of Neuroscience, Mayo Clinic, Scottsdale, Arizona
| | - Mark T. W. Ebbert
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY
- Department of Neuroscience, College of Medicine, University of Kentucky, Lexington, KY
- Division of Biomedical Informatics, Internal Medicine, College of Medicine, University of Kentucky, Lexington, KY
|
29
|
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. Genome Biol 2023; 24:286. [PMID: 38082294 PMCID: PMC10712166 DOI: 10.1186/s13059-023-03127-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023] Open
Abstract
Long-read RNA sequencing has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile tool that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field.
Affiliation(s)
- Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Camino de Vera, Valencia, 46022, Spain
| | - Tianyuan Liu
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Francisco Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain.
|
30
|
Humphrey J, Brophy E, Kosoy R, Zeng B, Coccia E, Mattei D, Ravi A, Efthymiou AG, Navarro E, Muller BZ, Snijders GJLJ, Allan A, Münch A, Kitata RB, Kleopoulos SP, Argyriou S, Shao Z, Francoeur N, Tsai CF, Gritsenko MA, Monroe ME, Paurus VL, Weitz KK, Shi T, Sebra R, Liu T, de Witte LD, Goate AM, Bennett DA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P, Raj T. Long-read RNA-seq atlas of novel microglia isoforms elucidates disease-associated genetic regulation of splicing. medRxiv 2023:2023.12.01.23299073. [PMID: 38076956 PMCID: PMC10705658 DOI: 10.1101/2023.12.01.23299073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
Microglia, the innate immune cells of the central nervous system, have been genetically implicated in multiple neurodegenerative diseases. We previously mapped the genetic regulation of gene expression and mRNA splicing in human microglia, identifying several loci where common genetic variants in microglia-specific regulatory elements explain disease risk loci identified by GWAS. However, identifying genetic effects on splicing has been challenging due to the use of short sequencing reads to identify causal isoforms. Here we present the isoform-centric microglia genomic atlas (isoMiGA) which leverages the power of long-read RNA-seq to identify 35,879 novel microglia isoforms. We show that the novel microglia isoforms are involved in stimulation response and brain region specificity. We then quantified the expression of both known and novel isoforms in a multi-ethnic meta-analysis of 555 human microglia short-read RNA-seq samples from 391 donors, the largest to date, and found associations with genetic risk loci in Alzheimer's disease and Parkinson's disease. We nominate several loci that may act through complex changes in isoform and splice site usage.
Affiliation(s)
- Jack Humphrey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Erica Brophy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Roman Kosoy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Biao Zeng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Elena Coccia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Daniele Mattei
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ashvin Ravi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Anastasia G. Efthymiou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Elisa Navarro
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biochemistry and Molecular Biology, Faculty of Medicine (Universidad Complutense de Madrid), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Instituto Ramon y Cajal de Investigacion Sanitaria (IRYCIS), Madrid, Spain
| | - Benjamin Z. Muller
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Gijsje JLJ Snijders
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Amanda Allan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alexandra Münch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Reta Birhanu Kitata
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Steven P Kleopoulos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Stathis Argyriou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Zhiping Shao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Nancy Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Chia-Feng Tsai
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Vanessa L Paurus
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Karl K Weitz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Tujin Shi
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Lot D. de Witte
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Alison M. Goate
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, USA
| | - Vahram Haroutunian
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Gabriel E. Hoffman
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - John F. Fullard
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Panos Roussos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Towfique Raj
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
31
|
Schuster J, Ritchie ME, Gouil Q. Restrander: rapid orientation and artefact removal for long-read cDNA data. NAR Genom Bioinform 2023; 5:lqad108. [PMID: 38143957 PMCID: PMC10748469 DOI: 10.1093/nargab/lqad108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 11/07/2023] [Accepted: 12/14/2023] [Indexed: 12/26/2023] Open
Abstract
In transcriptomic analyses, it is helpful to keep track of the strand of the RNA molecules. However, the Oxford Nanopore long-read cDNA sequencing protocols generate reads that correspond to either the first or second-strand cDNA; therefore, the strandedness of the initial transcript has to be inferred bioinformatically. Reverse transcription and PCR can also introduce artefacts, which should be flagged in data pre-processing. Here we introduce Restrander, a lightning-fast and highly accurate tool for restranding and removing artefacts in long-read cDNA sequencing data. Thanks to its C++ implementation, Restrander was faster than Oxford Nanopore Technologies' existing tool Pychopper, and correctly restranded more reads due to its strategy of searching for polyA/T tails in addition to primer sequences from the reverse transcription and template-switch steps. We found that restranding improved the process of visualising and exploring data, and increased the number of novel isoforms discovered by bambu, particularly in regions where sense and anti-sense transcripts co-occur. The artefact detection implemented in Restrander quantifies reads lacking the correct 5' and 3' ends, a useful feature in quality control for library preparation. Restrander is pre-configured for all major cDNA protocols, and can be customised with user-defined primers. Restrander is available at https://github.com/mritchielab/restrander.
Affiliation(s)
- Jakob Schuster
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010, Australia
| | - Matthew E Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010, Australia
| | - Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010, Australia
| |
|
32
|
Naarmann-de Vries IS, Gjerga E, Gandor CLA, Dieterich C. Adaptive sampling for nanopore direct RNA-sequencing. RNA 2023; 29:1939-1949. [PMID: 37673469 PMCID: PMC10653383 DOI: 10.1261/rna.079727.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023]
Abstract
Nanopore long-read sequencing enables real-time monitoring and controlling of individual nanopores. This allows us to enrich or deplete specific sequences in DNA sequencing in a process called "adaptive sampling." So far, adaptive sampling (AS) has not been applicable to the direct sequencing of RNA. Here, we show that AS is feasible and useful for direct RNA sequencing (DRS), which has its specific technical and biological challenges. Using a well-controlled in vitro transcript-based model system, we identify essential characteristics and parameter settings for AS in DRS, such as the superior performance of depletion over enrichment. Here, the efficiency of depletion is close to the theoretical maximum. Additionally, we demonstrate that AS efficiently depletes specific transcripts in transcriptome-wide sequencing applications. Specifically, we applied our AS approach to poly(A)-enriched RNA samples from human-induced pluripotent stem cell-derived cardiomyocytes and mouse whole heart tissue and show efficient 2.5- to 2.8-fold depletion of highly abundant mitochondrial-encoded transcripts. Finally, we characterize depletion and enrichment performance for complex transcriptome subsets, that is, at the level of the entire Chromosome 11, proving the general applicability of direct RNA AS. Our analyses provide evidence that AS is especially useful to enable the detection of lowly expressed transcripts and reduce the sequencing of highly abundant disturbing transcripts.
Affiliation(s)
- Isabel S Naarmann-de Vries
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK), Partner site Heidelberg/Mannheim, 69120 Heidelberg, Germany
| | - Enio Gjerga
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK), Partner site Heidelberg/Mannheim, 69120 Heidelberg, Germany
| | - Catharina L A Gandor
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
| | - Christoph Dieterich
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, 69120 Heidelberg, Germany
- German Center for Cardiovascular Research (DZHK), Partner site Heidelberg/Mannheim, 69120 Heidelberg, Germany
| |
|
33
|
Liu Z, Zhu C, Steinmetz LM, Wei W. Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data. Nucleic Acids Res 2023; 51:e104. [PMID: 37843096 PMCID: PMC10639058 DOI: 10.1093/nar/gkad810] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/09/2023] [Revised: 08/03/2023] [Accepted: 09/20/2023] [Indexed: 10/17/2023] Open
Abstract
Small exons are pervasive in transcriptomes across organisms, and their quantification in RNA isoforms is crucial for understanding gene functions. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) offers the advantage of covering transcripts in full length, its lower base accuracy poses challenges for identifying individual exons, particularly microexons (≤ 30 nucleotides). Here, we systematically assess small-exon quantification in synthetic and human ONT RNA-seq datasets. We demonstrate that reads containing small exons are often not properly aligned, affecting the quantification of relevant transcripts. Thus, we develop a local-realignment method for misaligned exons (MisER), which remaps reads with misaligned exons to the transcript references. Using synthetic and simulated datasets, we demonstrate the high sensitivity and specificity of MisER for the quantification of transcripts containing small exons. Moreover, MisER enabled us to identify small exons with a higher percent spliced-in index (PSI) in neural tissues, particularly neural-regulated microexons, when comparing 14 neural to 16 non-neural tissues in humans. Our work introduces an improved quantification method for long-read RNA-seq and especially facilitates studies using ONT long reads to elucidate the regulation of genes involving small exons.
Affiliation(s)
- Zhen Liu
- Lingang Laboratory, Shanghai, Shanghai 200031, China
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, Shanghai 200031, China
| | - Chenchen Zhu
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
| | - Lars M Steinmetz
- Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
| | - Wu Wei
- Lingang Laboratory, Shanghai, Shanghai 200031, China
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, Shanghai 200031, China
- Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai, Shanghai 200040, China
| |
|
34
|
Heruye S, Myslinski J, Zeng C, Zollman A, Makino S, Nanamatsu A, Mir Q, Janga SC, Doud EH, Eadon MT, Maier B, Hamada M, Tran TM, Dagher PC, Hato T. Inflammation primes the kidney for recovery by activating AZIN1 A-to-I editing. bioRxiv 2023:2023.11.09.566426. [PMID: 37986799 PMCID: PMC10659426 DOI: 10.1101/2023.11.09.566426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
The progression of kidney disease varies among individuals, but a general methodology to quantify disease timelines is lacking. Particularly challenging is the task of determining the potential for recovery from acute kidney injury following various insults. Here, we report that quantitation of post-transcriptional adenosine-to-inosine (A-to-I) RNA editing offers a distinct genome-wide signature, enabling the delineation of disease trajectories in the kidney. A well-defined murine model of endotoxemia permitted the identification of the origin and extent of A-to-I editing, along with temporally discrete signatures of double-stranded RNA stress and Adenosine Deaminase isoform switching. We found that A-to-I editing of Antizyme Inhibitor 1 (AZIN1), a positive regulator of polyamine biosynthesis, serves as a particularly useful temporal landmark during endotoxemia. Our data indicate that AZIN1 A-to-I editing, triggered by preceding inflammation, primes the kidney and activates endogenous recovery mechanisms. By comparing genetically modified human cell lines and mice locked in either A-to-I edited or uneditable states, we uncovered that AZIN1 A-to-I editing not only enhances polyamine biosynthesis but also engages glycolysis and nicotinamide biosynthesis to drive the recovery phenotype. Our findings imply that quantifying AZIN1 A-to-I editing could potentially identify individuals who have transitioned to an endogenous recovery phase. This phase would reflect their past inflammation and indicate their potential for future recovery.
Affiliation(s)
- Segewkal Heruye
- Department of Medicine, Indiana University School of Medicine
| | - Jered Myslinski
- Department of Medicine, Indiana University School of Medicine
| | - Chao Zeng
- Faculty of Science and Engineering, Waseda University, Tokyo
| | - Amy Zollman
- Department of Medicine, Indiana University School of Medicine
| | - Shinichi Makino
- Department of Medicine, Indiana University School of Medicine
| | - Azuma Nanamatsu
- Department of Medicine, Indiana University School of Medicine
| | - Quoseena Mir
- Luddy School of Informatics, Computing, and Engineering, Indiana University
| | | | - Emma H Doud
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine
| | - Michael T Eadon
- Department of Medicine, Indiana University School of Medicine
| | - Bernhard Maier
- Department of Medicine, Indiana University School of Medicine
| | - Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, Tokyo
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology, Tokyo
- Graduate School of Medicine, Nippon Medical School, Tokyo
| | - Tuan M Tran
- Department of Medicine, Indiana University School of Medicine
- Richard L. Roudebush Veterans Affairs Medical Center, Indianapolis
| | - Pierre C Dagher
- Department of Medicine, Indiana University School of Medicine
| | - Takashi Hato
- Department of Medicine, Indiana University School of Medicine
- Richard L. Roudebush Veterans Affairs Medical Center, Indianapolis
- Department of Medical and Molecular Genetics, Indiana University School of Medicine
| |
|
35
|
Dong X, Du MRM, Gouil Q, Tian L, Jabbari JS, Bowden R, Baldoni PL, Chen Y, Smyth GK, Amarasinghe SL, Law CW, Ritchie ME. Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nat Methods 2023; 20:1810-1821. [PMID: 37783886 DOI: 10.1038/s41592-023-02026-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 07/22/2022] [Accepted: 08/25/2023] [Indexed: 10/04/2023]
Abstract
The lack of benchmark data sets with inbuilt ground-truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground-truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other tools among the six isoform detection tools tested; DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested; and there was no clear front-runner for performing differential transcript usage analysis among the five tools compared, which suggests further methods development is needed for this application.
Affiliation(s)
- Xueyi Dong
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Mei R M Du
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Quentin Gouil
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Luyi Tian
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Guangzhou National Laboratory, Guangzhou, China
| | - Jafar S Jabbari
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Rory Bowden
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Pedro L Baldoni
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Yunshun Chen
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Gordon K Smyth
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
| | - Shanika L Amarasinghe
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- The Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
| | - Charity W Law
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Matthew E Ritchie
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| |
|
36
|
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez JM, Hunt T, Lagarde J, Liang CE, Li H, Jerryd Meade M, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Hasan Çelik M, Chen Y, Du MR, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Liang Li J, Lienhard M, Mikheenko A, Mulligan D, Ming Nip K, Pertea M, Ritchie ME, Sim AD, Tang AD, Kei Wan Y, Wang C, Wong BY, Yang C, Barnes I, Berry A, Capella S, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Goetz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Laird Smith M, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Fai Au K, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. bioRxiv 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/08/2023]
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Affiliation(s)
- Francisco J. Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- These authors contributed equally to this work
| | - Dingjie Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- These authors contributed equally to this work
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- These authors contributed equally to this work
| | - Jane E. Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Maite De María
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Matthew S. Adams
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Amit K. Behera
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Jose M. Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Flomics Biotech, Dr Aiguader 88, Barcelona 08003, Spain
- These authors contributed equally to this work
| | - Cindy E. Liang
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Haoran Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Marcus Jerryd Meade
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- These authors contributed equally to this work
| | - David A. Moraga Amador
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Andrey D. Prjibelski
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- These authors contributed equally to this work
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Hamed Bostan
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Ashley M. Brooks
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Muhammed Hasan Çelik
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Mei R.M. Du
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Colette Felton
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
| | - Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Ralf Herwig
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Hideya Kawaji
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Joseph Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Jian Liang Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Matthias Lienhard
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Alla Mikheenko
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Dennis Mulligan
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Matthew E. Ritchie
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Andre D. Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Alison D. Tang
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Changqing Wang
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Brandon Y. Wong
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Namrita Dhillon
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Luis Ferrández-Peral
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Natàlia Garcia-Reyero
- Environmental Laboratory, US Army Engineer Research & Development Center, Vicksburg, USA
| | - Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nedka G. Panayotova
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
| | - Alejandro Paniagua
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Eric Rouchka
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Brandon Saint-John
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Enrique Sapena
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Leon Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
| | - Melissa Laird Smith
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Hazuki Takahashi
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
| | - Piero Carninci
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
- Human Technopole, Milano, Italy
| | - Nancy D. Denslow
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Margaret E. Hunter
- U.S. Geological Survey, Wetland and Aquatic Research Center, Gainesville, USA
| | - Hagen U. Tilgner
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, USA
| | - Barbara J. Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- Center for Public Health Genomics
- UVA Cancer Center, University of Virginia, Charlottesville, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, USA
| | - Angela N. Brooks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| |
|
37
|
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv 2023:2023.07.26.550536. [PMID: 37546743 PMCID: PMC10402045 DOI: 10.1101/2023.07.26.550536] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/08/2023]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
Affiliation(s)
| | - Rachel F Daniels
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
| | - Athma A Pai
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
| |
|