1
Tsai YL, Wolf EJ, Fluke KA, Fuchs RT, Dai N, Johnson SR, Sun Z, Elkins L, Burkhart BW, Santangelo TJ, Corrêa IR. Comprehensive nucleoside analysis of archaeal RNA modification profiles reveals an m7G in the conserved P loop of 23S rRNA. Cell Rep 2025; 44:115471. [PMID: 40131932 DOI: 10.1016/j.celrep.2025.115471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Revised: 01/17/2025] [Accepted: 03/05/2025] [Indexed: 03/27/2025] Open
Abstract
Extremophilic Archaea employ diverse RNA modifications for survival. Our understanding of the modified nucleosides and their functions in Archaea is far from complete. Here, we establish an extensive profile of nucleoside modifications in thermophilic and mesophilic Archaea. Through liquid chromatography-tandem mass spectrometry (LC-MS/MS) and rigorous non-coding RNA depletion, we identify four previously unannotated modifications in archaeal mRNA. Nucleoside analysis conducted on total, large, small, and mRNA-enriched subfractions of hyperthermophile Thermococcus kodakarensis reveals modifications whose relative abundance is dynamically responsive to growth temperatures. To predict archaeal RNA-modifying enzymes, we leverage open-access databases to compare putative functional domains with previously annotated enzymes. Our approach leads to the discovery of a methyltransferase responsible for the installation of m7G in the P loop of 23S rRNA peptidyl transferase center in T. kodakarensis. The methyltransferase activity is confirmed in vitro with synthetic substrates and in vivo by assessing the presence of the m7G modification in a genetic deletion strain.
MESH Headings
- RNA, Ribosomal, 23S/metabolism
- RNA, Ribosomal, 23S/chemistry
- RNA, Ribosomal, 23S/genetics
- RNA, Archaeal/metabolism
- RNA, Archaeal/genetics
- RNA, Archaeal/chemistry
- Thermococcus/genetics
- Thermococcus/metabolism
- Nucleosides/metabolism
- Methyltransferases/metabolism
- RNA Processing, Post-Transcriptional
Affiliation(s)
- Eric J Wolf
  - New England Biolabs Inc., Beverly, MA 01915, USA
- Kristin A Fluke
  - Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
- Ryan T Fuchs
  - New England Biolabs Inc., Beverly, MA 01915, USA
- Nan Dai
  - New England Biolabs Inc., Beverly, MA 01915, USA
- Zhiyi Sun
  - New England Biolabs Inc., Beverly, MA 01915, USA
- Liam Elkins
  - Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
- Brett W Burkhart
  - Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
- Thomas J Santangelo
  - Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, CO 80523, USA
2
Gumińska N, Matylla-Kulińska K, Krawczyk PS, Maj M, Orzeł W, Mackiewicz Z, Brouze A, Mroczek S, Dziembowski A. Direct profiling of non-adenosines in poly(A) tails of endogenous and therapeutic mRNAs with Ninetails. Nat Commun 2025; 16:2664. [PMID: 40102414 PMCID: PMC11920217 DOI: 10.1038/s41467-025-57787-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 11/27/2024] [Indexed: 03/20/2025] Open
Abstract
Stability and translation of mRNAs, both endogenous and therapeutic, is determined by poly(A) tail. Direct RNA sequencing enables single-molecule measurements of poly(A) lengths, avoiding amplification bias. It also holds potential for observation of non-adenosines within poly(A), known to influence mRNA fate. However, there is no computational method to detect composite tails in Direct Sequencing data. To address this gap, we introduce the Ninetails, a neural network-based tool that accurately identifies and quantifies non-adenosines in poly(A) tails. Examination of different biological contexts revealed widespread non-adenosine decorations, with frequencies influenced by the origin of poly(A) tails differing by mRNA class, cell type, and species. Notably, substrates of cytoplasmic TENT5-polymerases and mitochondrially encoded mRNAs are enriched in composite tails. For mRNA therapeutics, we show that the composition of poly(A) tails in mRNA vaccines is dynamic during its cellular lifetime and that the manufacturing protocol of synthetic mRNAs affects the purity of poly(A) tails.
Affiliation(s)
- Natalia Gumińska
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
- Katarzyna Matylla-Kulińska
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
  - Faculty of Biology, University of Warsaw, Warsaw, Poland
- Paweł S Krawczyk
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
- Wiktoria Orzeł
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
  - Faculty of Biology, University of Warsaw, Warsaw, Poland
- Zuzanna Mackiewicz
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
- Aleksandra Brouze
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
- Seweryn Mroczek
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
  - Faculty of Biology, University of Warsaw, Warsaw, Poland
- Andrzej Dziembowski
  - Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
  - Faculty of Biology, University of Warsaw, Warsaw, Poland
3
Ye F, Chen X, Li Y, Ju A, Sheng Y, Duan L, Zhang J, Zhang Z, Al-Rasheid KAS, Stover NA, Gao S. Comprehensive genome annotation of the model ciliate Tetrahymena thermophila by in-depth epigenetic and transcriptomic profiling. Nucleic Acids Res 2025; 53:gkae1177. [PMID: 39657783 PMCID: PMC11754650 DOI: 10.1093/nar/gkae1177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 10/29/2024] [Accepted: 11/12/2024] [Indexed: 12/12/2024] Open
Abstract
The ciliate Tetrahymena thermophila is a well-established unicellular model eukaryote, contributing significantly to foundational biological discoveries. Despite its acknowledged importance, current studies on Tetrahymena biology face challenges due to gene annotation inaccuracy, particularly the notable absence of untranslated regions (UTRs). To comprehensively annotate the Tetrahymena macronuclear genome, we collected extensive transcriptomic data spanning various cell stages. To ascertain transcript orientation and transcription start/end sites, we incorporated data on epigenetic marks displaying enrichment towards the 5' end of gene bodies, including H3 lysine 4 tri-methylation (H3K4me3), histone variant H2A.Z, nucleosome positioning and N6-methyldeoxyadenine (6mA). Cap-seq data was subsequently applied to validate the accuracy of identified transcription start sites. Additionally, we integrated Nanopore direct RNA sequencing (DRS), strand-specific RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data. Using a newly developed bioinformatic pipeline, coupled with manual curation and experimental validation, our work yielded substantial improvements to the current gene models, including the addition of 2,481 new genes, updates to 23,936 existing genes, and the incorporation of 8,339 alternatively spliced isoforms. Furthermore, novel UTR information was annotated for 26,687 high-confidence genes. Intriguingly, 20% of protein-coding genes were identified to have natural antisense transcripts characterized by high diversity in alternative splicing, thus offering insights into understanding transcriptional regulation. Our work will enhance the utility of Tetrahymena as a robust genetic toolkit for advancing biological research, and provides a promising framework for genome annotation in other eukaryotes.
Affiliation(s)
- Fei Ye
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Xiao Chen
  - Laboratory of Marine Protozoan Biodiversity & Evolution, Marine College, Shandong University, Weihai 264209, China
  - Suzhou Research Institute, Shandong University, Suzhou 215123, China
- Yuan Li
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Aili Ju
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Yalan Sheng
  - Shum Yiu Foon Shum Bik Chuen Memorial Centre for Cancer and Inflammation Research, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, SAR, China
- Lili Duan
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Jiachen Zhang
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Zhe Zhang
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
- Khaled A S Al-Rasheid
  - Zoology Department, College of Science, King Saud University, Riyadh 11451, Saudi Arabia
- Naomi A Stover
  - Department of Biology, Bradley University, Peoria, IL 61625, USA
- Shan Gao
  - MOE Key Laboratory of Evolution & Marine Biodiversity and Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Biology and Biotechnology, Qingdao Marine Science and Technology Center, Qingdao 266237, China
4
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. Genome Res 2024; 34:1719-1734. [PMID: 39567236 DOI: 10.1101/gr.279559.124] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/07/2024] [Accepted: 08/16/2024] [Indexed: 11/22/2024]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology through the comprehensive identification and quantification of full-length mRNA isoforms. Despite great promise, challenges remain in the widespread implementation of LRS technologies for RNA-based applications, including concerns about low coverage, high sequencing error, and robust computational pipelines. Although much focus has been placed on defining mRNA exon composition and structure with LRS data, less careful characterization has been done of the ability to assess the terminal ends of isoforms, specifically, transcription start and end sites. Such characterization is crucial for completely delineating full mRNA molecules and regulatory consequences. However, there are substantial inconsistencies in both start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. Here, we describe the specific challenges of identifying and quantifying mRNA terminal ends with LRS technologies and how these issues influence biological interpretations of LRS data. We then review recent experimental and computational advances designed to alleviate these problems, with ideal use cases for each approach. Finally, we outline anticipated developments and necessary improvements for the characterization of terminal ends from LRS data.
Affiliation(s)
- Ezequiel Calvo-Roitberg
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Rachel F Daniels
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Athma A Pai
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
5
Xu Z, Zheng X, Fan J, Jiao Y, Huang S, Xie Y, Xu S, Lu Y, Liu A, Liu R, Yang Y, Luo GZ, Pan T, Wang X. Microbiome-induced reprogramming in post-transcriptional landscape using nanopore direct RNA sequencing. Cell Rep 2024; 43:114798. [PMID: 39365698 DOI: 10.1016/j.celrep.2024.114798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Revised: 08/10/2024] [Accepted: 09/10/2024] [Indexed: 10/06/2024] Open
Abstract
It has been widely recognized that the microbiota has the capacity to shape host gene expression and physiological functions. However, there remains a paucity of comprehensive study revealing the host transcriptional landscape regulated by the microbiota. Here, we comprehensively examined mRNA landscapes in mouse tissues (brain and cecum) from specific-pathogen-free and germ-free mice using nanopore direct RNA sequencing. Our results show that the microbiome has global influence on a host's RNA modifications (m6A, m5C, Ψ), isoform generation, poly(A) tail length, and transcript abundance in both brain and cecum tissues. Moreover, the microbiome exerts tissue-specific effects on various post-transcriptional regulatory processes. In addition, the microbiome impacts the coordination of multiple RNA modifications in host brain and cecum tissues. In conclusion, we establish the relationship between microbial regulation and gene expression. Our results help the understanding of the mechanisms by which the microbiome reprograms host gene expression.
Affiliation(s)
- Zihe Xu
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Xiaoqi Zheng
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Jiajun Fan
  - School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Yuting Jiao
  - School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Sihao Huang
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Yingyuan Xie
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Shunlan Xu
  - School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Yi Lu
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Anrui Liu
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Runzhou Liu
  - School of Life Sciences, South China Normal University, Guangzhou 510631, China
- Ying Yang
  - CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Guan-Zheng Luo
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Tao Pan
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, IL 60637, USA
- Xiaoyun Wang
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China; School of Life Sciences, South China Normal University, Guangzhou 510631, China; University of Chinese Academy of Sciences, Beijing 100049, China
6
Lucas MC, Pryszcz LP, Medina R, Milenkovic I, Camacho N, Marchand V, Motorin Y, Ribas de Pouplana L, Novoa EM. Quantitative analysis of tRNA abundance and modifications by nanopore RNA sequencing. Nat Biotechnol 2024; 42:72-86. [PMID: 37024678 PMCID: PMC10791586 DOI: 10.1038/s41587-023-01743-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 68.0] [Received: 06/15/2022] [Accepted: 03/08/2023] [Indexed: 04/08/2023]
Abstract
Transfer RNAs (tRNAs) play a central role in protein translation. Studying them has been difficult in part because a simple method to simultaneously quantify their abundance and chemical modifications is lacking. Here we introduce Nano-tRNAseq, a nanopore-based approach to sequence native tRNA populations that provides quantitative estimates of both tRNA abundances and modification dynamics in a single experiment. We show that default nanopore sequencing settings discard the vast majority of tRNA reads, leading to poor sequencing yields and biased representations of tRNA abundances based on their transcript length. Re-processing of raw nanopore current intensity signals leads to a 12-fold increase in the number of recovered tRNA reads and enables recapitulation of accurate tRNA abundances. We then apply Nano-tRNAseq to Saccharomyces cerevisiae tRNA populations, revealing crosstalks and interdependencies between different tRNA modification types within the same molecule and changes in tRNA populations in response to oxidative stress.
Affiliation(s)
- Morghan C Lucas
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Leszek P Pryszcz
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Rebeca Medina
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Ivan Milenkovic
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Noelia Camacho
  - Institute for Research in Biomedicine (IRB), Barcelona, Spain
- Virginie Marchand
  - CNRS-Université de Lorraine, UAR2008 IBSLor/UMR7365 IMoPA, Nancy, France
- Yuri Motorin
  - CNRS-Université de Lorraine, UAR2008 IBSLor/UMR7365 IMoPA, Nancy, France
- Lluís Ribas de Pouplana
  - Institute for Research in Biomedicine (IRB), Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Eva Maria Novoa
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
7
Yvon M, German TL, Ullman DE, Dasgupta R, Parker MH, Ben-Mahmoud S, Verdin E, Gognalons P, Ancelin A, Laï Kee Him J, Girard J, Vernerey MS, Fernandez E, Filloux D, Roumagnac P, Bron P, Michalakis Y, Blanc S. The genome of a bunyavirus cannot be defined at the level of the viral particle but only at the scale of the viral population. Proc Natl Acad Sci U S A 2023; 120:e2309412120. [PMID: 37983500 PMCID: PMC10691328 DOI: 10.1073/pnas.2309412120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/11/2023] [Accepted: 10/21/2023] [Indexed: 11/22/2023] Open
Abstract
Bunyaviruses are enveloped negative or ambisense single-stranded RNA viruses with a genome divided into several segments. The canonical view depicts each viral particle packaging one copy of each genomic segment in one polarity named the viral strand. Several opposing observations revealed nonequal ratios of the segments, uneven number of segments per virion, and even packaging of viral complementary strands. Unfortunately, these observations result from studies often addressing other questions, on distinct viral species, and not using accurate quantitative methods. Hence, what RNA segments and strands are packaged as the genome of any bunyavirus remains largely ambiguous. We addressed this issue by first investigating the virion size distribution and RNA content in populations of the tomato spotted wilt virus (TSWV) using microscopy and tomography. These revealed heterogeneity in viral particle volume and amount of RNA content, with a surprising lack of correlation between the two. Then, the ratios of all genomic segments and strands were established using RNA sequencing and qRT-PCR. Within virions, both plus and minus strands (but no mRNA) are packaged for each of the three L, M, and S segments, in reproducible nonequimolar proportions determined by those in total cell extracts. These results show that virions differ in their genomic content but together build up a highly reproducible genetic composition of the viral population. This resembles the genome formula described for multipartite viruses, with which some species of the order Bunyavirales may share some aspects of the way of life, particularly emerging properties at a supravirion scale.
Affiliation(s)
- Michel Yvon
  - PHIM, Univ Montpellier, INRAE, CIRAD, IRD, Institut Agro, Montpellier 34398, France
- Thomas L. German
  - Department of Entomology, University of Wisconsin, Madison, Wisconsin 53706
- Diane E. Ullman
  - Department of Entomology and Nematology, University of California, Davis, California 95616
- Ranjit Dasgupta
  - Department of Entomology, University of Wisconsin, Madison, Wisconsin 53706
- Maxwell H. Parker
  - Department of Entomology, University of Wisconsin, Madison, Wisconsin 53706
- Sulley Ben-Mahmoud
  - Department of Entomology and Nematology, University of California, Davis, California 95616
- Eric Verdin
  - Pathologie végétale, INRAE, Avignon 84143, France
- Aurélie Ancelin
  - CBS, Univ Montpellier, CNRS, INSERM, Montpellier 34090, France
- Justine Girard
  - CBS, Univ Montpellier, CNRS, INSERM, Montpellier 34090, France
- Emmanuel Fernandez
  - PHIM, Univ Montpellier, INRAE, CIRAD, IRD, Institut Agro, Montpellier 34398, France
- Denis Filloux
  - PHIM, Univ Montpellier, INRAE, CIRAD, IRD, Institut Agro, Montpellier 34398, France
- Philippe Roumagnac
  - PHIM, Univ Montpellier, INRAE, CIRAD, IRD, Institut Agro, Montpellier 34398, France
- Patrick Bron
  - CBS, Univ Montpellier, CNRS, INSERM, Montpellier 34090, France
- Stéphane Blanc
  - PHIM, Univ Montpellier, INRAE, CIRAD, IRD, Institut Agro, Montpellier 34398, France
8
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. Nature 2023; 622:41-47. [PMID: 37794265 PMCID: PMC10575709 DOI: 10.1038/s41586-023-06490-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 03/09/2023] [Accepted: 07/27/2023] [Indexed: 10/06/2023]
Abstract
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Affiliation(s)
- Paulo Amaral
  - INSPER Institute of Education and Research, Sao Paulo, Brazil
- Francisco M De La Vega
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
  - Tempus Labs, Chicago, IL, USA
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Thomas Gingeras
  - Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Roderic Guigo
  - Centre for Genomic Regulation (CRG), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jennifer L Harrow
  - Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK
- Artemis G Hatzigeorgiou
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
  - Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
  - School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
  - Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research, University of Bern, Bern, Switzerland
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Shashikant Pujar
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hazuki Takahashi
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Igor Ulitsky
  - Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel
  - Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Ales Varabyou
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A Wells
  - Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
- Mark Yandell
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Human Technopole, Milan, Italy
- Steven L Salzberg
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
9
Chen Y, Sim A, Wan YK, Yeo K, Lee JJX, Ling MH, Love MI, Göke J. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat Methods 2023; 20:1187-1195. [PMID: 37308696 PMCID: PMC10448944 DOI: 10.1038/s41592-023-01908-w] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 12/22/2021] [Accepted: 05/08/2023] [Indexed: 06/14/2023]
Abstract
Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.
Affiliation(s)
- Ying Chen
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Andre Sim
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Yuk Kei Wan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Keith Yeo
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Joseph Jing Xian Lee
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Min Hao Ling
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
- Michael I Love
  - Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
  - Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Jonathan Göke
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
  - Department of Statistics and Data Science, National University of Singapore, Singapore, Republic of Singapore
| |
|
10
|
Zheng P, Zhou C, Ding Y, Liu B, Lu L, Zhu F, Duan S. Nanopore sequencing technology and its applications. MedComm (Beijing) 2023; 4:e316. [PMID: 37441463 PMCID: PMC10333861 DOI: 10.1002/mco2.316] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/30/2022] [Revised: 05/29/2023] [Accepted: 05/31/2023] [Indexed: 07/15/2023] Open
Abstract
Since the development of Sanger sequencing in 1977, sequencing technology has played a pivotal role in molecular biology research by enabling the interpretation of biological genetic codes. Today, nanopore sequencing is one of the leading third-generation sequencing technologies. With its long reads, portability, and low cost, nanopore sequencing is widely used in various scientific fields including epidemic prevention and control, disease diagnosis, and animal and plant breeding. Despite initial concerns about high error rates, continuous innovation in sequencing platforms and algorithm analysis technology has effectively addressed its accuracy. During the coronavirus disease (COVID-19) pandemic, nanopore sequencing played a critical role in detecting the severe acute respiratory syndrome coronavirus-2 virus genome and containing the pandemic. However, a lack of understanding of this technology may limit its popularization and application. Nanopore sequencing is poised to become the mainstream choice for preventing and controlling COVID-19 and future epidemics while creating value in other fields such as oncology and botany. This work introduces the contributions of nanopore sequencing during the COVID-19 pandemic to promote public understanding and its use in emerging outbreaks worldwide. We discuss its application in microbial detection, cancer genomes, and plant genomes and summarize strategies to improve its accuracy.
Affiliation(s)
- Peijie Zheng
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
| | - Chuntao Zhou
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
| | - Yuemin Ding
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Institute of Translational Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Zhejiang University City College, Hangzhou, China
| | - Bin Liu
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
| | - Liuyi Lu
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
| | - Feng Zhu
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
| | - Shiwei Duan
- Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Institute of Translational Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Zhejiang University City College, Hangzhou, China
| |
|
11
|
Mastrorosa FK, Miller DE, Eichler EE. Applications of long-read sequencing to Mendelian genetics. Genome Med 2023; 15:42. [PMID: 37316925 PMCID: PMC10266321 DOI: 10.1186/s13073-023-01194-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 12/08/2022] [Accepted: 05/18/2023] [Indexed: 06/16/2023] Open
Abstract
Advances in clinical genetic testing, including the introduction of exome sequencing, have uncovered the molecular etiology for many rare and previously unsolved genetic disorders, yet more than half of individuals with a suspected genetic disorder remain unsolved after complete clinical evaluation. A precise genetic diagnosis may guide clinical treatment plans, allow families to make informed care decisions, and permit individuals to participate in N-of-1 trials; thus, there is high interest in developing new tools and techniques to increase the solve rate. Long-read sequencing (LRS) is a promising technology for both increasing the solve rate and decreasing the amount of time required to make a precise genetic diagnosis. Here, we summarize current LRS technologies, give examples of how they have been used to evaluate complex genetic variation and identify missing variants, and discuss future clinical applications of LRS. As costs continue to decrease, LRS will find additional utility in the clinical space fundamentally changing how pathological variants are discovered and eventually acting as a single-data source that can be interrogated multiple times for clinical service.
Affiliation(s)
| | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA.
| |
|
12
|
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. ARXIV 2023:arXiv:2303.13996v1. [PMID: 36994150 PMCID: PMC10055485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.
Affiliation(s)
- Paulo Amaral
- INSPER Institute of Education and Research, São Paulo, SP, Brasil
| | - Silvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
| | - Francisco M. De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA; Tempus Labs, Inc., Chicago, IL
| | | | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thomas Gingeras
- Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
| | - Roderic Guigo
- Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Jennifer L Harrow
- Centre for Genomics Research, Discovery Sciences, AstraZeneca, Da Vinci Building. Melbourn Science Park, Royston UK SG8 6HB
| | - Artemis G. Hatzigeorgiou
- University of Thessaly, Department of Computer Science and Biomedical Informatics, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece
| | - Rory Johnson
- School of Biology and Environmental Science, University College Dublin, D04 V1W8 Dublin, Ireland; Conway Institute of Biomedical and Biomolecular Research, University College Dublin, D04 V1W8 Dublin, Ireland; Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland; Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland
| | - Terence D. Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Kim D. Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama Kanagawa 230-0045 Japan
| | - Igor Ulitsky
- Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Ales Varabyou
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Christine A. Wells
- Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville 3010 Vic Australia
| | - Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
| | - Piero Carninci
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Human Technopole, via Rita Levi Montalcini 1, Milan 20157 Italy
| | - Steven L. Salzberg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| |
|
13
|
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023] Open
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Abdullah Abood
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Charles R Farber
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Gloria M Sheynkman
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
| |
|
14
|
Affiliation(s)
- Miten Jain
- Northeastern University, Boston, MA, USA.
| | | | | | - Mark Akeson
- University of California, Santa Cruz, CA, USA.
| |
|
15
|
Ugolini C, Mulroney L, Leger A, Castelli M, Criscuolo E, Williamson MK, Davidson AD, Almuqrin A, Giambruno R, Jain M, Frigè G, Olsen H, Tzertzinis G, Schildkraut I, Wulf MG, Corrêa IR, Ettwiller L, Clementi N, Clementi M, Mancini N, Birney E, Akeson M, Nicassio F, Matthews D, Leonardi T. Nanopore ReCappable sequencing maps SARS-CoV-2 5' capping sites and provides new insights into the structure of sgRNAs. Nucleic Acids Res 2022; 50:3475-3489. [PMID: 35244721 PMCID: PMC8989550 DOI: 10.1093/nar/gkac144] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/25/2021] [Revised: 02/05/2022] [Accepted: 02/16/2022] [Indexed: 01/09/2023] Open
Abstract
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5' cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.
Affiliation(s)
- Camilla Ugolini
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
| | - Logan Mulroney
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
| | - Adrien Leger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Matteo Castelli
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
| | - Elena Criscuolo
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
| | - Maia Kavanagh Williamson
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
| | - Andrew D Davidson
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
| | - Abdulaziz Almuqrin
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
- Department of Clinical Laboratory Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Roberto Giambruno
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
| | - Miten Jain
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
| | - Gianmaria Frigè
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, 20139 Milano, Italy
| | - Hugh Olsen
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
| | | | | | | | | | | | - Nicola Clementi
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Laboratory of Medical Microbiology and Virology, IRCCS San Raffaele Scientific Institute; via Olgettina 60, 20132 Milan, Italy
| | - Massimo Clementi
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Laboratory of Medical Microbiology and Virology, IRCCS San Raffaele Scientific Institute; via Olgettina 60, 20132 Milan, Italy
| | - Nicasio Mancini
- Laboratory of Microbiology and Virology, Vita-Salute San Raffaele University; via Olgettina 58, 20132 Milan, Italy
- Laboratory of Medical Microbiology and Virology, IRCCS San Raffaele Scientific Institute; via Olgettina 60, 20132 Milan, Italy
| | - Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mark Akeson
- Biomolecular Engineering Department, UC Santa Cruz, CA 95064, USA
| | - Francesco Nicassio
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
| | - David A Matthews
- School of Cellular and Molecular Medicine, Faculty of Life Sciences, University Walk, University of Bristol, Bristol BS8 1TD, UK
| | - Tommaso Leonardi
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, 20139 Milano, Italy
| |
|
16
|
Gleeson J, Leger A, Prawer YDJ, Lane TA, Harrison PJ, Haerty W, Clark MB. Accurate expression quantification from nanopore direct RNA sequencing with NanoCount. Nucleic Acids Res 2022; 50:e19. [PMID: 34850115 PMCID: PMC8886870 DOI: 10.1093/nar/gkab1129] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 04/15/2021] [Revised: 10/23/2021] [Accepted: 10/27/2021] [Indexed: 11/13/2022] Open
Abstract
Accurately quantifying gene and isoform expression changes is essential to understanding cell functions, differentiation and disease. Sequencing full-length native RNAs using long-read direct RNA sequencing (DRS) has the potential to overcome many limitations of short and long-read sequencing methods that require RNA fragmentation, cDNA synthesis or PCR. However, there is a lack of tools specifically designed for DRS and its ability to identify differential expression in complex organisms is poorly characterised. We developed NanoCount for fast, accurate transcript isoform quantification in DRS and demonstrate it outperforms similar methods. Using synthetic controls and human SH-SY5Y cell differentiation into neuron-like cells, we show that DRS accurately quantifies RNA expression and identifies differential expression of genes and isoforms. Differential expression of 231 genes, 333 isoforms, plus 27 isoform switches were detected between undifferentiated and differentiated SH-SY5Y cells and samples clustered by differentiation state at the gene and isoform level. Genes upregulated in neuron-like cells were associated with neurogenesis. NanoCount quantification of thousands of novel isoforms discovered with DRS likewise enabled identification of their differential expression. Our results demonstrate enhanced DRS isoform quantification with NanoCount and establish the ability of DRS to identify biologically relevant differential expression of genes and isoforms.
Affiliation(s)
- Josie Gleeson
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
| | - Adrien Leger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yair D J Prawer
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
| | - Tracy A Lane
- Department of Psychiatry, University of Oxford, Oxford, UK
| | - Paul J Harrison
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Health NHS Foundation Trust, Oxford, UK
| | - Wilfried Haerty
- The Earlham Institute, Norwich, UK
- School of Biological Sciences, University of East Anglia, Norwich, UK
| | - Michael B Clark
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
- Department of Psychiatry, University of Oxford, Oxford, UK
| |
|