1
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets. Brief Bioinform 2024; 25:bbae366. [PMID: 39082650 PMCID: PMC11289684 DOI: 10.1093/bib/bbae366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 06/21/2024] [Accepted: 07/18/2024] [Indexed: 08/03/2024] Open
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa perform relatively better. We also point out the limitations of NGS-based methods and explore potential directions for further improvement.
Affiliation(s)
- Zeyu Lu
- Department of Statistics and Data Science, Moody School of Graduate and Advanced Studies, Southern Methodist University, 3225 Daniel Ave., P.O. Box 750332, Dallas, TX, United States
- Xue Xiao
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Qiang Zheng
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Xinlei Wang
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Department of Mathematics, University of Texas at Arlington, 411 S. Nedderman Dr., Arlington, TX 76019, United States
- Lin Xu
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Department of Pediatrics, Division of Hematology/Oncology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, United States
2
Shine M, Gordon J, Schärfen L, Zigackova D, Herzel L, Neugebauer KM. Co-transcriptional gene regulation in eukaryotes and prokaryotes. Nat Rev Mol Cell Biol 2024; 25:534-554. [PMID: 38509203 PMCID: PMC11199108 DOI: 10.1038/s41580-024-00706-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/19/2024] [Indexed: 03/22/2024]
Abstract
Many steps of RNA processing occur during transcription by RNA polymerases. Co-transcriptional activities are deemed commonplace in prokaryotes, in which the lack of membrane barriers allows mixing of all gene expression steps, from transcription to translation. In the past decade, an extraordinary level of coordination between transcription and RNA processing has emerged in eukaryotes. In this Review, we discuss recent developments in our understanding of co-transcriptional gene regulation in both eukaryotes and prokaryotes, comparing methodologies and mechanisms, and highlight striking parallels in how RNA polymerases interact with the machineries that act on nascent RNA. The development of RNA sequencing and imaging techniques that detect transient transcription and RNA processing intermediates has facilitated discoveries of transcription coordination with splicing, 3'-end cleavage and dynamic RNA folding and revealed physical contacts between processing machineries and RNA polymerases. Such studies indicate that intron retention in a given nascent transcript can prevent 3'-end cleavage and cause transcriptional readthrough, which is a hallmark of eukaryotic cellular stress responses. We also discuss how coordination between nascent RNA biogenesis and transcription drives fundamental aspects of gene expression in both prokaryotes and eukaryotes.
Affiliation(s)
- Morgan Shine
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Jackson Gordon
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Leonard Schärfen
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Dagmar Zigackova
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Lydia Herzel
- Department of Biology, Chemistry, and Pharmacy, Freie Universität Berlin, Berlin, Germany.
- Karla M Neugebauer
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
3
Chen J, Zhou M, Wu W, Zhang J, Li Y, Li D. STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics. arXiv 2024:arXiv:2406.06393v2. [PMID: 38947920 PMCID: PMC11213178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing cancerous and healthy regions, but the accompanying text might only specify that this image is a cancer slide, lacking the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles, with each tile paired with a 15,000- to 30,000-dimensional gene expression vector. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology and beyond.
Affiliation(s)
- Wenrong Wu
- University of North Carolina at Chapel Hill
- Yun Li
- University of North Carolina at Chapel Hill
- Didong Li
- University of North Carolina at Chapel Hill
4
Yang TT, Zhang JR, Xie ZH, Ren ZL, Yan JW, Ni M. Nanopore sequencing of forensic short tandem repeats using QNome of Qitan Technology. Electrophoresis 2024. [PMID: 38884206 DOI: 10.1002/elps.202300270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/21/2024] [Accepted: 04/09/2024] [Indexed: 06/18/2024]
Abstract
Nanopore sequencing devices can be highly portable and low-cost, which makes nanopore sequencing promising for in-field forensic applications. Previous investigations have demonstrated that nanopore sequencing is feasible for genotyping forensic short tandem repeats (STRs) using sequencers from Oxford Nanopore Technologies. Recently, Qitan Technology launched a new portable nanopore sequencer and became the second supplier in the world. Here, for the first time, we assess the QNome (QNome-3841) for its accuracy in nanopore sequencing of STRs and compare it with the MinION (MinION Mk1B). We profile 54 STRs of 21 unrelated individuals and 2800M standard DNA. The overall accuracies for diploid STRs and haploid STRs were 53.5% (378 of 706) and 82.7% (134 of 162), respectively, using QNome. These accuracies were remarkably lower than those of MinION (diploid STRs, 84.5%; haploid STRs, 90.7%), with a similar amount of sequencing data and identical bioinformatics analysis. Although QNome was not reliable for diploid STR typing, the haploid STRs were consistently correctly typed. The majority of errors (58.8%) in QNome-based STR typing were deviations of one repeat unit from the true allele, associated with homopolymers in the STR repeats.
Affiliation(s)
- Ting-Ting Yang
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, P. R. China
- Institute of Health Service and Transfusion Medicine, Beijing, P. R. China
- Shanxi Key Laboratory of Forensic Medicine, Jinzhong, P. R. China
- Jia-Rong Zhang
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, P. R. China
- Institute of Health Service and Transfusion Medicine, Beijing, P. R. China
- Shanxi Key Laboratory of Forensic Medicine, Jinzhong, P. R. China
- Zi-Han Xie
- Institute of Health Service and Transfusion Medicine, Beijing, P. R. China
- School of Life Science, Beijing University of Chemical Technology, Beijing, P. R. China
- Zi-Lin Ren
- Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Changchun, P. R. China
- School of Information Science and Technology, and Institution of Computational Biology, Northeast Normal University, Changchun, P. R. China
- Jiang-Wei Yan
- School of Forensic Medicine, Shanxi Medical University, Jinzhong, P. R. China
- Shanxi Key Laboratory of Forensic Medicine, Jinzhong, P. R. China
- Ming Ni
- Institute of Health Service and Transfusion Medicine, Beijing, P. R. China
5
Yuan CU, Quah FX, Hemberg M. Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing. Mol Aspects Med 2024; 96:101255. [PMID: 38368637 DOI: 10.1016/j.mam.2024.101255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/20/2024]
Abstract
Single-cell technologies have transformed biomedical research over the last decade, opening up new possibilities for understanding cellular heterogeneity, both at the genomic and transcriptomic level. In addition, more recent developments of spatial transcriptomics technologies have made it possible to profile cells in their tissue context. In parallel, there have been substantial advances in sequencing technologies, and third-generation methods are able to produce reads that are tens of kilobases long, with error rates matching those of second-generation short reads. Long-read technologies make it possible to better map large genome rearrangements and quantify isoform-specific abundances. This further improves our ability to characterize functionally relevant heterogeneity. Here, we show how researchers have begun to combine single-cell, spatial transcriptomics, and long-read technologies, and how this is resulting in powerful new approaches to profiling both the genome and the transcriptome. We discuss the achievements so far, and we highlight remaining challenges and opportunities.
Affiliation(s)
- Chengwei Ulrika Yuan
- Department of Biochemistry, University of Cambridge, Cambridge, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Fu Xiang Quah
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Martin Hemberg
- Gene Lay Institute, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
6
Xie Y, Chan LY, Cheung MY, Li MW, Lam HM. Current technical advancements in plant epitranscriptomic studies. The Plant Genome 2023; 16:e20316. [PMID: 36890704 DOI: 10.1002/tpg2.20316] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/14/2022] [Accepted: 02/05/2023] [Indexed: 06/18/2023]
Abstract
The growth and development of plants are the result of the interplay between the internal developmental programming and plant-environment interactions. Gene expression regulation in plants is made up of multi-level networks. In the past few years, many studies were carried out on co- and post-transcriptional RNA modifications, which, together with the RNA community, are collectively known as the "epitranscriptome." The epitranscriptomic machineries were identified and their functional impacts characterized in a broad range of physiological processes in diverse plant species. There is mounting evidence to suggest that the epitranscriptome provides an additional layer in the gene regulatory network for plant development and stress responses. In the present review, we summarize the epitranscriptomic modifications found so far in plants, including chemical modifications, RNA editing, and transcript isoforms. The various approaches to RNA modification detection are described, with special emphasis on the recent development and application potential of third-generation sequencing. The roles of epitranscriptomic changes in gene regulation during plant-environment interactions are discussed in case studies. This review aims to highlight the importance of epitranscriptomics in the study of gene regulatory networks in plants and to encourage multi-omics investigations using the recent technical advancements.
Affiliation(s)
- Yichun Xie
- School of Life Sciences and Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Long-Yiu Chan
- School of Life Sciences and Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Ming-Yan Cheung
- School of Life Sciences and Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Man-Wah Li
- School of Life Sciences and Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Hon-Ming Lam
- School of Life Sciences and Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
7
Han SW, Jewell S, Thomas-Tikhonenko A, Barash Y. Contrasting and Combining Transcriptome Complexity Captured by Short and Long RNA Sequencing Reads. bioRxiv 2023:2023.11.21.568046. [PMID: 38045232 PMCID: PMC10690182 DOI: 10.1101/2023.11.21.568046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads are able to capture entire isoforms and overcome repetitive regions, while short reads still provide improved coverage and error rates. Yet how to quantitatively compare the technologies, whether they can be combined, and what the benefit of such a combined view may be remain open questions. We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across datasets, algorithms and technologies, matched short-read data detect roughly 50% more splice junctions, and 10-30% of the splice junctions included at 20% or more are missed by long reads. In contrast, long reads detect many more intron retention events, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software to enable a unified view of transcriptome variations from both technologies, and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm, and combine it with short-read data for improved transcriptome analysis.
Affiliation(s)
- Seong Woo Han
- Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania
- San Jewell
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Andrei Thomas-Tikhonenko
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia
- Yoseph Barash
- Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania
8
Ojala T, Häkkinen AE, Kankuri E, Kankainen M. Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends Genet 2023; 39:686-702. [PMID: 37365103 DOI: 10.1016/j.tig.2023.05.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/02/2022] [Revised: 05/24/2023] [Accepted: 05/25/2023] [Indexed: 06/28/2023]
Abstract
Metatranscriptomics refers to the analysis of the collective microbial transcriptome of a sample. Its increased utilization for the characterization of human-associated microbial communities has enabled the discovery of many disease-state related microbial activities. Here, we review the principles of metatranscriptomics-based analysis of human-associated microbial samples. We describe strengths and weaknesses of popular sample preparation, sequencing, and bioinformatics approaches and summarize strategies for their use. We then discuss how human-associated microbial communities have recently been examined and how their characterization may change. We conclude that metatranscriptomics insights into human microbiotas under health and disease have not only expanded our knowledge on human health, but also opened avenues for rational antimicrobial drug use and disease management.
Affiliation(s)
- Teija Ojala
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Esko Kankuri
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Matti Kankainen
- Hematology Research Unit, University of Helsinki, Helsinki, Finland; Laboratory of Genetics, HUS Diagnostic Center, Hospital District of Helsinki and Uusimaa (HUS), Helsinki, Finland.
9
Hook PW, Timp W. Beyond assembly: the increasing flexibility of single-molecule sequencing technology. Nat Rev Genet 2023; 24:627-641. [PMID: 37161088 PMCID: PMC10169143 DOI: 10.1038/s41576-023-00600-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/30/2023] [Indexed: 05/11/2023]
Abstract
The maturation of high-throughput short-read sequencing technology over the past two decades has shaped the way genomes are studied. Recently, single-molecule, long-read sequencing has emerged as an essential tool in deciphering genome structure and function, including filling gaps in the human reference genome, measuring the epigenome and characterizing splicing variants in the transcriptome. With recent technological developments, these single-molecule technologies have moved beyond genome assembly and are being used in a variety of ways, including to selectively sequence specific loci with long reads, measure chromatin state and protein-DNA binding in order to investigate the dynamics of gene regulation, and rapidly determine copy number variation. These increasingly flexible uses of single-molecule technologies highlight a young and fast-moving part of the field that is leading to a more accessible era of nucleic acid sequencing.
Affiliation(s)
- Paul W Hook
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Winston Timp
- Department of Biomedical Engineering, Molecular Biology and Genetics, and Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
10
Calvo-Roitberg E, Daniels RF, Pai AA. Challenges in identifying mRNA transcript starts and ends from long-read sequencing data. bioRxiv 2023:2023.07.26.550536. [PMID: 37546743 PMCID: PMC10402045 DOI: 10.1101/2023.07.26.550536] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/08/2023]
Abstract
Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
Affiliation(s)
- Rachel F Daniels
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
- Athma A Pai
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA
11
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Advanced Genetics (Hoboken, N.J.) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although RNA-seq is positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal that long noncoding RNAs (lncRNAs) are neither sparsely nor lower expressed than protein coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
- Faculty of Science, McMaster University, Hamilton, L8S 4L8, Canada
- Haoning Howard Cen
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- Iain J Gallagher
- School of Applied Sciences, Edinburgh Napier University, Edinburgh, EH11 4BN, UK
- James D. Johnson
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- James A. Timmons
- Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- William Harvey Research Institute, Queen Mary University London, London, EC1M 6BQ, UK
- Augur Precision Medicine LTD, Stirling, FK9 5NF, UK
12
Joglekar A, Foord C, Jarroux J, Pollard S, Tilgner HU. From words to complete phrases: insight into single-cell isoforms using short and long reads. Transcription 2023; 14:92-104. [PMID: 37314295 PMCID: PMC10807471 DOI: 10.1080/21541264.2023.2213514] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/31/2022] [Revised: 04/24/2023] [Accepted: 05/07/2023] [Indexed: 06/15/2023] Open
Abstract
The profiling of gene expression patterns to glean biological insights from single cells has become commonplace over the last few years. However, this approach overlooks the transcript contents that can differ between individual cells and cell populations. In this review, we describe early work in the field of single-cell short-read sequencing as well as full-length isoforms from single cells. We then describe recent work in single-cell long-read sequencing wherein some transcript elements have been observed to work in tandem. Based on earlier work in bulk tissue, we motivate the study of combination patterns of other RNA variables. Given that we are still blind to some aspects of isoform biology, we suggest possible future avenues such as CRISPR screens which can further illuminate the function of RNA variables in distinct cell populations.
Affiliation(s)
- Anoushka Joglekar
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Careen Foord
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Julien Jarroux
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Shaun Pollard
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Hagen U Tilgner
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
13
Joglekar A, Hu W, Zhang B, Narykov O, Diekhans M, Balacco J, Ndhlovu LC, Milner TA, Fedrigo O, Jarvis ED, Sheynkman G, Korkin D, Ross ME, Tilgner HU. Single-cell long-read mRNA isoform regulation is pervasive across mammalian brain regions, cell types, and development. bioRxiv 2023:2023.04.02.535281. [PMID: 37066387 PMCID: PMC10103983 DOI: 10.1101/2023.04.02.535281] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 04/22/2023]
Abstract
RNA isoforms influence cell identity and function. Until recently, technological limitations prevented a genome-wide appraisal of isoform influence on cell identity in various parts of the brain. Using enhanced long-read single-cell isoform sequencing, we comprehensively analyze RNA isoforms in multiple mouse brain regions, cell subtypes, and developmental timepoints from postnatal day 14 (P14) to adult (P56). For 75% of genes, full-length isoform expression varies along one or more axes of phenotypic origin, underscoring the pervasiveness of isoform regulation across multiple scales. As expected, splicing varies strongly between cell types. However, certain gene classes, including neurotransmitter release and reuptake as well as synapse turnover, harbor significant variability in the same cell type across anatomical regions, suggesting that differences in network activity may influence cell-type identity. Glial brain-region specificity in isoform expression includes strong poly(A)-site regulation, whereas neurons have stronger TSS regulation. Furthermore, developmental patterns of cell-type specific splicing are especially pronounced in the murine adolescent transition from P21 to P28. The same cell type traced across development shows more isoform variability than across adult anatomical regions, indicating a coordinated modulation of functional programs dictating neural development. As most cell-type specific exons in P56 mouse hippocampus behave similarly in newly generated data from human hippocampi, these principles may be extrapolated to the human brain. However, human brains have evolved additional cell-type specificity in splicing, suggesting gain-of-function isoforms. Taken together, we present a detailed single-cell atlas of full-length brain isoform regulation across development and anatomical regions, providing a previously unappreciated degree of isoform variability across multiple scales of the brain.
Affiliation(s)
- Anoushka Joglekar
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Wen Hu
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Oleksandr Narykov
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Data Science Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Lishomwa C Ndhlovu
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Department of Medicine, Division of Infectious Diseases, Weill Cornell Medicine, New York, NY, USA
| | - Teresa A Milner
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
| | - Olivier Fedrigo
- Vertebrate Genome Lab, the Rockefeller University, New York, NY
| | - Erich D Jarvis
- Vertebrate Genome Lab, the Rockefeller University, New York, NY
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY
- Howard Hughes Medical Institute, Chevy Chase, MD
| | - Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, Virginia, USA
| | - Dmitry Korkin
- Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
- Data Science Program, Worcester Polytechnic Institute, Worcester, MA, USA
| | - M Elizabeth Ross
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | - Hagen U Tilgner
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| |
|
14
|
Gauthier NPG, Chorlton SD, Krajden M, Manges AR. Agnostic Sequencing for Detection of Viral Pathogens. Clin Microbiol Rev 2023; 36:e0011922. [PMID: 36847515 PMCID: PMC10035330 DOI: 10.1128/cmr.00119-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/01/2023] Open
Abstract
The advent of next-generation sequencing (NGS) technologies has expanded our ability to detect and analyze microbial genomes and has yielded novel molecular approaches for infectious disease diagnostics. While several targeted multiplex PCR and NGS-based assays have been widely used in public health settings in recent years, these targeted approaches are limited in that they still rely on a priori knowledge of a pathogen's genome, and an untargeted or unknown pathogen will not be detected. Recent public health crises have emphasized the need to prepare for a wide and rapid deployment of an agnostic diagnostic assay at the start of an outbreak to ensure an effective response to emerging viral pathogens. Metagenomic techniques can nonspecifically sequence all detectable nucleic acids in a sample and therefore do not rely on prior knowledge of a pathogen's genome. While this technology has been reviewed for bacterial diagnostics and adopted in research settings for the detection and characterization of viruses, viral metagenomics has yet to be widely deployed as a diagnostic tool in clinical laboratories. In this review, we highlight recent improvements to the performance of metagenomic viral sequencing, the current applications of metagenomic sequencing in clinical laboratories, as well as the challenges that impede the widespread adoption of this technology.
Affiliation(s)
- Nick P. G. Gauthier
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Mel Krajden
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
| | - Amee R. Manges
- British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
| |
|
15
|
Prjibelski AD, Mikheenko A, Joglekar A, Smetanin A, Jarroux J, Lapidus AL, Tilgner HU. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol 2023:10.1038/s41587-022-01565-y. [PMID: 36593406 DOI: 10.1038/s41587-022-01565-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 04/19/2022] [Accepted: 10/13/2022] [Indexed: 01/04/2023]
Abstract
Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data.
Affiliation(s)
- Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Anoushka Joglekar
- Tri-Institutional Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | | | - Julien Jarroux
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | - Alla L Lapidus
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Hagen U Tilgner
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| |
|
16
|
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023] Open
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
| | - Abdullah Abood
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Charles R Farber
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | - Gloria M Sheynkman
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
| |
|