51
|
Deng Y, Luo H, Yang Z, Liu L. LncAS2Cancer: a comprehensive database for alternative splicing of lncRNAs across human cancers. Brief Bioinform 2020; 22:5895039. [PMID: 32820322 DOI: 10.1093/bib/bbaa179] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2020] [Revised: 07/10/2020] [Accepted: 07/13/2020] [Indexed: 02/05/2023] Open
Abstract
Accumulating studies demonstrated that the roles of lncRNAs for tumorigenesis were isoform-dependent and their aberrant splicing patterns in cancers contributed to function specificity. However, there is no existing database focusing on cancer-related alternative splicing of lncRNAs. Here, we developed a comprehensive database called LncAS2Cancer, which collected 5335 bulk RNA sequencing and 1826 single-cell RNA sequencing samples, covering over 30 cancer types. By applying six state-of-the-art splicing algorithms, 50 859 alternative splicing events for 8 splicing types were identified and deposited in the database. In addition, the database contained the following information: (i) splicing patterns of lncRNAs under seven different conditions, such as gene interference, which facilitates the inference of potential regulators; (ii) annotation information derived from eight sources and manual curation, to understand the functional impact of affected sequences; (iii) survival analysis to explore potential biomarkers; as well as (iv) a suite of tools to browse, search, visualize and download interesting information. LncAS2Cancer could not only confirm the known cancer-associated lncRNA isoforms but also indicate novel ones. Using the data deposited in LncAS2Cancer, we compared gene model and transcript overlap between lncRNAs and protein-coding genes and discussed how these factors, along with sequencing depth, affected the interpretation of splicing signals. Based on recurrent signals and potential confounders, we proposed a reliable score to prioritize splicing events for further elucidation. Together, with the broad collection of lncRNA splicing patterns and annotation, LncAS2Cancer will provide important new insights into the diverse functional roles of lncRNA isoforms in human cancers. LncAS2Cancer is freely available at https://lncrna2as.cd120.com/.
Affiliation(s)
- Yulan Deng
  - Department of Thoracic Surgery, West China Hospital, Sichuan University
- Hao Luo
  - Department of Thoracic Surgery, West China Hospital, Sichuan University
- Zhenyu Yang
  - Department of Thoracic Surgery, West China Hospital, Sichuan University
- Lunxu Liu
  - Department of Thoracic Surgery, West China Hospital, Sichuan University
|
52
|
Patrick R, Humphreys DT, Janbandhu V, Oshlack A, Ho JW, Harvey RP, Lo KK. Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data. Genome Biol 2020; 21:167. [PMID: 32641141 PMCID: PMC7341584 DOI: 10.1186/s13059-020-02071-7] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 12/06/2019] [Accepted: 06/11/2020] [Indexed: 12/12/2022] Open
Abstract
High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3' UTR shortening in cardiac fibroblasts. Sierra is available at https://github.com/VCCRI/Sierra.
Affiliation(s)
- Ralph Patrick
  - Victor Chang Cardiac Research Institute, 405 Liverpool St., Darlinghurst, 2010 Australia
  - St. Vincent’s Clinical School, UNSW Sydney, Kensington, 2052 Australia
- David T. Humphreys
  - Victor Chang Cardiac Research Institute, 405 Liverpool St., Darlinghurst, 2010 Australia
  - St. Vincent’s Clinical School, UNSW Sydney, Kensington, 2052 Australia
- Vaibhao Janbandhu
  - Victor Chang Cardiac Research Institute, 405 Liverpool St., Darlinghurst, 2010 Australia
  - St. Vincent’s Clinical School, UNSW Sydney, Kensington, 2052 Australia
- Alicia Oshlack
  - Murdoch Children’s Research Institute, Parkville, 3052 Victoria Australia
  - Peter MacCallum Cancer Centre, Research Division, 305 Grattan Street, Melbourne, 3000 Victoria Australia
- Joshua W.K. Ho
  - Victor Chang Cardiac Research Institute, 405 Liverpool St., Darlinghurst, 2010 Australia
  - St. Vincent’s Clinical School, UNSW Sydney, Kensington, 2052 Australia
  - School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Richard P. Harvey
  - Victor Chang Cardiac Research Institute, 405 Liverpool St., Darlinghurst, 2010 Australia
  - St. Vincent’s Clinical School, UNSW Sydney, Kensington, 2052 Australia
  - School of Biotechnology and Biomolecular Science, UNSW Sydney, Kensington, 2052 Australia
- Kitty K. Lo
  - School of Mathematics and Statistics, Faculty of Science, The University of Sydney, Camperdown, 2006 Australia
|
53
|
Buen Abad Najar CF, Yosef N, Lareau LF. Coverage-dependent bias creates the appearance of binary splicing in single cells. eLife 2020; 9:54603. [PMID: 32597758 PMCID: PMC7498265 DOI: 10.7554/elife.54603] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/19/2019] [Accepted: 06/28/2020] [Indexed: 12/22/2022] Open
Abstract
Single-cell RNA sequencing provides powerful insight into the factors that determine each cell’s unique identity. Previous studies led to the surprising observation that alternative splicing among single cells is highly variable and follows a bimodal pattern: a given cell consistently produces either one or the other isoform for a particular splicing choice, with few cells producing both isoforms. Here, we show that this pattern arises almost entirely from technical limitations. We analyze alternative splicing in human and mouse single-cell RNA-seq datasets, and model them with a probabilistic simulator. Our simulations show that low gene expression and low capture efficiency distort the observed distribution of isoforms. This gives the appearance of binary splicing outcomes, even when the underlying reality is consistent with more than one isoform per cell. We show that accounting for the true amount of information recovered can produce biologically meaningful measurements of splicing in single cells.
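The coverage effect described in this abstract can be illustrated with a small toy simulation. The sketch below is not the authors' probabilistic simulator; it assumes a hypothetical model in which every cell truly contains both isoforms at the same inclusion rate, and each mRNA molecule is captured independently with a fixed probability. The function names, parameter names, and parameter values are all illustrative assumptions.

```python
import random


def simulate_observed_psi(n_cells, true_psi, mrna_per_cell, capture_rate, seed=0):
    """Return the observed inclusion ratio per cell (None if nothing was captured).

    Toy model (an assumption, not the published simulator): each molecule is the
    'included' isoform with probability true_psi and is captured independently
    with probability capture_rate.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_cells):
        # True isoform composition of the cell.
        included = sum(rng.random() < true_psi for _ in range(mrna_per_cell))
        skipped = mrna_per_cell - included
        # Independent capture of each molecule.
        cap_inc = sum(rng.random() < capture_rate for _ in range(included))
        cap_skp = sum(rng.random() < capture_rate for _ in range(skipped))
        total = cap_inc + cap_skp
        observed.append(cap_inc / total if total else None)
    return observed


def binary_fraction(observed):
    """Fraction of cells with data whose observed ratio is exactly 0 or 1."""
    seen = [p for p in observed if p is not None]
    if not seen:
        return 0.0
    return sum(p in (0.0, 1.0) for p in seen) / len(seen)


# Same underlying biology (true_psi = 0.5), two different amounts of recovered information.
low = simulate_observed_psi(2000, 0.5, mrna_per_cell=10, capture_rate=0.1)
high = simulate_observed_psi(2000, 0.5, mrna_per_cell=200, capture_rate=0.5)
print("binary-looking cells, low coverage: ", binary_fraction(low))
print("binary-looking cells, high coverage:", binary_fraction(high))
```

Under these assumptions, most low-coverage cells appear to express only one isoform even though every simulated cell contains both, which is the qualitative pattern the abstract attributes to low expression and low capture efficiency.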
Affiliation(s)
- Nir Yosef
  - Center for Computational Biology, University of California, Berkeley, Berkeley, United States
  - Department of Electrical Engineering and Computer Science and the Center for Computational Biology, University of California, Berkeley, Berkeley, United States
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, United States
  - Chan Zuckerberg Biohub, San Francisco, San Francisco, United States
- Liana F Lareau
  - Center for Computational Biology, University of California, Berkeley, Berkeley, United States
  - Department of Bioengineering, University of California, Berkeley, Berkeley, United States
|
54
|
Hu Y, Wang K, Li M. Detecting differential alternative splicing events in scRNA-seq with or without Unique Molecular Identifiers. PLoS Comput Biol 2020; 16:e1007925. [PMID: 32502143 PMCID: PMC7299405 DOI: 10.1371/journal.pcbi.1007925] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/02/2019] [Revised: 06/17/2020] [Accepted: 05/04/2020] [Indexed: 11/29/2022] Open
Abstract
The emergence of single-cell RNA-seq (scRNA-seq) technology has made it possible to measure gene expression variations at cellular level. This breakthrough enables the investigation of a wider range of problems including analysis of splicing heterogeneity among individual cells. However, compared to bulk RNA-seq, scRNA-seq data are much noisier due to high technical variability and low sequencing depth. Here we propose SCATS (Single-Cell Analysis of Transcript Splicing) for differential splicing analysis in scRNA-seq, which achieves high sensitivity at low coverage by accounting for technical noise. SCATS models scRNA-seq data either with or without Unique Molecular Identifiers (UMIs). For non-UMI data, SCATS explicitly models technical noise by accounting for capture efficiency and amplification bias through the use of external spike-ins; for UMI data, SCATS models capture efficiency and further accounts for transcriptional burstiness. A key aspect of SCATS lies in its ability to group “exons” that originate from the same isoform(s). Grouping exons is essential in splicing analysis of scRNA-seq data as it naturally aggregates spliced reads across different exons, making it possible to detect splicing events even when sequencing depth is low. To evaluate the performance of SCATS, we analyzed both simulated and real scRNA-seq datasets and compared with existing methods including Census and DEXSeq. We show that SCATS has well controlled type I error rate, and is more powerful than existing methods, especially when splicing difference is small. In contrast, Census suffers from severe type I error inflation, whereas DEXSeq is more conservative. When applied to mouse brain scRNA-seq datasets, SCATS identified more differential splicing events with subtle difference across cell types compared to Census and DEXSeq. With the increasing adoption of scRNA-seq, we believe SCATS will be well-suited for various splicing studies. 
The implementation of SCATS can be downloaded from https://github.com/huyustats/SCATS. Alternative splicing is a major mechanism for generating transcriptome diversity. However, few published scRNA-seq studies have investigated alternative splicing, and even when studied, methods developed for bulk RNA-seq were utilized. Compared to bulk RNA-seq, scRNA-seq data are much noisier due to high technical variability and low sequencing depth. Methods developed for bulk RNA-seq may not be optimal when analyzing data generated from scRNA-seq experiments. To fill in this gap, we developed SCATS, an open-source software package, which allows analysis of scRNA-seq data with or without Unique Molecular Identifiers (UMIs). SCATS is able to detect splicing events even when sequencing depth is low. When applied to mouse brain scRNA-seq datasets, SCATS identified more differential splicing events with subtle differences across cortical cell types than Census and DEXSeq. Additionally, SCATS accurately characterized splicing heterogeneity across cortical cell types, which was further confirmed by qRT-PCR measurements. Our study highlights the benefit of SCATS for elucidating splicing heterogeneity across cells in scRNA-seq data.
Affiliation(s)
- Yu Hu
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
- Kai Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Pennsylvania, United States of America
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Mingyao Li
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America
|
55
|
Wells CA, Choi J. Transcriptional Profiling of Stem Cells: Moving from Descriptive to Predictive Paradigms. Stem Cell Reports 2020; 13:237-246. [PMID: 31412285 PMCID: PMC6700522 DOI: 10.1016/j.stemcr.2019.07.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/16/2019] [Revised: 07/09/2019] [Accepted: 07/10/2019] [Indexed: 12/24/2022] Open
Abstract
Transcriptional profiling is a powerful tool commonly used to benchmark stem cells and their differentiated progeny. As the wealth of stem cell data builds in public repositories, we highlight common data traps, and review approaches to combine and mine this data for new cell classification and cell prediction tools. We touch on future trends for stem cell profiling, such as single-cell profiling, long-read sequencing, and improved methods for measuring molecular modifications on chromatin and RNA that bring new challenges and opportunities for stem cell analysis.
Affiliation(s)
- Christine A Wells
  - Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville 3010, Australia
- Jarny Choi
  - Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville 3010, Australia
|
56
|
Stewart BJ, Clatworthy MR. Applying single-cell technologies to clinical pathology: progress in nephropathology. J Pathol 2020; 250:693-704. [PMID: 32125696 PMCID: PMC8651001 DOI: 10.1002/path.5417] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/13/2020] [Revised: 02/25/2020] [Accepted: 02/27/2020] [Indexed: 12/13/2022]
Abstract
Cells represent the basic building blocks of living organisms. Accurate characterisation of cellular phenotype, intercellular signalling networks, and the spatial organisation of cells within organs is crucial to deliver a better understanding of the processes underpinning physiology, and the perturbations that lead to disease. Single-cell methodologies have increased rapidly in scale and scope in recent years and are set to generate important insights into human disease. Here, we review current practices in nephropathology, which are dominated by relatively simple morphological descriptions of tissue biopsies based on their appearance using light microscopy. Bulk transcriptomics have more recently been used to explore glomerular and tubulointerstitial kidney disease, renal cancer, and the responses to injury and alloimmunity in kidney transplantation, generating novel disease insights and prognostic biomarkers. These studies set the stage for single-cell transcriptomic approaches that reveal cell-type-specific gene expression patterns in health and disease. These technologies allow genome-wide disease susceptibility genes to be interpreted with the knowledge of the specific cell populations within organs that express them, identifying candidate cell types for further study. Single-cell technologies are also moving beyond assaying individual cellular transcriptomes, to measuring the epigenetic landscape of single cells. Single-cell antigen-receptor gene sequencing also enables specific T- and B-cell clones to be tracked in different tissues and disease states. In the coming years these rich 'multi-omic' descriptions of kidney disease will enable histopathological descriptions to be comprehensively integrated with molecular phenotypes, enabling better disease classification and prognostication and the application of personalised treatment strategies. © 2020 The Authors. 
The Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
Affiliation(s)
- Benjamin J Stewart
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Cellular Genetics, Wellcome Sanger Institute, Cambridge, UK
  - Cambridge NIHR Biomedical Research Centre, Addenbrooke's Hospital, Cambridge, UK
- Menna R Clatworthy
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Cellular Genetics, Wellcome Sanger Institute, Cambridge, UK
  - Cambridge NIHR Biomedical Research Centre, Addenbrooke's Hospital, Cambridge, UK
|
57
|
Westoby J, Artemov P, Hemberg M, Ferguson-Smith A. Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biol 2020; 21:74. [PMID: 32293520 PMCID: PMC7087381 DOI: 10.1186/s13059-020-01981-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 10/10/2019] [Accepted: 03/03/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Early single-cell RNA-seq (scRNA-seq) studies suggested that it was unusual to see more than one isoform being produced from a gene in a single cell, even when multiple isoforms were detected in matched bulk RNA-seq samples. However, these studies generally did not consider the impact of dropouts or isoform quantification errors, potentially confounding the results of these analyses. RESULTS: In this study, we take a simulation-based approach in which we explicitly account for dropouts and isoform quantification errors. We use our simulations to ask to what extent it is possible to study alternative splicing using scRNA-seq. Additionally, we ask what limitations must be overcome to make splicing analysis feasible. We find that the high rate of dropouts associated with scRNA-seq is a major obstacle to studying alternative splicing. In mice and other well-established model organisms, the relatively low rate of isoform quantification errors poses a lesser obstacle to splicing analysis. We find that different models of isoform choice meaningfully change our simulation results. CONCLUSIONS: To accurately study alternative splicing with single-cell RNA-seq, a better understanding of isoform choice and the errors associated with scRNA-seq is required. An increase in the capture efficiency of scRNA-seq would also be beneficial. Until some or all of the above are achieved, we do not recommend attempting to resolve isoforms in individual cells using scRNA-seq.
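A short worked calculation shows why dropouts alone can make a cell look as if it expresses a single isoform. This is a sketch under a simplifying assumption that is not taken from the paper: two isoforms are both truly present and each drops out independently with a known probability. The function names are hypothetical.

```python
def prob_both_isoforms_detected(dropout_a, dropout_b):
    """Probability that both truly expressed isoforms survive dropout.

    Assumes (for illustration only) that the two dropout events are independent.
    """
    return (1.0 - dropout_a) * (1.0 - dropout_b)


def prob_looks_single_isoform(dropout_a, dropout_b):
    """Probability that exactly one of the two isoforms is detected."""
    return dropout_a * (1.0 - dropout_b) + (1.0 - dropout_a) * dropout_b


# Compare a high and a low per-isoform dropout rate.
for rate in (0.8, 0.2):
    both = prob_both_isoforms_detected(rate, rate)
    single = prob_looks_single_isoform(rate, rate)
    print(f"dropout={rate}: both detected={both:.2f}, looks single-isoform={single:.2f}")
```

With a high per-isoform dropout rate, cells that appear to carry one isoform far outnumber cells in which both are seen, even though both isoforms are present in every cell by construction.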
Affiliation(s)
- Jennifer Westoby
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA UK
- Pavel Artemov
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH UK
- Martin Hemberg
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA UK
- Anne Ferguson-Smith
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH UK
|
58
|
Technological advances and computational approaches for alternative splicing analysis in single cells. Comput Struct Biotechnol J 2020; 18:332-343. [PMID: 32099593 PMCID: PMC7033300 DOI: 10.1016/j.csbj.2020.01.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/31/2019] [Accepted: 01/26/2020] [Indexed: 12/15/2022] Open
Abstract
Alternative splicing of RNAs generates isoform diversity, resulting in different proteins that are necessary for maintaining cellular function and identity. The discovery of alternative splicing has been revolutionized by next-generation transcriptomic sequencing mainly using bulk RNA-sequencing, which has unravelled RNA splicing and mis-splicing of normal cells under steady-state and stress conditions. Single-cell RNA-sequencing studies have focused on gene-level expression analysis and revealed gene expression signatures distinguishable between different cellular types. Single-cell alternative splicing is an emerging area of research with the promise to reveal transcriptomic dynamics invisible to bulk- and gene-level analysis. In this review, we will discuss the technological advances for single-cell alternative splicing analysis, computational strategies for isoform detection and quantitation in single cells, and current applications of single-cell alternative splicing analysis and its potential future contributions to personalized medicine.
|
59
|
Single-cell alternative splicing analysis reveals dominance of single transcript variant. Genomics 2020; 112:2418-2425. [PMID: 31981701 DOI: 10.1016/j.ygeno.2020.01.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/24/2019] [Revised: 01/03/2020] [Accepted: 01/21/2020] [Indexed: 12/29/2022]
Abstract
Alternative splicing contributes to the diversity of gene products by producing multiple transcript variants from one gene. Previous studies have revealed highly variable splicing patterns in single cells, but there is still a controversy in the understanding of the simultaneous expression of multiple transcript variants. Here we show that the dominance of a single transcript variant is a common phenomenon in single cells. We analyzed several single-cell RNA sequencing datasets and observed consistent results. Our results demonstrate that single cells tend to express one major transcript variant of a gene, and the diversity of transcript variants in cell populations mainly results from the heterogeneity of splicing pattern in single cells.
|
60
|
Evaluating genetic causes of azoospermia: What can we learn from a complex cellular structure and single-cell transcriptomics of the human testis? Hum Genet 2020; 140:183-201. [PMID: 31950241 DOI: 10.1007/s00439-020-02116-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/30/2019] [Accepted: 01/06/2020] [Indexed: 12/13/2022]
Abstract
Azoospermia is a condition defined as the absence of spermatozoa in the ejaculate, but the testicular phenotype of men with azoospermia may be very variable, ranging from full spermatogenesis, through arrested maturation of germ cells at different stages, to completely degenerated tissue with ghost tubules. Hence, information regarding the cell-type-specific expression patterns is needed to prioritise potential pathogenic variants that contribute to the pathogenesis of azoospermia. Thanks to technological advances within next-generation sequencing, it is now possible to obtain detailed cell-type-specific expression patterns in the testis by single-cell RNA sequencing. However, to interpret single-cell RNA sequencing data properly, substantial knowledge of the highly sophisticated data processing and visualisation methods is needed. Here we review the complex cellular structure of the human testis in different types of azoospermia and outline how known genetic alterations affect the pathology of the testis. We combined the currently available single-cell RNA sequencing datasets originating from the human testis into one dataset covering 62,751 testicular cells, each with a median of 2637 transcripts quantified. We show what effects the most common data-processing steps have, and how different visualisation methods can be used. Furthermore, we calculated expression patterns in pseudotime, and show how splicing rates can be used to determine the velocity of differentiation during spermatogenesis. With the combined dataset we show expression patterns and network analysis of genes known to be involved in the pathogenesis of azoospermia. Finally, we provide the combined dataset as an interactive online resource where expression of genes and different visualisation methods can be explored (https://testis.cells.ucsc.edu/).
|
61
|
Amamoto R, Zuccaro E, Curry NC, Khurana S, Chen HH, Cepko CL, Arlotta P. FIN-Seq: transcriptional profiling of specific cell types from frozen archived tissue of the human central nervous system. Nucleic Acids Res 2020; 48:e4. [PMID: 31728515 PMCID: PMC7145626 DOI: 10.1093/nar/gkz968] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/23/2019] [Revised: 09/09/2019] [Accepted: 11/12/2019] [Indexed: 12/14/2022] Open
Abstract
Thousands of frozen, archived tissue samples from the human central nervous system (CNS) are currently available in brain banks. As recent developments in RNA sequencing technologies are beginning to elucidate the cellular diversity present within the human CNS, it is becoming clear that an understanding of this diversity would greatly benefit from deeper transcriptional analyses. Single cell and single nucleus RNA profiling provide one avenue to decipher this heterogeneity. An alternative, complementary approach is to profile isolated, pre-defined cell types and use methods that can be applied to many archived human tissue samples that have been stored long-term. Here, we developed FIN-Seq (Frozen Immunolabeled Nuclei Sequencing), a method that accomplishes these goals. FIN-Seq uses immunohistochemical isolation of nuclei of specific cell types from frozen human tissue, followed by bulk RNA-Sequencing. We applied this method to frozen postmortem samples of human cerebral cortex and retina and were able to identify transcripts, including low abundance transcripts, in specific cell types.
Affiliation(s)
- Ryoji Amamoto
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
  - Department of Genetics and Ophthalmology, Howard Hughes Medical Institute, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Emanuela Zuccaro
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
- Nathan C Curry
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
- Sonia Khurana
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
- Hsu-Hsin Chen
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
- Constance L Cepko
  - Department of Genetics and Ophthalmology, Howard Hughes Medical Institute, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Paola Arlotta
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
62
|
Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019; 41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]
Abstract
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanism and protein structure has been incorporated into an algorithm to predict neighboring homologous exons, often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome is necessitating distinctive annotations incorporating differences between individuals and between populations.
Affiliation(s)
- Klas Hatje
  - Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland
- Stefanie Mühlhausen
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
|
63
|
Pan-cancer analysis of clinical relevance of alternative splicing events in 31 human cancers. Oncogene 2019; 38:6678-6695. [PMID: 31391553 DOI: 10.1038/s41388-019-0910-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 01/25/2019] [Revised: 06/21/2019] [Accepted: 07/10/2019] [Indexed: 01/16/2023]
Abstract
Alternative splicing is a critical mechanism of posttranscriptional regulation of gene expression that contributes to protein complexity and mRNA processing. Defects in alternative splicing, including genetic alteration and/or altered expression of both pre-mRNA and trans-acting factors, give rise to many cancers. By jointly analyzing clinical data and splicing data from the TCGA and SpliceSeq databases, a number of splicing events were found to be clinically relevant in tumor samples. Alternative splicing of KLK2 (KLK2_51239) was identified as a potential inducer of nonsense-mediated mRNA decay and was associated with poor survival in prostate cancer. Consensus K-means clustering analysis indicated that alternative splicing events could potentially be used for molecular subtype classification of cancers. Using a random forest survival algorithm, prognostic prediction signatures with good performance were constructed for 31 cancers from survival-associated alternative splicing events. Furthermore, an online tool for visualization of Kaplan-Meier plots of splicing events in 31 cancers was explored. In brief, alternative splicing was found to have significant clinical relevance in cancers.
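For readers unfamiliar with the Kaplan-Meier plots mentioned in this abstract, the sketch below shows the standard product-limit estimator in plain Python. It is a generic textbook illustration, not the study's pipeline or its online tool; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up duration per patient
    events : 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set afterwards.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) for patients with high inclusion of one splicing event.
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t)={s:.2f}")
```

In a splicing-event analysis, one such curve would typically be computed per patient group (for example, high versus low inclusion of an event) and the groups compared with a log-rank test, which is omitted here.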
|
64
|
Perrone B, La Cognata V, Sprovieri T, Ungaro C, Conforti FL, Andò S, Cavallaro S. Alternative Splicing of ALS Genes: Misregulation and Potential Therapies. Cell Mol Neurobiol 2019; 40:1-14. [PMID: 31385134 DOI: 10.1007/s10571-019-00717-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/09/2019] [Accepted: 07/31/2019] [Indexed: 12/12/2022]
Abstract
Neurodegenerative disorders such as amyotrophic lateral sclerosis (ALS), spinal muscular atrophy (SMA), Parkinson's, Alzheimer's, and Huntington's disease affect a rapidly increasing population worldwide. Although common pathogenic mechanisms have been identified (e.g., protein aggregation or dysfunction, immune response alteration and axonal degeneration), the molecular events underlying timing, dosage, expression, and location of RNA molecules are still not fully elucidated. In particular, the alternative splicing (AS) mechanism is a crucial player in RNA processing and represents a fundamental determinant for brain development, as well as for the physiological functions of neuronal circuits. Although in recent years our knowledge of AS events has increased substantially, deciphering the molecular interconnections between splicing and ALS remains a complex task and still requires considerable efforts. In the present review, we will summarize the current scientific evidence outlining the involvement of AS in the pathogenic processes of ALS. We will also focus on recent insights concerning the tuning of splicing mechanisms by epigenomic and epi-transcriptomic regulation, providing an overview of the available genomic technologies to investigate AS drivers on a genome-wide scale, even at a single-cell level resolution. In the future, gene therapy strategies and RNA-based technologies may be utilized to intercept or modulate the splicing mechanism and produce beneficial effects against ALS.
Affiliation(s)
- Benedetta Perrone
  - Institute for Biomedical Research and Innovation, National Research Council, Mangone, Cosenza, Italy
- Valentina La Cognata
  - Institute for Biomedical Research and Innovation, National Research Council, Catania, Italy
- Teresa Sprovieri
  - Institute for Biomedical Research and Innovation, National Research Council, Mangone, Cosenza, Italy
- Carmine Ungaro
  - Institute for Biomedical Research and Innovation, National Research Council, Mangone, Cosenza, Italy
- Francesca Luisa Conforti
  - Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Arcavacata di Rende, Cosenza, Italy
- Sebastiano Andò
  - Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Arcavacata di Rende, Cosenza, Italy.,Centro Sanitario, University of Calabria, Arcavacata di Rende, Cosenza, Italy
- Sebastiano Cavallaro
  - Institute for Biomedical Research and Innovation, National Research Council, Catania, Italy.
|
65
|
Frankiw L, Baltimore D, Li G. Alternative mRNA splicing in cancer immunotherapy. Nat Rev Immunol 2019; 19:675-687. [PMID: 31363190 DOI: 10.1038/s41577-019-0195-7] [Citation(s) in RCA: 150] [Impact Index Per Article: 30.0] [Accepted: 07/02/2019] [Indexed: 12/12/2022]
Abstract
Immunotherapies are yielding effective treatments for several previously untreatable cancers. Still, the identification of suitable antigens specific to the tumour that can be targets for cancer vaccines and T cell therapies is a challenge. Alternative processing of mRNA, a phenomenon that has been shown to alter the proteomic diversity of many cancers, may offer the potential of a broadened target space. Here, we discuss the promise of analysing mRNA processing events in cancer cells, with an emphasis on mRNA splicing, for the identification of potential new targets for cancer immunotherapy. Further, we highlight the challenges that must be overcome for this new avenue to have clinical applicability.
Affiliation(s)
- Luke Frankiw
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- David Baltimore
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
- Guideng Li
  - Center of Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.,Suzhou Institute of Systems Medicine, Suzhou, China.
|
66
|
Sarkar H, Srivastava A, Patro R. Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level. Bioinformatics 2019; 35:i136-i144. [PMID: 31510649 PMCID: PMC6612833 DOI: 10.1093/bioinformatics/btz351] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
SUMMARY: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcodes (CB) and UMI selection and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
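To make concrete what "resolving UMI counts from the raw read data" involves, the sketch below shows the simplest possible deduplication: reads sharing the same cell barcode, gene and UMI are collapsed into one molecule. This is an assumption-laden illustration, not minnow's model or any pipeline's actual method; the function `umi_counts` and the toy reads are hypothetical, and the sketch ignores sequencing errors in UMIs and reads that map ambiguously to several genes, which are the effects the abstract highlights.

```python
from collections import defaultdict

def umi_counts(reads):
    """Collapse reads into per-cell, per-gene molecule counts.

    reads: iterable of (cell_barcode, umi, gene) tuples.
    Returns {(cell_barcode, gene): number_of_distinct_UMIs}.
    Exact-match deduplication only (no UMI error correction).
    """
    seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical toy reads; the second read is a PCR duplicate of the first
reads = [
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "CCGT", "GeneA"),
    ("AAAC", "TTGA", "GeneB"),
    ("GGTT", "ACGT", "GeneA"),
]
counts = umi_counts(reads)
print(counts)
```

Pipelines differ in how they handle near-identical UMIs and multi-gene reads, so the same raw reads can yield different count matrices; a read-level simulator makes those differences measurable.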
Affiliation(s)
- Hirak Sarkar
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Avi Srivastava
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Rob Patro
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
|
67
|
Chen M, Ji G, Fu H, Lin Q, Ye C, Ye W, Su Y, Wu X. A survey on identification and quantification of alternative polyadenylation sites from RNA-seq data. Brief Bioinform 2019; 21:1261-1276. [PMID: 31267126 DOI: 10.1093/bib/bbz068] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/10/2019] [Revised: 05/03/2019] [Accepted: 05/14/2019] [Indexed: 12/13/2022] Open
Abstract
Alternative polyadenylation (APA) has been implicated to play an important role in post-transcriptional regulation by regulating mRNA abundance, stability, localization and translation, which contributes considerably to transcriptome diversity and gene expression regulation. RNA-seq has become a routine approach for transcriptome profiling, generating unprecedented data that could be used to identify and quantify APA site usage. A number of computational approaches for identifying APA sites and/or dynamic APA events from RNA-seq data have emerged in the literature, which provide valuable yet preliminary results that should be refined to yield credible guidelines for the scientific community. In this review, we provided a comprehensive overview of the status of currently available computational approaches. We also conducted objective benchmarking analysis using RNA-seq data sets from different species (human, mouse and Arabidopsis) and simulated data sets to present a systematic evaluation of 11 representative methods. Our benchmarking study showed that the overall performance of all tools investigated is moderate, reflecting that there is still a lot of scope to improve the prediction of APA sites or dynamic APA events from RNA-seq data. Particularly, prediction results from individual tools differ considerably, and only a limited number of predicted APA sites or genes are common among different tools. Accordingly, we attempted to give some advice on how to assess the reliability of the obtained results. We also proposed practical recommendations on the appropriate method applicable to diverse scenarios and discussed implications and future directions relevant to profiling APA from RNA-seq data.
Affiliation(s)
- Moliang Chen
  - Department of Automation, Xiamen University, Xiamen 361005, China.,Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen 361005, China.,Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen 361005, China.,Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Qianmin Lin
  - Xiang'an Hospital of Xiamen University, Xiamen 361005, China
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, China
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen 361005, China.,Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
- Yaru Su
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen 361005, China.,Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen 361005, China
|
68
|
Abstract
In the past 3 years, we have seen a flurry of publications on single-cell RNA sequencing (RNA-seq) analyses of pancreatic islets from mouse and human. This technology holds the promise to refine cell-type signatures and discover cellular heterogeneity among the canonical endocrine cell types such as the glucagon-producing α and insulin-producing β cells, going as far as suggesting new subtypes. In addition, single-cell RNA-seq has the ability to characterize rare endocrine cell types that are not captured by prior bulk analysis. With transcriptomics data from individual endocrine cells, cellular states can be profiled both along developmental processes and during the emergence of metabolic diseases. However, the promises of this new technology have not yet been met in full. While the methodology for the first time enabled the transcriptional definition of rare endocrine cell types such as ghrelin-producing ɛ cells, some of the conclusions regarding cell-type-specific gene expression changes in type 2 diabetes might need to be revisited once larger sample sizes become available. Data generation and analysis are continuously improving single-cell RNA-seq approaches and are helping us to understand the (mal)adaptations of the islet cells during development, metabolic challenge, and disease.
Affiliation(s)
- Yue J Wang
  - Department of Genetics and Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, 12-126 Smilow Center for Translational Research, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6145, USA
- Klaus H Kaestner
  - Department of Genetics and Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, 12-126 Smilow Center for Translational Research, 3400 Civic Center Boulevard, Philadelphia, PA 19104-6145, USA.
|
69
|
A discriminative learning approach to differential expression analysis for single-cell RNA-seq. Nat Methods 2019; 16:163-166. [PMID: 30664774 DOI: 10.1038/s41592-018-0303-9] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 07/06/2018] [Accepted: 12/13/2018] [Indexed: 12/16/2022]
Abstract
Single-cell RNA-seq makes it possible to characterize the transcriptomes of cell types across different conditions and to identify their transcriptional signatures via differential analysis. Our method detects changes in transcript dynamics and in overall gene abundance in large numbers of cells to determine differential expression. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3' single-cell RNA-seq that can identify previously undetectable marker genes.
|