1
|
de Back TR, Wu T, Schafrat PJ, Ten Hoorn S, Tan M, He L, van Hooff SR, Koster J, Nijman LE, Vink GR, Beumer IJ, Elbers CC, Lenos KJ, Sommeijer DW, Wang X, Vermeulen L. A consensus molecular subtypes classification strategy for clinical colorectal cancer tissues. Life Sci Alliance 2024; 7:e202402730. [PMID: 38782602 PMCID: PMC11116811 DOI: 10.26508/lsa.202402730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
Consensus Molecular Subtype (CMS) classification of colorectal cancer (CRC) tissues is complicated by RNA degradation upon formalin-fixed paraffin-embedded (FFPE) preservation. Here, we present an FFPE-curated CMS classifier. The CMSFFPE classifier was developed using genes with a high transcript integrity in FFPE-derived RNA. We evaluated the classification accuracy in two FFPE-RNA datasets with matched fresh-frozen (FF) RNA data, and an FF-derived RNA set. An FFPE-RNA application cohort of metastatic CRC patients was established, partly treated with anti-EGFR therapy. Key characteristics per CMS were assessed. Cross-referenced with matched benchmark FF CMS calls, the CMSFFPE classifier strongly improved classification accuracy in two FFPE datasets compared with the original CMSClassifier (63.6% versus 40.9% and 83.3% versus 66.7%, respectively). We recovered CMS-specific recurrence-free survival patterns (CMS4 versus CMS2: hazard ratio 1.75, 95% CI 1.24-2.46). Key molecular and clinical associations of the CMSs were confirmed. In particular, we demonstrated the predictive value of CMS2 and CMS3 for anti-EGFR therapy response (CMS2&3: odds ratio 5.48, 95% CI 1.10-27.27). The CMSFFPE classifier is an optimized FFPE-curated research tool for CMS classification of clinical CRC samples.
Affiliation(s)
- Tim R de Back
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
| | - Tan Wu
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
- Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Pascale JM Schafrat
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
- Amsterdam UMC Location Vrije Universiteit Amsterdam, Department of Medical Oncology, Amsterdam, Netherlands
| | - Sanne Ten Hoorn
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
| | - Miaomiao Tan
- Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China
- Institute of Translational Medicine, Zhejiang Shuren University, Hangzhou, China
| | - Lingli He
- Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Sander R van Hooff
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
| | - Jan Koster
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
| | - Lisanne E Nijman
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
| | - Geraldine R Vink
- Department of Medical Oncology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Department of Research and Development, Netherlands Comprehensive Cancer Organisation, Utrecht, Netherlands
| | | | - Clara C Elbers
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
| | - Kristiaan J Lenos
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
| | - Dirkje W Sommeijer
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Flevohospital, Department of Internal Medicine, Almere, Netherlands
| | - Xin Wang
- Department of Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
| | - Louis Vermeulen
- Cancer Center Amsterdam, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- Amsterdam Gastroenterology Endocrinology Metabolism, Laboratory for Experimental Oncology and Radiobiology, Center for Experimental and Molecular Medicine, Amsterdam, Netherlands
- https://ror.org/01n92vv28 Oncode Institute, Amsterdam, Netherlands
|
2
|
Perelo LW, Gabernet G, Straub D, Nahnsen S. How tool combinations in different pipeline versions affect the outcome in RNA-seq analysis. NAR Genom Bioinform 2024; 6:lqae020. [PMID: 38456178 PMCID: PMC10919883 DOI: 10.1093/nargab/lqae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/07/2024] [Accepted: 02/12/2024] [Indexed: 03/09/2024] Open
Abstract
Data analysis tools are continuously changed and improved over time. To test how these changes influence the comparability between analyses, the outputs of different workflow options of the nf-core/rnaseq pipeline were compared. Five different pipeline settings (STAR+Salmon, STAR+RSEM, STAR+featureCounts, HISAT2+featureCounts, pseudoaligner Salmon) were run on three datasets (human, Arabidopsis, zebrafish) containing spike-ins of the External RNA Control Consortium (ERCC). Fold change ratios and differential expression of genes and spike-ins were used for comparative analyses of the different tool and version settings of the pipeline. The pipelines showed an overlap of 85% in differential gene classification. Genes interpreted with a bias were mostly those present at lower concentration. The numbers of isoforms and exons per gene were also determinants. Previous pipeline versions using featureCounts showed a higher sensitivity to detect one-isoform genes like ERCC. To ensure data comparability in long-term analysis series, it is advisable either to stay with the pipeline version the series was initialized with or to run both versions during a transition period to ensure that the target genes are addressed the same way.
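The overlap figure reported in this abstract can be illustrated with a minimal sketch. It assumes that each pipeline setting yields one class label per gene (for example up, down, or not significant) and that overlap means the fraction of shared genes given the same label by every pipeline; the abstract does not state the exact definition, and the pipeline keys, gene names, and labels below are hypothetical.

```python
from typing import Dict


def classification_overlap(calls: Dict[str, Dict[str, str]]) -> float:
    """Fraction of genes shared by all pipelines that receive an identical class.

    `calls` maps a pipeline name to a {gene: class} dictionary.
    This definition of overlap is an assumption, not taken from the paper.
    """
    pipelines = list(calls.values())
    # Only genes reported by every pipeline can be compared.
    shared = set(pipelines[0]).intersection(*pipelines[1:])
    if not shared:
        return 0.0
    agree = sum(1 for gene in shared if len({p[gene] for p in pipelines}) == 1)
    return agree / len(shared)


# Hypothetical toy calls for three pipeline settings.
calls = {
    "star_salmon": {"g1": "up", "g2": "down", "g3": "ns", "g4": "up"},
    "star_rsem": {"g1": "up", "g2": "down", "g3": "ns", "g4": "ns"},
    "hisat2_featurecounts": {"g1": "up", "g2": "down", "g3": "ns", "g4": "up"},
}
overlap = classification_overlap(calls)
print(f"overlap: {overlap:.0%}")
```

In this toy input three of the four shared genes agree across all settings; the disagreeing gene stands in for the low-concentration genes the abstract describes as prone to bias.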
Affiliation(s)
- Louisa Wessels Perelo
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
| | - Gisela Gabernet
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
| | - Daniel Straub
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
| | - Sven Nahnsen
- Quantitative Biology Center (QBiC), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- M3 Research Center, Faculty of Medicine, University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Department of Computer Science, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
- Cluster of Excellence iFIT (EXC 2180), Image-Guided and Functionally Instructed Tumor Therapies, University of Tübingen, Otfried-Müller-Str. 37, 72076 Tübingen, Baden-Württemberg, 72076, Germany
|
3
|
Guo W, Coulter M, Waugh R, Zhang R. The value of genotype-specific reference for transcriptome analyses in barley. Life Sci Alliance 2022; 5:e202101255. [PMID: 35459738 PMCID: PMC9034525 DOI: 10.26508/lsa.202101255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Revised: 04/10/2022] [Accepted: 04/11/2022] [Indexed: 12/31/2022] Open
Abstract
It is increasingly apparent that although different genotypes within a species share “core” genes, they also contain variable numbers of “specific” genes, which are present in only a subset of individuals, and different structures of “core” genes. Using a common reference genome may thus lead to a loss of genotype-specific information in the assembled Reference Transcript Dataset (RTD) and the generation of erroneous, incomplete or misleading transcriptomics analysis results. In this study, we assembled genotype-specific RTD (sRTD) and common reference–based RTD (cRTD) from RNA-seq data of cultivated Barke and Morex barley, respectively. Our quantitative evaluation showed that the sRTD has a significantly higher diversity of transcripts and alternative splicing events, whereas the cRTD missed 40% of the transcripts present in the sRTD and only ∼70% of its transcript assemblies were accurate. We found that the sRTD is more accurate for transcript quantification as well as differential expression analysis. However, gene-level quantification is less affected, which may be a reasonable compromise when a high-quality genotype-specific reference is not available.
Affiliation(s)
- Wenbin Guo
- Information and Computational Sciences, James Hutton Institute, Dundee, UK
| | - Max Coulter
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Dundee, UK
| | - Robbie Waugh
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Dundee, UK
- Cell and Molecular Sciences, James Hutton Institute, Dundee, UK
| | - Runxuan Zhang
- Information and Computational Sciences, James Hutton Institute, Dundee, UK
|
4
|
Hounkpe BW, Chenou F, de Lima F, De Paula E. HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets. Nucleic Acids Res 2021; 49:D947-D955. [PMID: 32663312 PMCID: PMC7778946 DOI: 10.1093/nar/gkaa609] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Received: 06/19/2020] [Accepted: 07/08/2020] [Indexed: 12/18/2022] Open
Abstract
Housekeeping (HK) genes are constitutively expressed genes that are required for the maintenance of basic cellular functions. Despite their importance in the calibration of gene expression, as well as the understanding of many genomic and evolutionary features, important discrepancies have been observed in studies that previously identified these genes. Here, we present the Housekeeping and Reference Transcript Atlas (HRT Atlas v1.0, www.housekeeping.unicamp.br), a web-based database which addresses some of the previously observed limitations in the identification of these genes, and offers a more accurate database of human and mouse HK genes and transcripts. The database was generated by mining massive human and mouse RNA-seq data sets, including 11 281 and 507 high-quality RNA-seq samples from 52 human non-disease tissues/cells and 14 healthy tissues/cells of C57BL/6 wild type mouse, respectively. Users can visualize the expression and download lists of 2158 human HK transcripts from 2176 HK genes and 3024 mouse HK transcripts from 3277 mouse HK genes. HRT Atlas also offers the most stable and suitable tissue-selective candidate reference transcripts for normalization of qPCR experiments. Specific primers and predicted modifiers of gene expression for some of these HK transcripts are also proposed. HRT Atlas has also been integrated with a regulatory elements resource from the Epiregio server.
Affiliation(s)
| | - Francine Chenou
- School of Medical Sciences, University of Campinas, Campinas, SP, Brazil
| | - Franciele de Lima
- School of Medical Sciences, University of Campinas, Campinas, SP, Brazil
| | - Erich Vinicius De Paula
- School of Medical Sciences, University of Campinas, Campinas, SP, Brazil
- Hematology and Hemotherapy Center, University of Campinas, Campinas, SP, Brazil
|
5
|
Gamini R, Nakashima R, He W, Zhang C, Huang Y, Zhang Y, Zhang B, Zhao S. QuickIsoSeq for Isoform Quantification in Large-Scale RNA Sequencing. Methods Mol Biol 2021; 2284:135-145. [PMID: 33835441 DOI: 10.1007/978-1-0716-1307-8_8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
RNA-sequencing (RNA-seq) is a powerful technology for transcriptome profiling. While most RNA-seq projects focus on gene-level quantification and analysis, there is growing evidence that most mammalian genes are alternatively spliced to generate different isoforms that can be subsequently translated to protein molecules with diverse or even opposing biological functions. Quantifying the expression levels of these isoforms is key to understanding the genes' biological functions in healthy tissues and the progression of diseases. Among open source tools developed for isoform quantification, Salmon, Kallisto, and RSEM are recommended based upon previous systematic evaluation of these tools using both experimental and simulated RNA-seq datasets. However, isoform quantification in practical RNA-seq data analysis needs to deal with many QC issues, such as the abundance of rRNAs in mRNA-seq, the efficiency of globin RNA depletion in whole blood samples, and potential sample swapping. To overcome these practical challenges, QuickIsoSeq was developed for large-scale RNA-seq isoform quantification along with QC. In this chapter, we describe the pipeline and detail the steps required to deploy and use it to analyze RNA-seq datasets in practice. The QuickIsoSeq package can be downloaded from https://github.com/shanrongzhao/QuickIsoSeq.
Affiliation(s)
- Ramya Gamini
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Reiko Nakashima
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Wen He
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Chi Zhang
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Ying Huang
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Ying Zhang
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Baohong Zhang
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
| | - Shanrong Zhao
- Pfizer Worldwide Research and Development, Cambridge, MA, USA
- AbSci Inc, Vancouver, WA, USA
|
6
|
Boneva S, Schlecht A, Böhringer D, Mittelviefhaus H, Reinhard T, Agostini H, Auw-Haedrich C, Schlunck G, Wolf J, Lange C. 3' MACE RNA-sequencing allows for transcriptome profiling in human tissue samples after long-term storage. Lab Invest 2020; 100:1345-1355. [PMID: 32467590 PMCID: PMC7498368 DOI: 10.1038/s41374-020-0446-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/13/2020] [Revised: 04/30/2020] [Accepted: 04/30/2020] [Indexed: 12/21/2022] Open
Abstract
This study aims to compare the potential of standard RNA-sequencing (RNA-Seq) and 3' massive analysis of cDNA ends (MACE) RNA-sequencing for the analysis of fresh tissue and describes transcriptome profiling of formalin-fixed paraffin-embedded (FFPE) archival human samples by MACE. To compare MACE to standard RNA-Seq on fresh tissue, four healthy conjunctiva from four subjects were collected during vitreoretinal surgery, halved and immediately transferred to RNA lysis buffer without prior fixation and then processed for either standard RNA-Seq or MACE RNA-Seq analysis. To assess the impact of FFPE preparation on MACE, a third part was fixed in formalin and processed for paraffin embedding, and its transcriptional profile was compared with the unfixed specimens analyzed by MACE. To investigate the impact of FFPE storage time on MACE results, 24 FFPE-treated conjunctival samples from 24 patients were analyzed as well. A total of 19,659 transcribed genes were detected by both MACE and standard RNA-Seq on fresh tissue, while 3251 and 2213 transcripts were identified exclusively by MACE or RNA-Seq, respectively. Standard RNA-Seq tended to yield longer detected transcripts more often than MACE technology despite normalization, indicating that the MACE technology is less susceptible to a length bias. FFPE processing had negligible effects on MACE sequencing results. Several quality-control measurements showed that long-term storage in paraffin did not decrease the diversity of MACE libraries. We noted a nonlinear relation between storage time and the number of raw reads, with an accelerated decrease within the first 1000 days in paraffin, while the numbers remained relatively stable in older samples. Interestingly, the number of transcribed genes detected was independent of FFPE storage time. RNA of sufficient quality and quantity can be extracted from FFPE samples to obtain comprehensive transcriptome profiling using MACE technology. We thus present MACE as a novel opportunity for utilizing FFPE samples stored in histological archives.
Affiliation(s)
- Stefaniya Boneva
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Anja Schlecht
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Daniel Böhringer
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Hans Mittelviefhaus
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Thomas Reinhard
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Hansjürgen Agostini
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Claudia Auw-Haedrich
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Günther Schlunck
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Julian Wolf
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Clemens Lange
- Eye Center, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
|
7
|
Zhao L, Wang C, Lehman ML, He M, An J, Svingen T, Spiller CM, Ng ET, Nelson CC, Koopman P. Transcriptomic analysis of mRNA expression and alternative splicing during mouse sex determination. Mol Cell Endocrinol 2018; 478:84-96. [PMID: 30053582 DOI: 10.1016/j.mce.2018.07.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 05/17/2018] [Revised: 07/23/2018] [Accepted: 07/23/2018] [Indexed: 12/15/2022]
Abstract
Mammalian sex determination hinges on sexually dimorphic transcriptional programs in developing fetal gonads. A comprehensive view of these programs is crucial for understanding the normal development of fetal testes and ovaries and the etiology of human disorders of sex development (DSDs), many of which remain unexplained. Using strand-specific RNA-sequencing, we characterized the mouse fetal gonadal transcriptome from 10.5 to 13.5 days post coitum, a key time window in sex determination and gonad development. Our dataset benefits from a greater sensitivity, accuracy and dynamic range compared to microarray studies, allows global dynamics and sex-specificity of gene expression to be assessed, and provides a window to non-transcriptional events such as alternative splicing. Spliceomic analysis uncovered female-specific regulation of Lef1 splicing, which may contribute to the enhanced WNT signaling activity in XX gonads. We provide a user-friendly visualization tool for the complete transcriptomic and spliceomic dataset as a resource for the field.
Affiliation(s)
- Liang Zhao
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
| | - Chenwei Wang
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Princess Alexandra Hospital, Translational Research Institute, Brisbane, Queensland, 4102, Australia
| | - Melanie L Lehman
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Princess Alexandra Hospital, Translational Research Institute, Brisbane, Queensland, 4102, Australia
| | - Mingyu He
- Longsoft, Brisbane, Queensland, 4109, Australia
| | - Jiyuan An
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Princess Alexandra Hospital, Translational Research Institute, Brisbane, Queensland, 4102, Australia
| | - Terje Svingen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
| | - Cassy M Spiller
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
| | - Ee Ting Ng
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
| | - Colleen C Nelson
- Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Princess Alexandra Hospital, Translational Research Institute, Brisbane, Queensland, 4102, Australia
| | - Peter Koopman
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
|
8
|
Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion. Sci Rep 2018; 8:4781. [PMID: 29556074 PMCID: PMC5859127 DOI: 10.1038/s41598-018-23226-4] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 02/07/2018] [Accepted: 03/07/2018] [Indexed: 12/17/2022] Open
Abstract
To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs are specifically depleted to improve the sequencing depth of the remaining RNAs.
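The read requirement stated in this abstract is simple percentage arithmetic, sketched below. Only the 220% and 50% figures come from the abstract; the baseline of 30 million reads for polyA+ selection is a hypothetical value chosen for illustration, and the function name is not from the paper.

```python
def reads_needed(baseline_reads: int, percent_more: int) -> int:
    """Reads required when `percent_more` percent additional reads are needed.

    Integer arithmetic avoids floating-point rounding on large read counts.
    """
    return baseline_reads * (100 + percent_more) // 100


# Hypothetical polyA+ sequencing depth; not a value from the study.
baseline = 30_000_000

blood_reads = reads_needed(baseline, 220)  # blood-derived RNA, rRNA depletion
colon_reads = reads_needed(baseline, 50)   # colon-derived RNA, rRNA depletion

print(f"blood: {blood_reads:,} reads, colon: {colon_reads:,} reads")
```

Under this assumed baseline, matching the exonic coverage of polyA+ selection with rRNA depletion would take 3.2 times the reads for blood and 1.5 times the reads for colon.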
|
9
|
Odhams CA, Cunninghame Graham DS, Vyse TJ. Profiling RNA-Seq at multiple resolutions markedly increases the number of causal eQTLs in autoimmune disease. PLoS Genet 2017; 13:e1007071. [PMID: 29059182 PMCID: PMC5695635 DOI: 10.1371/journal.pgen.1007071] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 04/19/2017] [Revised: 11/02/2017] [Accepted: 10/11/2017] [Indexed: 01/12/2023] Open
Abstract
Genome-wide association studies have identified hundreds of risk loci for autoimmune disease, yet only a minority (~25%) share genetic effects with changes to gene expression (eQTLs) in immune cells. RNA-Seq based quantification at whole-gene resolution, where abundance is estimated by summing the expression of all transcripts or exons of the same gene, is likely to account for this observed lack of colocalisation, as subtle isoform switches and expression variation in independent exons can be concealed. We performed integrative cis-eQTL analysis using association statistics from twenty autoimmune diseases (560 independent loci) and RNA-Seq data from 373 individuals of the Geuvadis cohort profiled at gene-, isoform-, exon-, junction-, and intron-level resolution in lymphoblastoid cell lines. After stringently testing for a shared causal variant using both the Joint Likelihood Mapping and Regulatory Trait Concordance frameworks, we found that gene-level quantification significantly underestimated the number of causal cis-eQTLs. Only 5.0-5.3% of loci were found to share a causal cis-eQTL at gene-level compared to 12.9-18.4% at exon-level and 9.6-10.5% at junction-level. More than a fifth of autoimmune loci shared an underlying causal variant in a single cell type by combining all five quantification types; a marked increase over current estimates of steady-state causal cis-eQTLs. Causal cis-eQTLs detected at different quantification types localised to discrete epigenetic annotations. We applied a linear mixed-effects model to distinguish cis-eQTLs modulating all expression elements of a gene from those where the signal is only evident in a subset of elements. Exon-level analysis detected disease-associated cis-eQTLs that subtly altered transcription globally across the target gene. We dissected in detail the genetic associations of systemic lupus erythematosus and functionally annotated the candidate genes. Many of the known and novel genes were concealed at gene-level (e.g. IKZF2, TYK2, LYST). Our findings are provided as a web resource.
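The point that whole-gene quantification can conceal an isoform switch can be shown with a toy sketch. It assumes, as a simplification, that gene-level abundance is the plain sum of isoform abundances; the isoform names and expression values are hypothetical and do not come from the study.

```python
from typing import Dict


def gene_level(isoform_expr: Dict[str, float]) -> float:
    """Gene abundance as the sum over its isoforms (simplifying assumption)."""
    return sum(isoform_expr.values())


def isoform_fractions(isoform_expr: Dict[str, float]) -> Dict[str, float]:
    """Relative usage of each isoform within the gene."""
    total = gene_level(isoform_expr)
    return {iso: value / total for iso, value in isoform_expr.items()}


# Hypothetical expression of one gene under two genotypes.
ref_genotype = {"iso1": 80.0, "iso2": 20.0}
alt_genotype = {"iso1": 20.0, "iso2": 80.0}

gene_change = gene_level(alt_genotype) - gene_level(ref_genotype)
usage_shift = (
    isoform_fractions(alt_genotype)["iso1"]
    - isoform_fractions(ref_genotype)["iso1"]
)

print(f"gene-level change: {gene_change}")
print(f"iso1 usage shift: {usage_shift:.2f}")
```

The gene-level total is identical for both genotypes, so a gene-level test would see no effect, while the usage of the first isoform drops by roughly 0.6. Isoform-, exon-, or junction-level profiling is what exposes that kind of signal.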
Affiliation(s)
- Christopher A. Odhams
- Department of Medical & Molecular Genetics, King’s College London, London, United Kingdom
| | - Deborah S. Cunninghame Graham
- Department of Medical & Molecular Genetics, King’s College London, London, United Kingdom
- Academic Department of Rheumatology, Division of Immunology, Infection and Inflammatory Disease, King’s College London, London, United Kingdom
| | - Timothy J. Vyse
- Department of Medical & Molecular Genetics, King’s College London, London, United Kingdom
- Academic Department of Rheumatology, Division of Immunology, Infection and Inflammatory Disease, King’s College London, London, United Kingdom
|
10
|
Zhang C, Zhang B, Lin LL, Zhao S. Evaluation and comparison of computational tools for RNA-seq isoform quantification. BMC Genomics 2017; 18:583. [PMID: 28784092 PMCID: PMC5547501 DOI: 10.1186/s12864-017-4002-1] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Received: 04/19/2017] [Accepted: 08/01/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Alternatively spliced transcript isoforms are commonly observed in higher eukaryotes. The expression levels of these isoforms are key for understanding normal functions in healthy tissues and the progression of disease states. However, accurate quantification of expression at the transcript level is limited with current RNA-seq technologies because of, for example, limited read length and the cost of deep sequencing. RESULTS: A large number of tools have been developed to tackle this problem, and we performed a comprehensive evaluation of these tools using both experimental and simulated RNA-seq datasets. We found that recently developed alignment-free tools are both fast and accurate. The accuracy of all methods was mainly influenced by the complexity of gene structures, and caution must be taken when interpreting quantification results for short transcripts. Using TP53 gene simulation, we discovered that both sequencing depth and the relative abundance of different isoforms affect quantification accuracy. CONCLUSIONS: Our comprehensive evaluation helps data analysts to make informed choices when selecting computational tools for isoform quantification.
Affiliation(s)
- Chi Zhang
- Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, 02139, USA
| | - Baohong Zhang
- Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, 02139, USA
| | - Lih-Ling Lin
- Inflammation and Immunology RU, Pfizer Worldwide R&D, Cambridge, MA, 02139, USA
| | - Shanrong Zhao
- Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, 02139, USA.
| |
Collapse
|
11
|
Nazarov PV, Muller A, Kaoma T, Nicot N, Maximo C, Birembaut P, Tran NL, Dittmar G, Vallar L. RNA sequencing and transcriptome arrays analyses show opposing results for alternative splicing in patient derived samples. BMC Genomics 2017; 18:443. [PMID: 28587590 PMCID: PMC5461714 DOI: 10.1186/s12864-017-3819-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 01/05/2017] [Accepted: 05/25/2017] [Indexed: 01/29/2023]
Abstract
Background: RNA sequencing (RNA-seq) and microarrays are two transcriptomics techniques aimed at the quantification of transcribed genes and their isoforms. Here we compare the latest Affymetrix HTA 2.0 microarray with Illumina 2000 RNA-seq for the analysis of patient samples: normal lung epithelium tissue and squamous cell carcinoma lung tumours. Protein-coding mRNAs and long non-coding RNAs (lncRNAs) were included in the study. Results: Both platforms performed equally well for protein-coding RNAs; however, the stochastic variability was higher for the sequencing data than for microarrays. This reduced the number of differentially expressed genes and genes with predictive potential for RNA-seq compared to microarray data. Analysis of this variability revealed a lack of reads for short and low-abundance genes; lncRNAs, being shorter and less abundant RNAs, were found to be especially susceptible to this issue. A major difference between the two platforms was uncovered by analysis of alternatively spliced genes. Investigation of differential exon abundance showed insufficient reads for many exons and exon junctions in RNA-seq, while detection on the array platform was more stable. Nevertheless, we identified 207 genes which undergo alternative splicing and were consistently detected by both techniques. Conclusions: Although the results of gene expression analysis were highly consistent between Human Transcriptome Arrays and RNA-seq platforms, the analysis of alternative splicing produced discordant results. We concluded that modern microarrays can still outperform sequencing for standard analysis of gene expression in terms of reproducibility and cost.
Affiliation(s)
- Petr V Nazarov
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Arnaud Muller
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Tony Kaoma
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Nathalie Nicot
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Cristina Maximo
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Nhan L Tran
  - Departments of Cancer Biology and Neurosurgery, Mayo Clinic Arizona, Phoenix, USA
- Gunnar Dittmar
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Laurent Vallar
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
|
12
|
Odhams CA, Cortini A, Chen L, Roberts AL, Viñuela A, Buil A, Small KS, Dermitzakis ET, Morris DL, Vyse TJ, Cunninghame Graham DS. Mapping eQTLs with RNA-seq reveals novel susceptibility genes, non-coding RNAs and alternative-splicing events in systemic lupus erythematosus. Hum Mol Genet 2017; 26:1003-1017. [PMID: 28062664 PMCID: PMC5409091 DOI: 10.1093/hmg/ddw417] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 08/05/2016] [Accepted: 12/05/2016] [Indexed: 12/19/2022]
Abstract
Studies attempting to functionally interpret complex-disease susceptibility loci by GWAS and eQTL integration have predominantly employed microarrays to quantify gene-expression. RNA-Seq has the potential to discover a more comprehensive set of eQTLs and illuminate the underlying molecular consequence. We examine the functional outcome of 39 variants associated with Systemic Lupus Erythematosus (SLE) through the integration of GWAS and eQTL data from the TwinsUK microarray and RNA-Seq cohort in lymphoblastoid cell lines. We use conditional analysis and a Bayesian colocalisation method to provide evidence of a shared causal-variant, then compare the ability of each quantification type to detect disease relevant eQTLs and eGenes. We discovered the greatest frequency of candidate-causal eQTLs using exon-level RNA-Seq, and identified novel SLE susceptibility genes (e.g. NADSYN1 and TCF7) that were concealed using microarrays, including four non-coding RNAs. Many of these eQTLs were found to influence the expression of several genes, supporting the notion that risk haplotypes may harbour multiple functional effects. Novel SLE associated splicing events were identified in the T-reg restricted transcription factor, IKZF2, and other candidate genes (e.g. WDFY4) through asQTL mapping using the Geuvadis cohort. We have significantly increased our understanding of the genetic control of gene-expression in SLE by maximising the leverage of RNA-Seq and performing integrative GWAS-eQTL analysis against gene, exon, and splice-junction quantifications. We conclude that to better understand the true functional consequence of regulatory variants, quantification by RNA-Seq should be performed at the exon-level as a minimum, and run in parallel with gene and splice-junction level quantification.
Affiliation(s)
- Andrea Cortini
  - Department of Medical & Molecular Genetics, King's College London, London, UK
- Lingyan Chen
  - Department of Medical & Molecular Genetics, King's College London, London, UK
- Amy L Roberts
  - Department of Medical & Molecular Genetics, King's College London, London, UK
- Ana Viñuela
  - Department of Twin Research, King's College London, London, UK
- Kerrin S Small
  - Department of Twin Research, King's College London, London, UK
- David L Morris
  - Department of Medical & Molecular Genetics, King's College London, London, UK
- Timothy J Vyse
  - Department of Medical & Molecular Genetics, King's College London, London, UK
  - Division of Immunology, Infection and Inflammatory Disease, King's College London, London, UK
|
13
|
Zhao S, Xi L, Quan J, Xi H, Zhang Y, von Schack D, Vincent M, Zhang B. QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization. BMC Genomics 2016; 17:39. [PMID: 26747388 PMCID: PMC4706714 DOI: 10.1186/s12864-015-2356-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/16/2015] [Accepted: 12/23/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND: RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user-friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists. RESULTS: By combining the best open source tools developed for RNA-seq data analyses and the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analysis results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilities with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines. CONCLUSION: The high degree of automation and interactivity in QuickRNASeq leads to a substantial reduction in the time and effort required prior to further downstream analyses and interpretation of the analyses findings. QuickRNASeq advances primary RNA-seq data analyses to the next level of automation, and is mature for public release and adoption.
Affiliation(s)
- Shanrong Zhao
  - PharmaTherapeutics Clinical R&D, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- Li Xi
  - PharmaTherapeutics Clinical R&D, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- Jie Quan
  - Computational Sciences Center of Emphasis, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- Hualin Xi
  - Computational Sciences Center of Emphasis, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- Ying Zhang
  - PharmaTherapeutics Clinical R&D, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- David von Schack
  - PharmaTherapeutics Clinical R&D, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- Michael Vincent
  - PharmaTherapeutics Clinical R&D, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
- Baohong Zhang
  - PharmaTherapeutics Clinical R&D, Pfizer Worldwide Research and Development, Cambridge, MA, 02139, USA
|
14
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015; 4:1521. [PMID: 26925227 PMCID: PMC4712774 DOI: 10.12688/f1000research.7563.1] [Citation(s) in RCA: 1680] [Impact Index Per Article: 186.7] [Accepted: 12/14/2015] [Indexed: 01/14/2023]
Abstract
High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Several different quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
Affiliation(s)
- Charlotte Soneson
  - Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
- Michael I. Love
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02210, USA
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Mark D. Robinson
  - Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
|
15
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015; 4:1521. [PMID: 26925227 PMCID: PMC4712774 DOI: 10.12688/f1000research.7563.2] [Citation(s) in RCA: 1508] [Impact Index Per Article: 167.6] [Accepted: 02/23/2016] [Indexed: 12/21/2022]
Abstract
High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
Affiliation(s)
- Charlotte Soneson
  - Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
- Michael I. Love
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02210, USA
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Mark D. Robinson
  - Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
|
16
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.12688/f1000research.7563.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
|
17
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 04/20/2023]
|
18
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114726] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 04/20/2023]
|
19
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 04/20/2023]
|
20
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114722] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 04/20/2023]
|
21
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114730] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 04/20/2023]
|
22
|
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114725] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 04/20/2023]
|