1
|
Gunady MK, Mount SM, Corrada Bravo H. Yanagi: Fast and interpretable segment-based alternative splicing and gene expression analysis. BMC Bioinformatics 2019; 20:421. [PMID: 31409274 PMCID: PMC6693274 DOI: 10.1186/s12859-019-2947-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2018] [Accepted: 06/12/2019] [Indexed: 12/13/2022] Open
Abstract
Background Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct usage of pseudo-alignment to other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step. Results In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments given a transcriptome reference library on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses – alternative splicing and gene differential expression – without the need of a transcript quantification step. Our experiments based on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing in the case of incomplete transcript annotations. Conclusions The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses. Electronic supplementary material The online version of this article (10.1186/s12859-019-2947-6) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Mohamed K Gunady
- Department of Computer Science, University of Maryland, College Park, Maryland, USA.,Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
| | - Stephen M Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Héctor Corrada Bravo
- Department of Computer Science, University of Maryland, College Park, Maryland, USA. .,Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
| |
Collapse
|
2
|
Gan RC, Chen TW, Wu TH, Huang PJ, Lee CC, Yeh YM, Chiu CH, Huang HD, Tang P. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms. BMC Bioinformatics 2016; 17:513. [PMID: 28155708 PMCID: PMC5260104 DOI: 10.1186/s12859-016-1366-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/23/2023] Open
Abstract
Background Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interests. However, there are only a few organisms having reference genomic sequences and even fewer having well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Results Here, we propose a new analysis strategy and quantification methods for quantifying expression level which not only generate a virtual reference from sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against NCBI NR databases to find potential homolog sequences. Based on the searched result, a set of virtual transcripts are generated and served as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning from the comparison among transcriptomes, PARRoT further provides linkage between these virtual transcripts and their potential function through showing best hits in SwissProt, NR database, assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. Conclusions In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog based virtual transcriptome reference. By using the homolog based reference, our strategy effectively avoids the problems that may cause from inconsistencies among transcriptomes. This strategy will shed lights on the field of comparative genomics for non-model organism. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw.
Collapse
Affiliation(s)
- Ruei-Chi Gan
- Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan.,Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
| | - Ting-Wen Chen
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
| | - Timothy H Wu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei City, Taiwan
| | - Po-Jung Huang
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
| | - Chi-Ching Lee
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
| | - Yuan-Ming Yeh
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
| | - Cheng-Hsun Chiu
- Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
| | - Hsien-Da Huang
- Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan. .,Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, 300, Taiwan.
| | - Petrus Tang
- Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan. .,Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan. .,Molecular Regulation & Bioinformatics Laboratory, Chang Gung University, Taoyuan, Taiwan.
| |
Collapse
|