1
Sharma H, Pani T, Dasgupta U, Batra J, Sharma RD. Prediction of transcript structure and concentration using RNA-Seq data. Brief Bioinform 2023; 24:6995379. [PMID: 36682028 DOI: 10.1093/bib/bbad022] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/03/2022] [Revised: 11/25/2022] [Accepted: 01/06/2023] [Indexed: 01/23/2023]
Abstract
Alternative splicing (AS) is a key post-transcriptional modification that increases protein diversity. Almost 90% of the protein-coding genes in humans are known to undergo AS and code for different transcripts. Some transcripts are associated with diseases such as breast cancer, lung cancer and glioblastoma. Hence, these transcripts can serve as novel therapeutic and prognostic targets for drug discovery. Herein, we have developed a pipeline, Finding Alternative Splicing Events (FASE), as an R package that includes modules to determine the structure and concentration of transcripts using differential AS. To predict the correct structure of expressed transcripts in given conditions, FASE combines the AS events with the information of exons, introns and junctions using graph theory. The estimated concentration of predicted transcripts is reported as the relative expression in terms of log2CPM. Using FASE, we were able to identify several unique transcripts of the EMILIN1 and SLK genes in the TCGA-BRCA data, which were validated using RT-PCR. The experimental study demonstrated consistent results, which signify the high accuracy and precision of the developed methods. In conclusion, the developed pipeline, FASE, can efficiently predict novel transcripts that are missed in general transcript-level differential expression analysis. It can be applied selectively, from a single gene to a simple or complex genome, even across multiple experimental conditions, for the identification of differential AS-based biomarkers, prognostic targets and novel therapeutics.
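The abstract above reports transcript concentration as log2CPM (log2 counts per million). The following is a minimal sketch of how such a value is commonly computed from raw read counts. It is written in Python for illustration only and is not FASE's own code (FASE is an R package); the function name `log2_cpm` and the `prior_count` offset are assumptions introduced here.

```python
import math

def log2_cpm(counts, prior_count=0.5):
    """Return log2 counts-per-million for one sample.

    counts: dict mapping a transcript ID to its raw read count.
    prior_count: small offset (an assumption of this sketch) that keeps
                 zero counts from producing log2(0).
    """
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library size is zero")
    # Offset the library size by twice the prior so ratios stay below 1.
    denom = library_size + 2 * prior_count
    return {
        tid: math.log2((c + prior_count) / denom * 1e6)
        for tid, c in counts.items()
    }
```

Because the value is a ratio to the library size, it is a relative expression measure: doubling every count in a sample leaves the log2CPM values essentially unchanged.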
Affiliation(s)
- Harsh Sharma
  - Amity Institute of Integrative Sciences and Health, Amity University Haryana, Gurugram 122413, India
- Trishna Pani
  - Amity Institute of Integrative Sciences and Health, Amity University Haryana, Gurugram 122413, India
- Ujjaini Dasgupta
  - Amity Institute of Integrative Sciences and Health, Amity University Haryana, Gurugram 122413, India
- Jyotsna Batra
  - School of Biomedical Sciences, Institute of Health and Biomedical Innovation (IHBI), Translational Research Institute, Queensland University of Technology (QUT), Brisbane, QLD, Australia
- Ravi Datta Sharma
  - Amity Institute of Integrative Sciences and Health, Amity University Haryana, Gurugram 122413, India
2
Qi W, Fu H, Luo X, Ren Y, Liu X, Dai H, Zheng Q, Liang F. Electroacupuncture at PC6 (Neiguan) Attenuates Angina Pectoris in Rats with Myocardial Ischemia-Reperfusion Injury Through Regulating the Alternative Splicing of the Major Inhibitory Neurotransmitter Receptor GABRG2. J Cardiovasc Transl Res 2022; 15:1176-1191. [PMID: 35377129 DOI: 10.1007/s12265-022-10245-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 03/25/2022] [Indexed: 11/27/2022]
Abstract
Angina pectoris is the most common manifestation of coronary heart disease, causing suffering in patients. Electroacupuncture at PC6 can effectively alleviate angina by regulating the expression of genes; however, whether the alternative splicing (AS) of genes is affected by acupuncture is still unknown. We established a rat model of myocardial ischemia-reperfusion (I/R) by coronary artery ligation and confirmed that electroacupuncture alleviated the abnormal discharge caused by angina pectoris, as measured by electromyography (EMG). Analysis of the GSE61840 dataset established that AS events were altered after I/R and regulated by electroacupuncture. I/R decreased the expression of the splicing factor Nova1, while electroacupuncture rescued it. Further experiments in dorsal root ganglion cells showed that Nova1 regulated the AS of GABRG2, specifically at exon 9, where an important phosphorylation site is present. In vivo results also showed that electroacupuncture can restore AS of GABRG2. Our results show that electroacupuncture alleviates angina by regulating alternative splicing.
Affiliation(s)
- Wenchuan Qi
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
- Hongjuan Fu
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
- Xinye Luo
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
- Yanrong Ren
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
  - Shanxi University of Traditional Chinese Medicine, Jinzhong, 030002, Shanxi, China
- Xueying Liu
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
  - Shanxi University of Traditional Chinese Medicine, Jinzhong, 030002, Shanxi, China
- Hongyuan Dai
  - College of Life Sciences, Sichuan University, Chengdu, 610065, Sichuan, China
- Qianhua Zheng
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
- Fanrong Liang
  - Chengdu University of Traditional Chinese Medicine, Chengdu, 610075, Sichuan, China
3
Beaulieu L, Vitseva O, Tanriverdi K, Kucukural A, Mick E, Hamburg N, Vita J, Freedman J. Platelet functional and transcriptional changes induced by intralipid infusion. Thromb Haemost 2017; 115:1147-56. [DOI: 10.1160/th15-09-0739] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/18/2015] [Accepted: 02/11/2016] [Indexed: 02/07/2023]
Abstract
Summary: Multiple studies have shown the effects of long-term exposure to high-fat or western diets on the vascular system. There is limited knowledge on the acute effects of high circulating fat levels, specifically on platelets, which have a role in many processes, including thrombosis and inflammation. This study investigated the effects of acute, high-fat exposure on platelet function and transcript profile. Twenty healthy participants were given an intravenous infusion of 20% Intralipid emulsion and heparin over 6 hours. Blood samples were taken prior to and the day after infusion to measure platelet function and transcript expression levels. Platelet aggregation was not significantly affected by Intralipid infusion, but when mitochondrial function was inhibited by carbonyl cyanide 3-chlorophenylhydrazone (CCCP) or oligomycin, platelet aggregation was higher in the post-infusion state compared to baseline. RNA sequencing, verified by RT-qPCR, showed that 902 miRNAs and 617 mRNAs were affected by Intralipid infusion. MicroRNAs that increased include miR-4259 and miR-346, whereas miR-517b and miR-517c both decreased. Pathway analysis identified two significantly enriched clusters, including cell motility. In conclusion, acute exposure to high fat affects mitochondrial-dependent platelet function, as well as the transcript profile.
4
Consiglio A, Mencar C, Grillo G, Marzano F, Caratozzolo MF, Liuni S. A fuzzy method for RNA-Seq differential expression analysis in presence of multireads. BMC Bioinformatics 2016; 17:345. [PMID: 28185579 PMCID: PMC5123383 DOI: 10.1186/s12859-016-1195-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023]
Abstract
Background: When the reads obtained from high-throughput RNA sequencing are mapped against a reference database, a significant proportion of them, known as multireads, can map to more than one reference sequence. These multireads originate from gene duplications, repetitive regions or overlapping genes. Removing the multireads from the mapping results in RNA-Seq analyses causes an underestimation of the read counts, while estimating the real read count can lead to false positives during the detection of differentially expressed sequences. Results: We present an innovative approach to deal with multireads and evaluate differential expression events, entirely based on fuzzy set theory. Since multireads cause uncertainty in the estimation of read counts during gene expression computation, they can also influence the reliability of differential expression analysis results by producing false positives. Our method manages the uncertainty in gene expression estimation by defining fuzzy read counts and evaluates the possibility of a gene being differentially expressed with three fuzzy concepts: over-expression, same-expression and under-expression. The output of the method is a list of differentially expressed genes enriched with information about the uncertainty of the results due to the presence of multireads. We have tested the method on RNA-Seq data designed for case-control studies and we have compared the obtained results with other existing tools for read count estimation and differential expression analysis. Conclusions: Managing multireads with fuzzy sets makes it possible to obtain a list of differential expression events that takes into account the uncertainty in the results caused by the presence of multireads. Such additional information can be used by biologists when they have to select the most relevant differential expression events to validate with laboratory assays. Our method can be used to compute reliable differential expression events and to highlight possible false positives in the lists of differentially expressed genes computed with other tools.
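To make the idea of an uncertain read count concrete, the sketch below replaces the paper's fuzzy sets with a much simpler interval: the lower bound counts only uniquely mapped reads, and the upper bound adds every multiread that could belong to the gene. This is a deliberate simplification and not the authors' fuzzy membership functions; the names `IntervalCount`, `interval_count` and `compare`, and the "uncertain" label, are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntervalCount:
    low: int   # reads that map uniquely to the gene
    high: int  # unique reads plus every multiread that might belong to it

def interval_count(unique_reads, multireads):
    """Bound a gene's true read count given its unique reads and multireads."""
    return IntervalCount(unique_reads, unique_reads + multireads)

def compare(case, control):
    """Classify a case/control pair using only the interval bounds."""
    if case.low > control.high:
        return "over-expression"    # holds however multireads are assigned
    if case.high < control.low:
        return "under-expression"   # holds however multireads are assigned
    return "uncertain"              # intervals overlap: the call depends on multireads
```

A gene whose call flips depending on how its multireads are assigned lands in the overlapping case, which is the kind of result the abstract suggests flagging before laboratory validation.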
Affiliation(s)
- Arianna Consiglio
  - Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Corrado Mencar
  - Department of Informatics, University of Bari Aldo Moro, Bari, 70121, Italy
- Giorgio Grillo
  - Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Flaviana Marzano
  - Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Sabino Liuni
  - Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
5
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 2016; 34:525-7. [PMID: 27043002 DOI: 10.1038/nbt.3519] [Citation(s) in RCA: 5299] [Impact Index Per Article: 662.4] [Received: 10/15/2015] [Accepted: 02/25/2016] [Indexed: 12/18/2022]
Abstract
We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.
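The notion of pseudoalignment described above, finding the transcripts compatible with a read without aligning individual bases, can be illustrated with a toy k-mer lookup. The sketch below assumes a plain hash map from k-mers to transcript names and intersects the sets hit by each k-mer of the read. It is not kallisto's implementation, which works on a transcriptome de Bruijn graph and handles strand, errors and skipped k-mers; `build_index` and `pseudoalign` are hypothetical names.

```python
from collections import defaultdict

def build_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Return the transcripts compatible with every k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            return set()  # toy rule: an unseen k-mer means no compatible transcript
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

transcripts = {"t1": "ACGTACGGA", "t2": "ACGTACCTT"}
index = build_index(transcripts, k=4)
shared = pseudoalign("ACGTAC", index, k=4)     # lies in the common prefix
specific = pseudoalign("GTACGG", index, k=4)   # extends into t1 only
```

Here `shared` contains both transcripts and `specific` contains only `t1`; no base-level alignment is computed at any point, only set intersections.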
Affiliation(s)
- Nicolas L Bray
  - Innovative Genomics Initiative, University of California, Berkeley, California, USA
- Harold Pimentel
  - Department of Computer Science, University of California, Berkeley, California, USA
- Páll Melsted
  - Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavik, Iceland
- Lior Pachter
  - Department of Computer Science, University of California, Berkeley, California, USA
  - Department of Mathematics, University of California, Berkeley, California, USA
  - Department of Molecular & Cell Biology, University of California, Berkeley, California, USA
6
Zeng X, Li B, Welch R, Rojo C, Zheng Y, Dewey CN, Keleş S. Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping. PLoS Comput Biol 2015; 11:e1004491. [PMID: 26484757 PMCID: PMC4618727 DOI: 10.1371/journal.pcbi.1004491] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/13/2014] [Accepted: 08/06/2015] [Indexed: 11/19/2022]
Abstract
Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells' regulatory programs. Advances in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map, since short reads of 50-100 base pairs (bps) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few that can accommodate them are prone to large false positive and negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments that can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach facilitates supervising of multi-read allocation with a variety of data sources, including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that although the protein-DNA interaction sites are evolutionarily less conserved in repetitive regions, they share the overall sequence characteristics of the protein-DNA interactions in non-repetitive regions.
Affiliation(s)
- Xin Zeng
  - Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America
- Bo Li
  - California Institute for Quantitative Biosciences, University of California, Berkeley, California, United States of America
- Rene Welch
  - Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America
- Constanza Rojo
  - Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America
- Ye Zheng
  - Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America
- Colin N. Dewey
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, United States of America
- Sündüz Keleş
  - Department of Statistics, University of Wisconsin, Madison, Wisconsin, United States of America
  - Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, United States of America
7
Chandramohan R, Wu PY, Phan JH, Wang MD. Benchmarking RNA-Seq quantification tools. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2015; 2013:647-50. [PMID: 24109770 DOI: 10.1109/embc.2013.6609583] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/16/2022]
Abstract
RNA-Seq, a deep sequencing technique, promises to be a potential successor to microarrays for studying the transcriptome. One of many aspects of transcriptomics that are of interest to researchers is gene expression estimation. With rapid development in RNA-Seq, there are numerous tools available to estimate gene expression, each producing different results. However, we do not know which of these tools produces the most accurate gene expression estimates. In this study we have addressed this issue using Cufflinks, IsoEM, HTSeq, and RSEM to quantify RNA-Seq expression profiles. Comparing results of these quantification tools, we observe that RNA-Seq relative expression estimates correlate with RT-qPCR measurements in the range of 0.85 to 0.89, with HTSeq exhibiting the highest correlation. But, in terms of root-mean-square deviation of RNA-Seq relative expression estimates from RT-qPCR measurements, we find HTSeq to produce the greatest deviation. Therefore, we conclude that, though Cufflinks, RSEM, and IsoEM might not correlate as well as HTSeq with RT-qPCR measurements, they may produce expression values with higher accuracy.
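The abstract's conclusion rests on the fact that correlation and root-mean-square deviation (RMSD) measure different things: correlation ignores a constant offset or scale factor, while RMSD penalizes it. The sketch below illustrates this with made-up numbers that are not taken from the study; `pearson` and `rmsd` are plain textbook definitions written for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmsd(x, y):
    """Root-mean-square deviation between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Illustrative values only (hypothetical log-scale expression estimates).
qpcr = [1.0, 2.0, 3.0, 4.0]
shifted = [v + 5.0 for v in qpcr]   # perfectly correlated, but offset by 5
noisy = [1.2, 1.8, 3.3, 3.9]        # slightly less correlated, much closer
```

With these values, `shifted` has the higher correlation with `qpcr` yet the larger RMSD, while `noisy` has a lower correlation and a far smaller RMSD, which mirrors the pattern the abstract reports for HTSeq versus the other tools.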
8
LeGault LH, Dewey CN. Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs. Bioinformatics 2013; 29:2300-10. [PMID: 23846746 PMCID: PMC3753571 DOI: 10.1093/bioinformatics/btt396] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/04/2022]
Abstract
Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
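One way to picture a probabilistic splice graph is as a directed graph over exon segments whose edges carry transition probabilities, so that a transcript corresponds to a start-to-end path and its relative abundance to the product of the edge probabilities along that path. The sketch below assumes this simple first-order form; the vertex names, the edge weights and the function `path_probability` are hypothetical and are not taken from the paper or its software.

```python
def path_probability(path, edge_prob):
    """Multiply the edge probabilities along a path through the splice graph."""
    p = 1.0
    for u, v in zip(path, path[1:]):
        p *= edge_prob.get((u, v), 0.0)  # a missing edge makes the path impossible
    return p

# Toy gene with one cassette exon (e2) that is included 70% of the time.
edge_prob = {
    ("start", "e1"): 1.0,
    ("e1", "e2"): 0.7,
    ("e1", "e3"): 0.3,
    ("e2", "e3"): 1.0,
    ("e3", "end"): 1.0,
}

inclusion = path_probability(["start", "e1", "e2", "e3", "end"], edge_prob)
skipping = path_probability(["start", "e1", "e3", "end"], edge_prob)
```

The appeal of this representation is compactness: a gene with many independent splicing choices needs one parameter per edge rather than one per full-length transcript, which relates to the efficiency and representation issues the abstract mentions.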
Affiliation(s)
- Laura H LeGault
  - Department of Computer Sciences, University of Wisconsin, Madison, WI 53706, USA
9
Abstract
During 2012, next-generation sequencing (NGS) attracted great attention in the biomedical research community, especially for personalized medicine. Third-generation sequencing also became available. Therefore, state-of-the-art sequencing technology and analysis are reviewed in this Bioinformatics spotlight on 2012. NGS is a high-throughput nucleic acid sequencing technology with wide dynamic range and single-base resolution. The full promise of NGS depends on the optimization of NGS platforms, sequence alignment and assembly algorithms, data analytics, novel algorithms for integrating NGS data with existing genomic, proteomic, or metabolomic data, and quantitative assessment of NGS technology in comparison to more established technologies such as microarrays. NGS technology has been predicted to become a cornerstone of personalized medicine. It is argued that NGS is a promising field for motivated young researchers who are looking for opportunities in bioinformatics.
Affiliation(s)
- May Dongmei Wang
  - Bioinformatics and Biocomputing Core in Emory-Georgia Tech Cancer Nanotechnology Center, Georgia Institute of Technology and Emory University, Atlanta, GA 30332-0535, USA
10
Reddy ASN, Rogers MF, Richardson DN, Hamilton M, Ben-Hur A. Deciphering the plant splicing code: experimental and computational approaches for predicting alternative splicing and splicing regulatory elements. Frontiers in Plant Science 2012; 3:18. [PMID: 22645572 PMCID: PMC3355732 DOI: 10.3389/fpls.2012.00018] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 11/27/2011] [Accepted: 01/18/2012] [Indexed: 05/20/2023]
Abstract
Extensive alternative splicing (AS) of precursor mRNAs (pre-mRNAs) in multicellular eukaryotes increases the protein-coding capacity of a genome and allows novel ways to regulate gene expression. In flowering plants, up to 48% of intron-containing genes exhibit AS. However, the full extent of AS in plants is not yet known, as only a few high-throughput RNA-Seq studies have been performed. As the cost of obtaining RNA-Seq reads continues to fall, it is anticipated that huge amounts of plant sequence data will accumulate and help in obtaining a more complete picture of AS in plants. Although it is not an onerous task to obtain hundreds of millions of reads using high-throughput sequencing technologies, computational tools to accurately predict and visualize AS are still being developed and refined. This review will discuss the tools to predict and visualize transcriptome-wide AS in plants using short-reads and highlight their limitations. Comparative studies of AS events between plants and animals have revealed that there are major differences in the most prevalent types of AS events, suggesting that plants and animals differ in the way they recognize exons and introns. Extensive studies have been performed in animals to identify cis-elements involved in regulating AS, especially in exon skipping. However, few such studies have been carried out in plants. Here, we review the current state of research on splicing regulatory elements (SREs) and briefly discuss emerging experimental and computational tools to identify cis-elements involved in regulation of AS in plants. The availability of curated alternative splice forms in plants makes it possible to use computational tools to predict SREs involved in AS regulation, which can then be verified experimentally. Such studies will permit identification of plant-specific features involved in AS regulation and contribute to deciphering the splicing code in plants.
Affiliation(s)
- Anireddy S. N. Reddy
  - Program in Molecular Plant Biology, Department of Biology, Colorado State University, Fort Collins, CO, USA
- Mark F. Rogers
  - Department of Computer Science, Colorado State University, Fort Collins, CO, USA
- Dale N. Richardson
  - Centro de Investigação em Biodiversidade e Recursos Genéticos, University of Porto, Vairão, Portugal
- Michael Hamilton
  - Department of Computer Science, Colorado State University, Fort Collins, CO, USA
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO, USA
  - Program in Molecular Plant Biology, Colorado State University, Fort Collins, CO, USA
12
Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011; 12:323. [PMID: 21816040 PMCID: PMC3163565 DOI: 10.1186/1471-2105-12-323] [Citation(s) in RCA: 12987] [Impact Index Per Article: 999.0] [Received: 05/10/2011] [Accepted: 08/04/2011] [Indexed: 02/07/2023]
Abstract
Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
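The handling of ambiguously mapping reads mentioned above is typically approached with expectation-maximization (EM): each read is fractionally assigned to its compatible transcripts in proportion to the current abundance estimates, and the abundances are then re-estimated from those fractional assignments. The sketch below is a stripped-down toy version that ignores transcript length, fragment length, read position and quality scores, all of which a full model such as RSEM's would account for; `em_abundance` and its arguments are hypothetical names.

```python
def em_abundance(read_compat, transcripts, n_iter=100):
    """Estimate relative transcript abundances from read compatibility sets.

    read_compat: list of sets, one per read, naming the transcripts the
                 read could have come from.
    transcripts: list of all transcript names.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts.
        expected = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0:
                continue
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: renormalize the expected counts into proportions.
        n = sum(expected.values())
        if n == 0:
            return theta
        theta = {t: expected[t] / n for t in transcripts}
    return theta

# 6 reads unique to A, 2 unique to B, 4 that map to both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_abundance(reads, ["A", "B"])
```

In this toy case the estimate for `A` converges to 0.75: the four ambiguous reads are shared in the same 3:1 ratio that the unique reads suggest, rather than being discarded or split evenly.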
Affiliation(s)
- Bo Li
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
14
Nicolae M, Măndoiu I. Accurate Estimation of Gene Expression Levels from DGE Sequencing Data. Bioinformatics Research and Applications 2011. [DOI: 10.1007/978-3-642-21260-4_37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/24/2022]