1
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015; 4:1521. [PMID: 26925227 PMCID: PMC4712774 DOI: 10.12688/f1000research.7563.2] [Citation(s) in RCA: 1529] [Impact Index Per Article: 169.9] [Accepted: 02/23/2016] [Indexed: 12/21/2022]
Abstract
High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.
Affiliation(s)
- Charlotte Soneson
  - Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
- Michael I. Love
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02210, USA
  - Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, 02115, USA
- Mark D. Robinson
  - Institute for Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland
2
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.12688/f1000research.7563.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/15/2023]
3
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 04/20/2023]
4
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114726] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 04/20/2023]
5
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 04/20/2023]
6
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114722] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 04/20/2023]
7
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114730] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 04/20/2023]
8
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 2015. [PMID: 26925227 DOI: 10.5256/f1000research.7563.d114725] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 04/20/2023]