Goh W, Mutwil M. LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life.
Bioinformatics 2021; 37:3053-3055. [PMID: 33704421 DOI: 10.1093/bioinformatics/btab168]
[Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/25/2021] [Revised: 02/26/2021] [Accepted: 03/08/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION
There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, the tools allowing the download, quality control and annotation of this data for more than one species at a time are currently missing.
RESULTS
To remedy this, we present the Large-Scale Transcriptomic Analysis Pipeline in Kingdom of Life (LSTrAP-Kingdom) pipeline, which we used to process 134 521 RNA-seq samples, achieving ∼12 000 processed samples per day. Our pipeline generated quality-controlled, annotated gene expression matrices that rival the manually curated gene expression data in identifying functionally related genes.
AVAILABILITY AND IMPLEMENTATION
LSTrAP-Kingdom is available from: https://github.com/wirriamm/plants-pipeline and is fully implemented in Python and Bash.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.