Qiu Y, Lu T, Lim H, Xie L. A Bayesian approach to accurate and robust signature detection on LINCS L1000 data. Bioinformatics 2020; 36:2787-2795. [PMID: 32003771 PMCID: PMC7203754 DOI: 10.1093/bioinformatics/btaa064]
[Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/22/2019] [Revised: 12/13/2019] [Accepted: 01/24/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION
The LINCS L1000 dataset contains a large volume of cellular expression data induced by large sets of perturbagens. Although it is an invaluable resource for drug discovery and for understanding disease mechanisms, the existing peak deconvolution algorithms fail to recover accurate gene expression levels in many cases, which introduces severe noise into the dataset and limits its applications in biomedical studies.
RESULTS
Here, we present a novel Bayesian-based peak deconvolution algorithm that gives unbiased likelihood estimates for peak locations and characterizes the peaks with probability-based z-scores. Based on this algorithm, we build a pipeline that processes raw data from the L1000 assay into signatures that represent the features of each perturbagen. The performance of the proposed pipeline is evaluated using the similarity between the signatures of bio-replicates and between the signatures of drugs with shared targets. The results show that signatures derived from our pipeline give a substantially more reliable and informative representation of perturbagens than those from existing methods. Thus, the new pipeline may significantly boost the performance of L1000 data in downstream applications such as drug repurposing, disease modeling and gene function prediction.
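As a rough illustration of the replicate-based evaluation described above, the sketch below scores a set of signatures by their mean pairwise Pearson correlation. This is an assumption made for illustration only: the abstract does not state which similarity measure the authors use, and the function names `pearson` and `replicate_similarity`, the noise level, and the synthetic data are all hypothetical. The vector length of 978 matches the number of L1000 landmark genes.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def replicate_similarity(signatures):
    """Mean pairwise Pearson correlation over a group of signatures.

    Hypothetical metric: higher values mean the signatures in the group
    (for example, bio-replicates of one perturbagen) agree more closely.
    """
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Synthetic demonstration data (not real L1000 measurements).
rng = random.Random(0)
n_genes = 978  # number of L1000 landmark genes
true_signature = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]

# Three noisy replicates of the same underlying signature.
replicates = [
    [value + rng.gauss(0.0, 0.5) for value in true_signature]
    for _ in range(3)
]
# Three unrelated signatures that share no underlying signal.
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(3)]

replicate_score = replicate_similarity(replicates)
unrelated_score = replicate_similarity(unrelated)
print(f"replicates: {replicate_score:.3f}, unrelated: {unrelated_score:.3f}")
```

Under this assumed metric, a pipeline that yields cleaner signatures should score replicate groups well above groups of unrelated perturbagens; the same function could be applied to groups of drugs with shared targets.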
AVAILABILITY AND IMPLEMENTATION
The code and the precomputed data for LINCS L1000 Phase II (GSE70138) are available at https://github.com/njpipeorgan/L1000-bayesian.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.