Okada D, Zheng C, Cheng JH. Mathematical model for the relationship between single-cell and bulk gene expression to clarify the interpretation of bulk gene expression data. Comput Struct Biotechnol J 2022; 20:4850-4859. [PMID: 36147671 PMCID: PMC9474327 DOI: 10.1016/j.csbj.2022.08.062]
[Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 08/26/2022] [Indexed: 11/03/2022]
Abstract
BACKGROUND
Differential expression analysis is a standard approach in molecular biology. For example, genes whose expression levels differ between diseased and non-diseased samples are considered to be associated with that disease. Differential variability analysis, by contrast, focuses on differences in the variance of gene expression between sample groups. Although differential variability is also known to capture biological information, its interpretation remains unclear and controversial. Recent single-cell analyses have revealed that differences between sample groups can affect gene expression in a cellular subset-specific manner or by altering the proportion of a particular cellular subset. The aim of this study is to clarify the interpretation of the mean and variance of bulk gene expression data.
METHOD
We developed a mathematical model in which the bulk gene expression value is proportional to the mean value of the single-cell gene expression profile. Based on this model, we performed theoretical, simulated and real single-cell RNA-seq data analyses.
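As a rough illustration of the kind of model described here, the proportionality can be written out for a mixture of cellular subsets. The notation below ($b_i$, $c$, $p_{ik}$, $\mu_{ik}$) is assumed for this sketch and is not taken from the paper.

```latex
% Assumed notation: sample i contains K cellular subsets; subset k has
% proportion p_{ik} (with \sum_k p_{ik} = 1) and mean single-cell expression
% \mu_{ik} for the gene of interest; c > 0 is a proportionality constant.
\[
  b_i \;=\; c \sum_{k=1}^{K} p_{ik}\,\mu_{ik}
\]
% Two-subset case (K = 2), writing p_i for the proportion of subset 1:
\[
  b_i \;=\; c\,\bigl[\,p_i\,\mu_{i1} + (1 - p_i)\,\mu_{i2}\,\bigr]
\]
% If only p_i varies among samples while \mu_1 and \mu_2 are fixed:
\[
  \operatorname{Var}(b_i) \;=\; c^{2}\,(\mu_1 - \mu_2)^{2}\,\operatorname{Var}(p_i)
\]
```

Under these assumptions, the bulk value depends on both the subset proportions and the subset-specific means, so a change in either one can move, or fail to move, the bulk mean.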
RESULT AND CONCLUSION
We identified how differences in single-cell gene expression profiles affect differences in the mean and the variance of bulk gene expression. We show that differential expression analysis of bulk expression data can overlook significant changes in gene expression at the single-cell level. Further, differential variability analysis captures a complex feature shaped by subset-specific gene expression shifts, changes in the proportions of cellular subsets, and variation in single-cell distribution parameters among samples.
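The two findings above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: it assumes two cellular subsets with normally distributed single-cell expression, and the function name `simulate_bulk` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_bulk(rng, n_samples, n_cells, prop_mean, prop_sd, mu, sigma=1.0, scale=1.0):
    """Return one bulk value per sample: scale * mean of simulated single-cell values.

    Assumption for this sketch: two subsets, normal single-cell expression,
    and a per-sample proportion of subset 1 drawn from a clipped normal.
    """
    bulk = np.empty(n_samples)
    for i in range(n_samples):
        # Proportion of subset 1 in this sample, kept inside [0, 1].
        p = float(np.clip(rng.normal(prop_mean, prop_sd), 0.0, 1.0))
        n1 = rng.binomial(n_cells, p)
        cells = np.concatenate([
            rng.normal(mu[0], sigma, n1),            # cells from subset 1
            rng.normal(mu[1], sigma, n_cells - n1),  # cells from subset 2
        ])
        bulk[i] = scale * cells.mean()
    return bulk


rng = np.random.default_rng(0)

# Reference group: fixed 50/50 proportions, subset means 2 and 6.
control = simulate_bulk(rng, 200, 2000, 0.5, 0.0, (2.0, 6.0))

# Opposite subset-specific shifts (2 -> 1 and 6 -> 7) that offset each other.
opposite = simulate_bulk(rng, 200, 2000, 0.5, 0.0, (1.0, 7.0))

# Same subset means as the reference, but proportions vary among samples.
variable = simulate_bulk(rng, 200, 2000, 0.5, 0.1, (2.0, 6.0))

for name, values in [("control", control), ("opposite", opposite), ("variable", variable)]:
    print(f"{name}: mean={values.mean():.3f} variance={values.var():.4f}")
```

In this setup the `opposite` group has nearly the same bulk mean as `control` even though both subsets changed at the single-cell level, while the `variable` group keeps the same bulk mean but shows a much larger bulk variance driven only by among-sample variation in subset proportions.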