1
Weine E, Carbonetto P, Stephens M. Accelerated dimensionality reduction of single-cell RNA sequencing data with fastglmpca. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.23.586420. [PMID: 38585920 PMCID: PMC10996495 DOI: 10.1101/2024.03.23.586420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Summary: Motivated by theoretical and practical issues that arise when applying Principal Components Analysis (PCA) to count data, Townes et al. introduced "Poisson GLM-PCA", a variation of PCA adapted to count data, as a tool for dimensionality reduction of single-cell RNA sequencing (RNA-seq) data. However, fitting GLM-PCA is computationally challenging. Here we study this problem, and show that a simple algorithm, which we call "Alternating Poisson Regression" (APR), produces better quality fits, and in less time, than existing algorithms. APR is also memory-efficient, and lends itself to parallel implementation on multi-core processors, both of which are helpful for handling large single-cell RNA-seq data sets. We illustrate the benefits of this approach in two published single-cell RNA-seq data sets. The new algorithms are implemented in an R package, fastglmpca.
Availability and implementation: The fastglmpca R package is released on CRAN for Windows, macOS and Linux, and the source code is available at github.com/stephenslab/fastglmpca under the open source GPL-3 license. Scripts to reproduce the results in this paper are also available in the GitHub repository.
Contact: mstephens@uchicago.edu.
Supplementary information: Supplementary data are available on bioRxiv online.
Affiliation(s)
- Eric Weine
  - Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Data Science, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Peter Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Matthew Stephens
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Statistics, University of Chicago, Chicago, IL 60637, USA
2
Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes.
RESULTS: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression.
CONCLUSIONS: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yixuan Qiu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
3
Lin KZ, Qiu Y, Roeder K. eSVD-DE: Cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.22.568369. [PMID: 38045428 PMCID: PMC10690270 DOI: 10.1101/2023.11.22.568369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Background: Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes.
Results: We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression.
Conclusions: eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Yixuan Qiu
  - School of Statistics & Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
  - Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
4
Zheng X, Wu B, Liu Y, Simmons SK, Kim K, Clarke GS, Ashiq A, Park J, Wang Z, Tong L, Wang Q, Xu X, Levin JZ, Jin X. Massively parallel in vivo Perturb-seq reveals cell type-specific transcriptional networks in cortical development. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.18.558077. [PMID: 37790302 PMCID: PMC10542124 DOI: 10.1101/2023.09.18.558077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Systematic analysis of gene function across diverse cell types in vivo is hindered by two challenges: obtaining sufficient cells from live tissues and accurately identifying each cell's perturbation in high-throughput single-cell assays. Leveraging AAV's versatile cell type tropism and high labeling capacity, we expanded the resolution and scale of in vivo CRISPR screens, allowing phenotypic analysis at single-cell resolution across a multitude of cell types in the embryonic brain, adult brain, and peripheral nervous system. We undertook extensive tests of 86 AAV serotypes, combined with a transposon system, to substantially amplify labeling and accelerate in vivo gene delivery from weeks to days. Using this platform, we performed an in utero genetic screen as proof-of-principle and identified pleiotropic regulatory networks of Foxg1 in cortical development, including Layer 6 corticothalamic neurons where it tightly controls distinct networks essential for cell fate specification. Notably, our platform can label >6% of cerebral cells, surpassing the current state-of-the-art efficacy at <0.1% (mediated by lentivirus), and achieve analysis of over 30,000 cells in one experiment, thus enabling massively parallel in vivo Perturb-seq. Compatible with various perturbation techniques (CRISPRa/i) and phenotypic measurements (single-cell or spatial multi-omics), our platform presents a flexible, modular approach to interrogate gene function across diverse cell types in vivo, connecting gene variants to their causal functions.
Affiliation(s)
- Xinhe Zheng
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Boli Wu
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Yuejia Liu
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Sean K. Simmons
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kwanho Kim
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Grace S. Clarke
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Abdullah Ashiq
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Joshua Park
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Zhilin Wang
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA
- Liqi Tong
  - Center for Neural Circuit Mapping, Department of Anatomy and Neurobiology, University of California, Irvine, CA, USA
- Qizhao Wang
  - Center for Neural Circuit Mapping, Department of Anatomy and Neurobiology, University of California, Irvine, CA, USA
- Xiangmin Xu
  - Center for Neural Circuit Mapping, Department of Anatomy and Neurobiology, University of California, Irvine, CA, USA
- Joshua Z. Levin
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Xin Jin
  - Department of Neuroscience, Dorris Neuroscience Center, Scripps Research, La Jolla, CA, USA